eduTDM – Text mining without barriers

Staff and students affiliated with a university can access and download all the research papers the institution subscribes to, provided that they are logged in to the institution’s network. Human readers can therefore reach this subscribed literature quite easily, but text and data miners cannot yet access the same literature by machine effectively and at scale.

The current amendments and exceptions in copyright law have given us the green light to text and data mine (TDM), for non-commercial research, content we have acquired access to. eduTDM aims to find a pragmatic way to deliver this content to text miners as easily as possible, based on the subscriptions they hold.

The eduTDM working group has now concluded. It identified the current shortcomings in the support needed for effective and scalable text and data mining of scholarly content and proposed the novel eduTDM service to address those needs. Access the White Paper here.

Background

Recent years have witnessed an unparalleled upsurge in the quantity of digital data. In the world of science, researchers worldwide generate almost 1.5 million publications annually, and over 100 million such scientific articles had been published as of 2015. Another study (2014) found that there were approximately 28,000 peer-reviewed journals publishing in the English language, generating about 2.5 million articles in English alone, while the number of published articles is estimated to grow by 8-9% each year.

These vast amounts of new data and information can undoubtedly offer new insights and give rise to new opportunities for analytics and improved understanding, but it is equally clear that reading and analysing them all is beyond human capacity.

Text and data mining (TDM) is emerging as a powerful tool for discovering value in data. It analyses structured and unstructured datasets and content at multiple levels and in many different dimensions in order to discover concepts and entities in the world, the patterns they may follow and the relations they engage in, and on this basis annotates, indexes, classifies and visualises such content.

A study into the Value and Benefits of Text Mining, commissioned by Jisc in 2012, concluded that text mining of research outputs has the potential to provide significant benefits to the economy and society in the form of increased researcher efficiency and an improved research process and evidence base.

Working Group Mission

This working group consists of a variety of stakeholders, including the publishing industry, the text and data mining scientific community, digital infrastructure representatives and policy makers. More specifically, this working group aims to:

- Initiate and establish a collaboration and a communication route between a range of stakeholders on this topic.

- Define, discuss and disseminate the principles and views of the stakeholders involved in this collaboration.

- Understand the position of the stakeholders on eduTDM.

- Establish a development plan for putting eduTDM into practice.

eduTDM Team Members

Petr Knoth – Senior Research Fellow in Text and Data Mining, CORE

Nancy Pontika – Open Access Aggregation Officer, CORE

Bikash Gyawali – Research Associate, CORE

Advisory Members

Victor Botev – IRIS.AI Co-Founder and CTO

Rachel Bruce – Director Open Science and Research Lifecycle, Jisc

Duncan Campbell – Senior Director, Global Sales Partnership, Wiley

Jacobo Elosua – IRIS.AI Co-Founder and CFO/COO

Vicky Gardner – Head of Research Services Development, Taylor and Francis

Melissa Harrison – Head of Production Operations, eLife

Kathleen Shearer – Executive Director, COAR

Roland Strauss – CEO, Knowledge4Innovation

Victoria Eva – Policy Director, Elsevier