CC Science → TDM

Potential of TDM: Text and data mining (TDM) is becoming an increasingly important scientific technique for analyzing large corpora of articles. It is used to uncover both existing and new insights in unstructured data sets, which are typically obtained programmatically from many different sources. Innovative examples include GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles [1]; improving human curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database [2]; and discovering a new link between genes and osteoporosis [3].

  1. Zhang, C., V. Govindaraju, J. Borchardt, T. Foltz, C. Ré, and S. Peters. 2013. GeoDeepDive: Statistical inference using familiar data-processing languages. SIGMOD ’13, New York, New York.
  2. Wiegers, T. C., A. P. Davis, K. B. Cohen, L. Hirschman, and C. J. Mattingly. 2009. Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD). BMC Bioinformatics 10:326. doi:10.1186/1471-2105-10-326
  3. Gajendran, V. K., J.-R. Lin, and D. P. Fyhrie. 2007. An application of bioinformatics and text mining to the discovery of novel genes related to bone biology. Bone 40(5):1378-1388. ISSN 8756-3282. doi:10.1016/j.bone.2006.12.067

Legal Uncertainty: The science and technology of TDM are complex enough, involving information retrieval (IR), optical character recognition (OCR), and natural language processing (NLP). Sadly, the legal complications are equally dizzying. The legal status of TDM is unclear at best, and it also varies from jurisdiction to jurisdiction, which makes cross-national collaboration difficult. Beyond the license status of the original material, contractual agreements between research institutions and publishers, who are often the gatekeepers of the corpora, can create significant hurdles.

Public Sentiment: In a recent comment on the proposed UK exception for information mining, both iCommons and the Open Knowledge Foundation supported the UK Government’s opinion that it is inappropriate for “Certain activities of public benefit such as medical research obtained through text mining to be in effect subject to veto by the owners of copyrights in the reports of such research, where access to the reports was obtained lawfully.” PLOS opined, “Enabling content mining is a core part of the value offering for Open Access publication services.” In its response to the EU copyright review, LIBER stated, “All exceptions related to education, learning and access to knowledge to be made mandatory. In particular, we would like to see a specific exception for text and data mining for all research purposes.”

Knowledge is Power: While the above sentiments are laudable, we believe that the more potential TDM users know about the technology and the issues, the better they will be able to negotiate conditions that make their research easy and efficient.

Workshops: We are developing a TDM training program in the form of a workshop that introduces the legal considerations through hands-on exercises. We will introduce the topic, present the tools and techniques, and tackle a specific problem. We will then use that problem to expose researchers to the legal complications they may encounter in conducting their research and the legal considerations they should keep in mind when choosing a license for their own works.

To be clear, we do not intend the workshop to be a detailed and comprehensive training in TDM, and it is certainly not a replacement for expertise in this deep and wide-ranging technique. Instead, the workshop is designed to be both an introduction to basic technical and legal concepts and an opportunity to network with experts and novices interested in the field. We hope that participants who intend to use TDM in their work will be better informed when seeking collaboration with TDM experts.