What is Text and Data Mining?
Text and Data Mining (TDM) is the automated analysis and extraction of information from large numbers of documents or data sets, and is particularly valuable for unstructured data. The topics in this guide intersect with concepts from basic programming, machine learning, and statistical computing, and are often discussed in the context of data science. Scholars across disciplines, including the humanities, social sciences, and physical sciences, employ mining techniques. Please use this guide to find information on licensed content and other datasets, essential tools, training, and helpful resources.
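To make the idea of automatic extraction concrete, here is a minimal sketch in Python of one of the simplest mining tasks: counting how often each word occurs across a set of documents. The three sample sentences and the names `documents`, `tokenize`, and `term_counts` are illustrative assumptions, not part of any licensed dataset or tool listed in this guide. A real project would read thousands of files and would likely use a dedicated text-analysis library.

```python
import re
from collections import Counter

# Hypothetical sample documents standing in for a much larger corpus.
documents = [
    "Text mining extracts information from unstructured text.",
    "Data mining finds patterns in large data sets.",
    "Mining text and data helps scholars across disciplines.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Count how often each word occurs across all documents.
term_counts = Counter()
for doc in documents:
    term_counts.update(tokenize(doc))

# The three most frequent terms and their counts.
top_terms = term_counts.most_common(3)
print(top_terms)
```

Even this small count turns free-form sentences into structured data (a table of terms and frequencies) that can be sorted, compared, or charted.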
Libraries and Mining:
This study, conducted by researchers for the Eigenfactor at the University of Washington, uses the JSTOR corpus to examine differences in authorship by gender within disciplines.
This joint project by Lindsay King and Peter Leonard at Yale University demonstrates data mining of the Vogue Archives from ProQuest. * Our license agreement with ProQuest does not allow for TDM. Please contact your librarian for more information.
Scientists at the Computational Story Laboratory have analyzed novels to identify the building blocks of all stories.
Researchers at Carnegie Mellon have used data mining to find relationships between prominent people of early modern England and present them in a striking visual representation.