Text and Data Mining: Home


What is Text and Data Mining?

Text and Data Mining (TDM) is the automated analysis and extraction of information from large numbers of documents or data sets, and is particularly valuable for unstructured data. The information in this guide intersects with concepts from basic programming, machine learning, and statistical computing, and TDM is often discussed in the context of data science. Scholars across disciplines, including the humanities, social sciences, and physical sciences, employ mining techniques. Please use this guide to find information on licensed content and other datasets, essential tools, training, and helpful resources.
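
As a rough illustration of what extracting information from unstructured text can look like in practice, the sketch below counts how often terms appear across a handful of short documents. The sample sentences, the stopword list, and the function names (`tokenize`, `term_frequencies`) are all made up for this example; real projects usually work with much larger corpora and dedicated text-analysis tools.

```python
import re
from collections import Counter

# Hypothetical sample "corpus": in practice this would be thousands of documents.
documents = [
    "Text mining extracts information from unstructured text.",
    "Data mining finds patterns in large data sets.",
    "Mining text and data supports research across disciplines.",
]

# A tiny, illustrative stopword list (common words that carry little meaning).
STOPWORDS = {"and", "from", "in", "the", "of", "a"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs, stopwords=STOPWORDS):
    """Count how often each non-stopword term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(token for token in tokenize(doc) if token not in stopwords)
    return counts


freqs = term_frequencies(documents)
# Show the three most frequent terms with their counts.
print(freqs.most_common(3))
```

Simple counts like these are often a first step before more advanced techniques such as topic modeling or sentiment analysis.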


Helpful Links:

TDM Basics

Glossary of TDM Terms

Libraries and Mining:

Association of Research Libraries Issue Brief on TDM


Gender composition of scholarly publications (1665 - 2011)

This study, conducted by Eigenfactor researchers at the University of Washington, examines differences in authorship by gender within disciplines. It was carried out using the JSTOR corpus.


Robots Reading Vogue*

This joint project by Lindsay King and Peter Leonard at Yale University demonstrates data mining on the Vogue Archive from ProQuest. * Our license agreement with ProQuest does not allow for TDM. Please contact your librarian for more information.


Data Mining Reveals the Six Basic Emotional Arcs of Storytelling

Scientists at the Computational Story Laboratory have analyzed novels to identify the building blocks of all stories.


Six Degrees of Francis Bacon

Researchers at Carnegie Mellon have used data mining to find relationships between prominent people of early modern England and present them in a visual representation of that network.


Monica Ihli

Licensed Content Questions

Lizzie Gallagher

For questions about licensed content, such as requesting a content extract from one of our licensed content providers, please contact your Electronic Resources Assistant Librarian.