Text and Data Mining

About Data Sources

Current UT students, staff, and faculty can access licensed data and datasets through UT Libraries. Current users can also text and data mine licensed content from selected content providers. Freely available data sources and APIs can also be found on the web. Please explore these sources to see what fits your needs. Contact a librarian for additional help!

Data Sources

University of Tennessee, Knoxville Libraries Licensed Content

The resources listed on this page may be text and data mined for academic scholarship or educational purposes. The list is organized by vendor/platform based on our UT license agreements with the vendor or publisher.

If you do not see a resource listed here, please contact us and we can investigate further. We will need time to review the license agreement and terms of use, so please plan accordingly. Carrying out automated text and data mining on a database in violation of its terms of use is also a violation of the University Libraries Electronic Resources Use Policy.

 

TDM Permitted Content


Adam Matthew

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Adam Matthew. Adam Matthew requires a permission form be filled out and submitted before mining begins.

TDM Statement

Example Permission Form

Contact: info@amdigital.co.uk


Cambridge University Press

Permission provided for non-commercial educational and scholarly TDM from Cambridge University Press's Terms of Use.

Terms of Use

Rights and Permissions

Contact: directcs@cambridge.org


Clarivate Analytics

For a fee, Web of Science's production team can create a custom dataset based on variables you specify (contact a librarian for additional help). Within Web of Science, you can use the Analyze tool to analyze a subset of data in the interface.

Clarivate "API Expanded"

 

 


Elsevier

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Elsevier and Elsevier's TDM Policy. Elsevier has a Developers Portal where you register for an API key. After registration you must request elevated privileges to receive full access to their data. The rate limit is 20,000 records per week.

TDM Policy

Developers Portal

TDM Registration Forms
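
A weekly cap is easier to respect if your script keeps its own tally. The sketch below is a hypothetical client-side counter, not part of Elsevier's API: the `WeeklyQuota` class and its method names are invented for illustration, and it assumes the 20,000-record limit applies over a rolling seven-day window, which you should confirm against Elsevier's TDM Policy.

```python
from datetime import datetime, timedelta

WEEKLY_LIMIT = 20_000       # records per week, as stated above
WINDOW = timedelta(days=7)  # assumption: a rolling seven-day window


class WeeklyQuota:
    """Hypothetical client-side tally of records fetched in the last week."""

    def __init__(self, limit=WEEKLY_LIMIT):
        self.limit = limit
        self.log = []  # (timestamp, record_count) pairs

    def used(self, now):
        """Count records logged within the window ending at `now`."""
        cutoff = now - WINDOW
        return sum(count for stamp, count in self.log if stamp > cutoff)

    def remaining(self, now):
        return max(0, self.limit - self.used(now))

    def record(self, count, now):
        """Log a batch, refusing it if it would exceed the weekly limit."""
        if count > self.remaining(now):
            raise RuntimeError("weekly record quota would be exceeded")
        self.log.append((now, count))


quota = WeeklyQuota()
start = datetime(2024, 1, 1)
quota.record(15_000, start)
print(quota.remaining(start + timedelta(days=1)))
print(quota.remaining(start + timedelta(days=8)))
```

Calling `record` before each batch of downloads stops the script before it goes over the cap, rather than relying on the server to reject requests.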


Emerald

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Emerald Publishing. Emerald asks that you notify them before conducting any TDM activities on www.emerald.com/insight. Advance notice lets Emerald manage server capacity so that you can complete your activity without technical obstacles and access is maintained for all Emerald users.

TDM License

Contact: permissions@emeraldinsight.com


Gale

Permission provided for non-commercial educational and scholarly TDM from our license agreement with Gale.


JSTOR

TDM of JSTOR content is permissible according to our License Agreement with some restrictions. Data for Research (https://www.jstor.org/dfr/) is JSTOR's TDM service. Datasets must be requested through JSTOR and are processed by JSTOR. Datasets are free and may include data for up to 25,000 documents. See their site for more info on creating datasets, specifications, requests, and sample datasets.

Dataset Services

Sample Datasets

Dataset Request Form

JSTOR Data for Research

Technical Specifications


Oxford University Press

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Oxford University Press.

Rights and Permissions

Contact: Data.Mining@oup.com


Project Muse

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Project Muse. Project Muse requests that you contact them before beginning mining.

FAQs


ProQuest

Text and data mining of our subscribed ProQuest content is available through ProQuest TDM Studio. Simply create an account and begin building datasets.

ProQuest TDM Studio


SAGE

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with SAGE. See TDM Info for request limits and other API information.

Terms of Use

TDM Info

TDM License


Springer Nature

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Springer Nature. However, there are restrictions on storage of data. Springer Nature has a TDM Attachment to their License Agreement. Contact your librarian for more information.


Taylor and Francis

In our experience, Taylor and Francis will accept TDM of their products when informed of the research involved and its time frame. Contact your librarian for help.

Terms and Conditions


University of Chicago Press

Permission provided for non-commercial educational and scholarly TDM from University of Chicago Press's Terms and Conditions. The press specifically asks users to contact them for approval.

Terms and Conditions


Wiley

Permission provided for non-commercial educational and scholarly TDM from our License Agreement with Wiley.

TDM Policy

Contact: TDM@wiley.com


 

TDM Not Permitted


EBSCO

TDM of EBSCO content is not permissible at this time.


NewsBank

TDM of NewsBank content is strictly prohibited under our current License Agreement. Mining NewsBank content would carry an additional cost and may require additional licensing. Please contact your librarian for more information.

Twitter
Twitter has an API (Application Programming Interface) which provides access to Twitter data in machine readable format. The free version of API access is called the Standard tier. You will need to register a Developer Account in order to gain access to the API.

 

The main idea of the API is that you construct HTTP requests using the parameters described in the search endpoints documentation and get back your results in JSON format. Some people write their own scripts using the appropriate tools for their language, for example sending the request with Python's requests library and loading the response text with the standard library's JSON tools. Another option is to look around the community for more specialized tools, such as the twitteR package for R. You can also review data dictionaries for tweets, users, and entities (contextual information such as mentions and hashtags).
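
As a rough illustration of that request-and-parse pattern, the Python sketch below builds a search request and pulls fields out of a JSON response using only the standard library. The endpoint URL, the `q` and `count` parameters, the bearer-token header, and the `statuses`, `text`, and `user.screen_name` field names are assumptions based on the Standard search documentation and should be checked against the current version. The helper names and the sample response are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed Standard-tier search endpoint; confirm against the current docs.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_request(query, bearer_token, count=10):
    """Build (but do not send) a GET request for the search endpoint."""
    params = urlencode({"q": query, "count": count})
    return Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def extract_tweets(response_text):
    """Parse a JSON response body into (screen_name, text) pairs."""
    payload = json.loads(response_text)
    return [
        (status["user"]["screen_name"], status["text"])
        for status in payload.get("statuses", [])
    ]


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "statuses": [
        {
            "text": "Text mining workshop today #tdm",
            "user": {"screen_name": "example_user"},
        },
    ]
})

request = build_search_request("#tdm", bearer_token="YOUR_TOKEN")
print(request.full_url)
print(extract_tweets(SAMPLE_RESPONSE))
```

To send the request, pass it to `urllib.request.urlopen` and decode the body before handing it to `extract_tweets`. This requires a valid token from your Developer Account.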

HathiTrust
HathiTrust makes the texts of public domain works in its corpus available for research purposes. The works fall into two categories: non-Google digitized volumes, which are freely available, and Google-digitized volumes, which are available through an agreement with Google.

American Physical Society
American Physical Society offers APS Data Sets for Research. The corpus of Physical Review Letters, Physical Review, and Reviews of Modern Physics comprises over 450,000 articles and dates back to 1893. Researchers may request access to this data by filling out a simple web form. The requesting researcher must accept the terms and conditions governing the use of the data sets. Requests are reviewed and, if approved, the data is made available for download. Contact data-requests@aps.org with any questions.

Kaggle
Kaggle provides free access to datasets and data science training courses for a variety of languages and technologies. It offers a no-setup, customizable Jupyter Notebooks environment, free GPU access, and a large repository of community-published data and code. You can register for an account to save your work.