You can review OIT's licensed software on their Software Distribution page. Below are selected text mining tools.
The HathiTrust Research Center (HTRC) enables computational analysis of the HathiTrust corpus. It is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with HathiTrust, to help meet the technical challenges researchers face when dealing with massive amounts of digital text. It develops cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge.
Because the University of Tennessee, Knoxville is a member of HathiTrust, its affiliates can create accounts to use the HTRC Analytics site. Researchers from member organizations have full access to the site and tools, and can also create an HTRC Data Capsule in which to analyze datasets drawn from the full HathiTrust corpus. Members are also eligible to apply for special research support through the HTRC's Advanced Collaborative Support program.
For more information, you can:
Read a brief overview of the HTRC's Collections and Tools
Find tutorials and detailed documentation in the HTRC Documentation
Review the code that makes it all run on the HTRC GitHub
See more documentation on Getting Started and Help!
The HTRC offers a suite of tools for computational text analysis. These tools cover a wide variety of functions ranging from simple statistical analysis of words to complex algorithms relating concepts and meaning.
HTRC Analytics is the primary site for interacting with HTRC. It provides access to HTRC worksets and off-the-shelf algorithms to analyze them. It also contains a dashboard where researchers can create a secure computing environment, called a Data Capsule (see below). Several of the HTRC algorithms are based on the Software Environment for the Advancement of Scholarly Research (SEASR, pronounced “Caesar”), a legacy project developed with funding from the Andrew W. Mellon Foundation.
The HathiTrust+Bookworm visualization tool allows researchers to graph word trends across the HathiTrust corpus and facet their search by bibliographic metadata.
The HTRC Data Capsules secure compute environment allows researchers to create a virtual machine desktop “capsule” that can be used to run customized research methods and tools not supported by the pre-built algorithms. Researchers control their research process while in a capsule, and only derived data may be released when they are finished.
| HTRC Tool | Technical skills | Rights status | Methods | Data format |
| --- | --- | --- | --- | --- |
| Web algorithms | Low | Public domain | Off-the-shelf | Can’t see underlying data |
| HT+Bookworm tool | Low | All (13.7 million volumes) | Visualize trends | Can’t see underlying data |
| Data Capsule environment | Medium to high | Public domain | Your choice, including Voyant | Raw OCR |
| Extracted Features dataset | Medium to high | All (15.7 million volumes) | Any requiring bag-of-words | Words and word counts in structured file |
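To give a sense of what a bag-of-words method does with words and word counts in a structured file, here is a minimal Python sketch. The `sample_volume` data and its field names (`pages`, `tokens`) are hypothetical stand-ins invented for illustration; they are not the actual Extracted Features schema, so consult the HTRC Documentation for the real file layout before adapting this.

```python
from collections import Counter

# Hypothetical, simplified stand-in for a per-volume file of words and counts.
# The field names ("pages", "tokens") are assumptions for illustration only,
# not the real HTRC Extracted Features schema.
sample_volume = {
    "id": "example.volume",
    "pages": [
        {"seq": 1, "tokens": {"the": 12, "whale": 3, "sea": 2}},
        {"seq": 2, "tokens": {"the": 9, "whale": 1, "ship": 4}},
    ],
}


def volume_bag_of_words(volume):
    """Sum per-page word counts into a single bag-of-words for the volume."""
    bag = Counter()
    for page in volume["pages"]:
        # Counter.update adds the counts from the mapping to the running totals.
        bag.update(page["tokens"])
    return bag


def relative_frequencies(bag):
    """Convert raw counts to each word's share of all words in the volume."""
    total = sum(bag.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in bag.items()}


bag = volume_bag_of_words(sample_volume)
freqs = relative_frequencies(bag)

# Show the three most frequent words with their raw counts.
print(bag.most_common(3))
```

Because word order is discarded, this kind of data supports methods such as word frequency comparison, classification, or topic modeling, but not methods that need the running text.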