Text and Data Mining: Tools

Tools

Licensed Software Through OIT

Licensed software made available through UT's Office of Information Technology (OIT) can be downloaded from the OIT Software Download Site, including:

 

 

QDA Miner is a software package that aids in the qualitative analysis of text or images. Download it under Provalis Research in the OIT software list. At UTK, you can schedule a one-on-one tutorial by calling the OIT HelpDesk at 865-974-9900. Tutorials are also available on QDA Miner’s tutorial website.

 

NVivo is a qualitative analysis software package that supports both qualitative and mixed methods research. For UTK, OIT offers workshops on NVivo each semester, and you can schedule a one-on-one tutorial any time by calling the OIT HelpDesk at 865-974-9900. You can also visit NVivo’s Support page or watch tutorials from QSR International’s YouTube channel.

 

MATLAB and Simulink are computational software environments used for a variety of tasks in engineering, science, mathematics, statistics, and finance. An optional Text Analytics Toolbox is available for working with textual data. The MATLAB Onramp course is available at no additional charge to registered users of the UT MATLAB site license. If you are not a registered user, log in to the OIT Software Download Site, download the file under MathWorks, Inc., and follow the steps for creating a MathWorks account. Then log in to take the 2-hour MATLAB Onramp course.

 

 

Other Free Software

 

OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
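
To give a sense of what "cleaning messy data" can involve, here is a minimal Python sketch of key-based clustering: values are reduced to a normalized key, and values that share a key are flagged as likely variants of one another. It is written in the spirit of the fingerprint-style clustering that OpenRefine offers, but it is not OpenRefine's own code. The function names and the sample publisher names are invented for illustration.

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key so trivially different spellings collide."""
    # Trim surrounding whitespace, lowercase, and drop ASCII punctuation.
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, and sort them.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep only groups with two or more distinct variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Hypothetical messy values, invented for this example.
    publishers = [
        "Univ. of Tennessee Press",
        "univ of tennessee press ",
        "Press, Univ. of Tennessee",
        "Oxford University Press",
    ]
    for key, variants in cluster(publishers).items():
        print(key, "->", variants)
```

A person still has to review each group and decide which spelling to keep. The sketch only surfaces candidates for merging.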

 

https://www.r-project.org/

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS.

Good Packages to Know About:

There are numerous tutorials and guides for using these packages that can be found around the web.

   

https://www.python.org/

Python is a programming language that lets you work more quickly and integrate your systems more effectively. See their FAQs for more information.

   
Gale Digital Scholar Lab
Digital Scholar Lab is an online tool for building data sets from the digital humanities content in our UT Knoxville Gale Primary Sources subscriptions. Those data sets can then be analyzed using the text analysis and visualization tools built into the Digital Scholar Lab. Available analysis methods include Named Entity Recognition, Topic Modelling, Parts of Speech, and more.

The Library will have access from February 18, 2019, to February 18, 2021.

First-time users: Click the Create an Account button and use the Microsoft login. Returning users: Click Log In and use your Microsoft credentials.

 

Gale Primary Sources collections:

  • 17th and 18th Century Burney Collection
  • 17th and 18th Century Nichols Newspapers Collection
  • 19th Century UK Periodicals
  • American Fiction
  • Archives Unbound
  • Archives of Sexuality & Gender
  • Associated Press Collections Online
  • Brazilian and Portuguese History and Culture
  • British Library Newspapers
  • China and the Modern World
  • Crime, Punishment, and Popular Culture 1790-1920
  • Daily Mail Historical Archive, 1896-2004
  • The Economist Historical Archive
  • Eighteenth Century Collections Online
  • The Illustrated London News Historical Archive, 1842-2003
  • The Independent Digital Archive
  • Indigenous Peoples: North America
  • Liberty Magazine Historical Archive, 1924-1950
  • The Listener Historical Archive, 1929-1991
  • The Making of the Modern World
  • Nineteenth Century Collections Online
  • Nineteenth Century U.S. Newspapers
  • Picture Post Historical Archive
  • Punch Historical Archive, 1841-1992
  • Sabin Americana, 1500-1926
  • Smithsonian Collections Online
  • The Sunday Times Digital Archive
  • The Telegraph Historical Archive
  • The Times Digital Archive
  • The Times Literary Supplement Historical Archive
  • U.S. Declassified Documents Online
  • U.S. Supreme Court Records and Briefs, 1832-1978

HathiTrust Research Center

The HathiTrust Research Center (HTRC) enables computational analysis of the HathiTrust corpus. It is a collaborative research center launched jointly by Indiana University and the University of Illinois, along with HathiTrust, to help meet the technical challenges researchers face when dealing with massive amounts of digital text. It develops cutting-edge software tools and cyberinfrastructure to enable advanced computational access to the growing digital record of human knowledge.

Because the University of Tennessee, Knoxville is a member of HathiTrust, its affiliates can create accounts to use the HTRC Analytics site. Researchers from member organizations have full access to the site and tools, and can also create an HTRC Data Capsule in which to analyze datasets drawn from the full HathiTrust corpus. Those within the membership community are also eligible to apply for special support for their research through the HTRC's Advanced Collaborative Support program.

For more information, you can:

 

Tools and Services

The HTRC offers a suite of tools for computational text analysis. These tools cover a wide variety of functions ranging from simple statistical analysis of words to complex algorithms relating concepts and meaning.

HTRC Analytics

HTRC Analytics is the primary site for interacting with HTRC. It provides access to HTRC worksets and off-the-shelf algorithms for analyzing them. It also contains a dashboard where researchers can create a secure computing environment, called a Data Capsule (see below). Several of the HTRC algorithms are based on the Software Environment for the Advancement of Scholarly Research (SEASR, pronounced “Caesar”), a legacy project developed with funding from the Andrew W. Mellon Foundation.

HathiTrust+Bookworm

The HathiTrust+Bookworm visualization tool allows researchers to graph word trends across the HathiTrust corpus and facet their search by bibliographic metadata.
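
For readers curious about what a word trend is computationally, the sketch below counts how often a word appears per year, relative to all words from that year, and optionally limits the count to volumes matching a metadata facet. This is only an illustration of the general idea in Python, not the Bookworm tool's implementation. The `word_trend` function, the record fields (`year`, `language`, `text`), and the sample volumes are all assumptions made up for the example.

```python
from collections import Counter


def word_trend(volumes, word, **facets):
    """Return {year: share of tokens equal to `word`} for volumes matching `facets`."""
    target = word.lower()
    hits, totals = Counter(), Counter()
    for volume in volumes:
        # Skip volumes whose metadata does not match every requested facet.
        if any(volume.get(field) != wanted for field, wanted in facets.items()):
            continue
        # Naive tokenization: lowercase and split on whitespace.
        tokens = volume["text"].lower().split()
        totals[volume["year"]] += len(tokens)
        hits[volume["year"]] += tokens.count(target)
    # Relative frequency per year, in chronological order.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


if __name__ == "__main__":
    # Hypothetical records, invented for this example.
    volumes = [
        {"year": 1850, "language": "eng", "text": "the whale and the sea"},
        {"year": 1850, "language": "eng", "text": "a ship at sea"},
        {"year": 1900, "language": "eng", "text": "the city and the train"},
        {"year": 1900, "language": "fre", "text": "la mer la mer"},
    ]
    print(word_trend(volumes, "sea", language="eng"))
```

Dividing by the yearly token total matters because a corpus usually holds far more text from some years than others, so raw counts alone would mostly reflect collection size.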

Data Capsules

The HTRC Data Capsules secure compute environment allows researchers to create a virtual machine desktop “capsule” that can be used to run customized research methods and tools not supported by the pre-built algorithms. Researchers control their research process while in a capsule, and only derived data may be released when they are finished.

 

The HTRC supports many methods and technical abilities:

HTRC Tool                  | Technical skills | Rights status              | Methods                       | Data format
Web algorithms             | Low              | Public domain              | Off-the-shelf                 | Can’t see underlying data
HT+Bookworm tool           | Low              | All (13.7 million volumes) | Visualize trends              | Can’t see underlying data
Data Capsule environment   | Medium to high   | Public domain              | Your choice, including Voyant | Raw OCR
Extracted Features dataset | Medium to high   | All (15.7 million volumes) | Any requiring bag-of-words    | Words and word counts in structured file
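
"Bag-of-words" means treating a text as an unordered tally of words and their counts, which is the kind of information the Extracted Features dataset is described as providing. The Python sketch below shows how such counts could be produced from plain text. It is an illustration of the concept only: it does not read or reproduce the actual Extracted Features file format, and the function names, the tokenization rule, and the sample pages are assumptions made for this example.

```python
import re
from collections import Counter

# Assumed tokenization rule: runs of ASCII letters, optionally with one internal apostrophe.
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def bag_of_words(page_text: str) -> Counter:
    """Count each word on one page, ignoring case, punctuation, and word order."""
    return Counter(TOKEN_PATTERN.findall(page_text.lower()))


def volume_counts(pages) -> Counter:
    """Sum the page-level counts into a single tally for the whole volume."""
    total = Counter()
    for page in pages:
        total.update(bag_of_words(page))
    return total


if __name__ == "__main__":
    # Hypothetical page text, invented for this example.
    pages = ["The cat saw the cat.", "The dog didn't see the cat."]
    for word, count in volume_counts(pages).most_common(3):
        print(word, count)
```

Because word order is discarded, counts like these support methods such as word frequency analysis and topic modelling, but the original text cannot be rebuilt from them.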

Librarian Contact for This Guide

Monica Ihli
Contact:
865-974-2876

Licensed Content Questions

Lizzie Gallagher

For questions about licensed content, such as requesting a content extract from one of our licensed content providers, please contact your Electronic Resources Assistant Librarian.