Working with data is an essential skill for students and researchers alike. Whether you’re just starting your data journey or are a seasoned analyst, the library offers services and resources to help you understand, clean, and analyze your own data and secondary data.
A range of tools, both proprietary and open source, is available to you for data cleaning and analysis. Learning these tools and the skills needed to use them can be challenging, but support is available for new and experienced learners, including previously recorded webinars and upcoming workshops. Topics covered by these materials include coding in R, Python, and SQL.
In addition to tools and resources provided by the UT Libraries, OIT also provides access to many great tools that can be used for cleaning and analysis. A list of current tools is available here: https://oit.utk.edu/software-hardware/software/. Software is available both for local download to your personal computer and online using Citrix.
OIT also provides instructor-led workshops for some of the tools available as well as self-paced options:
R is a free collection of programs, libraries, and utilities used by programmers for data visualization and statistical analysis. Resources for downloading and using R can be found below.
MATLAB is a licensed software environment that can be used by programmers to process and visualize data. MATLAB Online is accessible to students through the University of Tennessee, using their UTK email address.
SPSS is paid software offering advanced statistical analysis, text analysis, integration with big data, and seamless deployment into applications. SPSS Statistics is used for extracting and analyzing datasets. SPSS Modeler is used for data preparation, data discovery, and predictive analytics.
SAS is a software company offering a variety of tools and resources for data analytics. SAS Viya is a paid AI and analytics platform designed to manage data, develop models, and help users make grounded decisions. SAS Skill Builder for Students is a hub where users can access free software, take E-Learning courses and certification pathways, and watch technical tutorials for SAS programs and data analytics/visualizations.
Looking for more intensive data analysis support?
OIT’s Research Computing Support (RCS) can help you with designing a research project and analyzing the results. Students are eligible for 15 hours of help each semester, while faculty can receive up to 50 hours each year. To get started with RCS, please reach out to the OIT Helpdesk.
Being a good steward of your data doesn’t start with collection, and it doesn't end there either! To get the most out of your research data, you can follow a few best practices to maximize the total benefit your data can provide.
When designing your research study, it's important to create a data management plan. These plans are sometimes required for grant-funded research and, if completed early, can help you keep your data organized.
Additional steps you can take to keep your data well maintained include using consistent naming conventions for your files, keeping your data in non-proprietary formats such as .csv, .xml, .wav, .tiff, and .txt, and keeping multiple copies of your data in separate storage areas such as a local computer, a hard drive, and cloud storage, as security permits. It is also recommended that you keep up-to-date metadata records for the data you collect. Metadata is information about the data, such as what instruments were used and their calibrations, when the data was collected, and who collected it. What metadata you should include may depend on where you’re hoping to deposit your data as well as the subject matter of the data. The Digital Curation Centre maintains a list of popular metadata standards here: https://www.dcc.ac.uk/guidance/standards/metadata/list.
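As a rough illustration of these practices, the Python sketch below builds a consistently named file, saves the data as .csv, and writes a small metadata record next to it. The naming pattern (project, description, collection date, version), the function names, and the metadata fields are all assumptions made for this example, not a required standard; adapt them to your discipline and to the repository where you plan to deposit your data.

```python
import csv
import json
from datetime import date
from pathlib import Path


def build_filename(project, description, collected, version, ext="csv"):
    """Build a file name from a hypothetical convention:
    project_description_YYYY-MM-DD_vNN.ext
    """
    parts = [
        project.lower(),
        description.lower().replace(" ", "-"),
        collected.isoformat(),   # ISO dates sort correctly in file listings
        f"v{version:02d}",
    ]
    return "_".join(parts) + "." + ext


def save_with_metadata(rows, fieldnames, metadata, out_dir, filename):
    """Write rows to a .csv file and the metadata to a .json file beside it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Save the data in a non-proprietary, plain-text format.
    data_path = out_dir / filename
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Save the metadata record under a matching name so the two stay together.
    meta_path = data_path.with_name(data_path.stem + ".metadata.json")
    with meta_path.open("w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)

    return data_path, meta_path
```

Calling build_filename("SoilStudy", "pH readings", date(2024, 3, 15), 2) gives a name that records the project, content, collection date, and version in a fixed order. The metadata dictionary passed to save_with_metadata could hold fields such as the instrument, its calibration, the collection date, and the collector. Copying the resulting files to your other storage locations is a separate step.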
Data ethics is a broad and complicated subject that must be considered in different ways during every step of the research and data lifecycles. Beyond ensuring the accuracy and honest disclosure of the data collected, researchers should anonymize data before sharing it, send and store it securely, and respect privacy expectations where applicable.
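One common first step toward anonymizing tabular data before sharing can be sketched in Python as follows: drop columns that directly identify a person and replace the participant ID with a keyed hash. The column names ("name", "email"), the function name, and the 12-character code length are assumptions for this example. Strictly speaking, this is pseudonymization rather than full anonymization, since remaining fields such as age or location may still allow someone to be re-identified, so treat it as a starting point and not a guarantee.

```python
import hashlib
import hmac

# Hypothetical column names that directly identify a person.
DIRECT_IDENTIFIERS = frozenset({"name", "email"})


def pseudonymize(records, id_field, secret_key, drop_fields=DIRECT_IDENTIFIERS):
    """Return copies of the records with direct identifiers removed and the
    ID replaced by a keyed hash, so the same person keeps the same code.

    secret_key must be bytes and should be stored securely, apart from the
    shared data; anyone who holds it can recompute the codes from known IDs.
    """
    cleaned = []
    for record in records:
        # HMAC-SHA256 of the original ID, keyed with the secret.
        digest = hmac.new(
            secret_key,
            str(record[id_field]).encode("utf-8"),
            hashlib.sha256,
        ).hexdigest()

        # Keep every field except the direct identifiers and the raw ID.
        row = {
            key: value
            for key, value in record.items()
            if key not in drop_fields and key != id_field
        }
        row["participant_code"] = digest[:12]  # shortened for readability
        cleaned.append(row)
    return cleaned
```

Using a keyed hash instead of a plain hash means that someone who guesses an ID cannot confirm the guess without also knowing the key.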
It’s also important to be mindful of potential biases that may lead to harmful interpretations of data. Consideration should be given to which groups are included in or excluded from a dataset that is presented as representative. Collected data should be inclusive of the groups it claims to represent and should be created and interpreted with input from those groups. Where applicable, it’s also important to consider potential harms made possible by the very existence of the data. Finally, data should be made openly available to the groups it could impact. For additional information about the ethical implications of data, researchers can explore the following resources:
Additionally, the ethical use of your own data and secondary data often starts with citations. An explanation of how to get started with citing data can be found here: https://data.research.cornell.edu/data-management/storing-and-managing/data-citation/.