Cornell University
Library

eCommons

The development of quality control and modular analysis tools that support reproducible genomics

Access Restricted

Access to this document is restricted. Some items have been embargoed at the request of the author, but will be made publicly available after the "No Access Until" date.

During the embargo period, you may request access to the item by clicking the link to the restricted file(s) and completing the request form. If we have contact information for a Cornell author, we will contact the author and request permission to provide access. If we do not have contact information for a Cornell author, or the author denies or does not respond to our inquiry, we will not be able to provide access. For more information, review our policies for restricted content.

File(s)
Lang_cornellgrad_0058F_14851.pdf (53.08 MB)
No Access Until
2027-06-18
Permanent Link(s)
https://doi.org/10.7298/eke0-jd90
https://hdl.handle.net/1813/117590
Collections
Cornell Theses and Dissertations
Author
Lang, Olivia
Abstract

Standards for reproducible science build a foundation of trust in the body of literature that informs the experimental studies researchers conduct next. However, the increasing complexity and scale of data and discoveries across scientific fields, including genomics, have exacerbated the burden of maintaining reproducibility standards. With the advent of next-generation sequencing, many innovations in assay development have contributed in particular to the complexity and scale within this subfield. This has created an increased demand for tools that perform quality checks and flexible analysis of genomics data and that are compatible with a variety of genomic assays. To perform post-sequencing quality control of genetic backgrounds, we developed the Genotype validation Pipeline (GenoPipe) to detect insertion and deletion modifications along with variant-based strain backgrounds. We identify samples from publicly available data to demonstrate the value of performing this quality control step on genomic data. Subsequent analysis workflows can be built using ScriptManager, a flexible tool that performs modular operations on genomic data to build customized workflows during tracked interactive sessions using an accessible graphical interface. When it comes time to publish, researchers can model their submissions on our recommended file template to organize their work intuitively for readers. This template describes the file structure and incorporates both GenoPipe and ScriptManager into its ordinal execution to enhance the scalability, reproducibility, and flexibility of workflows. To date, the template has been applied and customized for multiple publications to enhance the comparability of future results. Together, these tools support the start-to-finish analysis of a reproducible research project.

Description
127 pages
Date Issued
2025-05
Keywords
Bioinformatics • Chromatin • Contamination • Genomics • Pipelines • Software
Committee Chair
Pugh, Benjamin
Committee Member
Lai, William
Mezey, Jason
Danko, Charles
Degree Discipline
Computational Biology
Degree Name
Ph. D., Computational Biology
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16938235
