eCommons

 

Exploring the Human Gut Microbiome: Statistical Methods, Computation, and Applications in Metagenomics

Abstract

The microbial species living within the human gastrointestinal tract, together with their associated genomes, are known collectively as the human gut microbiome, which is an integral part of human health. There is some evidence that human genomic variation is associated with differences in the composition of the gut microbiome, with potential effects on health. For example, mutations in NOD2, a gene associated with Crohn’s disease, and mutations in MEFV, a gene causing Mediterranean fever, are associated with compositional shifts in certain bacterial phyla. By jointly analyzing the genomes and metagenomes of individuals in a population, and combining them with health or phenotype data, we can uncover the connection between the two and how they relate to health outcomes. To investigate these questions, I use shotgun metagenomic sequencing data, along with genotype and phenotype information, for 250 adult female twins from TwinsUK.
To understand the link between the composition and functions of the gut microbiome and human health outcomes, I apply classical statistical and machine learning methods to identify features of the gut microbiome that can predict host diseases and phenotypes. The most notable results concern twin pairs who are discordant for anxiety symptoms: 175 genes were found to be enriched in the twins without anxiety and absent in those with anxiety. Using strain-level metagenomic analyses, I identify the source of these genes as a species within the genus Azospirillum.
Studies of the impact of host genetics on gut microbiome composition have mainly focused on individual host variants, without considering their collective impact or the specific functions of the gut microbiome. To assess the aggregate role of human genetics in gut microbiome composition and function, I apply both the Tweedie distribution, for modeling gene and species abundances in metagenomic data, and sparse canonical correlation analysis, a multivariate data integration method, to identify correlations between overall host genetics and the composition of the gut microbiome or its composite functions.

Description

162 pages

Date Issued

2021-12

Keywords

Biostatistics; Computational Biology; Genomics; Metagenomics; Microbiome; Tweedie

Committee Chair

Brito, Ilana Lauren

Committee Member

Messer, Philipp
Clark, Andrew

Degree Discipline

Genetics, Genomics and Development

Degree Name

Ph. D., Genetics, Genomics and Development

Degree Level

Doctor of Philosophy

Rights

Attribution-NonCommercial-ShareAlike 4.0 International

Types

dissertation or thesis
