Joint Inference of Human Genomic Function and Selective Pressure

Author
Gulko, Brad
Abstract
Selective pressure and molecular phenotype provide complementary perspectives on functional properties of the human genome. In this dissertation, I develop computational methods for identifying collections of molecular phenotypes, called functional classes, that are optimally informative about recent selective pressure in humans. Aggregating selective pressure across genomic positions within each functional class produces a score representing the probability that a position evincing a class-associated molecular phenotype is under selective pressure. A class’s score is interpreted as a measure of potential for fitness-influencing genomic function. Functional classes and attendant selective pressure scores are developed over the course of two papers.
In the first paper, I investigate three ENCODE cell-types and develop a non-parametric representation of covariates from four genomic properties: DNase-seq, RNA-seq, chromatin state, and protein-coding annotation. The resultant 624 classes and attendant scores are shown to predict eQTLs, transcription factor binding, and enhancers as well as or better than contemporary methods using high-dimensional functional covariates or selective constraint alone. The interpretation of the score as selective pressure is also shown to be consistent with previous measures of genome-wide selective pressure.
In the second paper, I expand the cell-type cohort to 115 Epigenomic Roadmap cell-types and nine genomic properties, including splicing, transcription factor binding, and small RNA-seq. Complexity constraints are developed to reduce the number of functional classes from more than 1.2 million possibilities to 61. The resultant functional classes, genomic segmentations, and positional scoring (FitCons2 scores) are used to detect small features, including disease-associated variation from the HGMD and ClinVar clinical databases. FitCons2 scores are shown to have power comparable or superior to contemporary methods designed specifically to detect such features. Functional classes and scores are also shown to identify cell-type-specific regulatory behavior of promoters and enhancers, while highlighting regulatory relationships between differing cell-types and developmental stages. I demonstrate how cell-type sensitivity in FitCons2 scores can be used to address an unsolved biological problem in characterizing transcription factor binding in craniofacial enhancers that are believed to differentiate neural development between humans, chimpanzees, and Neanderthals.
Date Issued
2017-08-30
Subject
Human Genomics; Artificial intelligence; Computer science; Genetics; machine learning; Statistical Inference
Committee Chair
Siepel, Adam Charles
Committee Member
Wells, Martin Timothy; Joachims, Thorsten; Yu, Haiyuan
Degree Discipline
Computer Science
Degree Name
Ph. D., Computer Science
Degree Level
Doctor of Philosophy
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Type
dissertation or thesis