Cornell University Library

eCommons


Statistical Significance For Dna Motif Discovery

Permanent Link(s)
https://hdl.handle.net/1813/29452
Collections
Cornell Theses and Dissertations
Author
Ng, Patrick
Abstract

The identification of transcription factor binding sites, and of cis-regulatory elements in general, is an important step in understanding the regulation of gene expression. To address this need, many motif-finding tools have been described that can find short sequence motifs given only an input set of sequences. In this dissertation, we will begin by discussing why a reliable significance evaluation should be considered an essential component of any motif finder. We will introduce a biologically realistic method to estimate the reported motif's statistical significance based on a novel 3-Gamma approximation scheme. We also show how the reliability of the significance evaluation can be further improved by incorporating local base composition information into its null model. We then demonstrate its reliability by applying GIMSAN/MOTISAN, a de novo motif-finding tool that incorporates this novel significance evaluation technique, to a well-studied set of Saccharomyces cerevisiae motif input data. Our results also reveal that an ensemble method based on our significance evaluation can substantially improve the actual motif-finding task. Finally, we will present the ALICO (Alignment Constrained) null set generator, a framework to generate randomized versions of an input multiple sequence alignment that preserve some of its crucial features, including its dependence structure. In particular, we will show that, on average, ALICO samples approximately preserve the PIDs (percent identities) between every pair of input sequences as well as the average Markov model composition. We will demonstrate its utility in phylogenetic motif finders (motif-finding tools that leverage conservation information) in terms of both the reliability of statistical significance and the improvement of the motif-finding task through an ensemble method.
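The abstract states that ALICO samples approximately preserve the PID between every pair of input sequences. As an illustration of that quantity only (this is not ALICO's implementation), the sketch below computes all pairwise PIDs of a toy multiple sequence alignment. It assumes that '-' is the gap character and that PID means identical residues divided by the columns where neither row has a gap. Other PID conventions exist, and the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """PID of two aligned rows (hypothetical helper).

    Assumed convention: identical residues divided by the number of
    columns where neither row has a gap ('-'), expressed as a percentage.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either row
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def pairwise_pids(alignment: list[str]) -> dict[tuple[int, int], float]:
    """PID for every pair of rows (i, j) with i < j in the alignment."""
    return {
        (i, j): percent_identity(alignment[i], alignment[j])
        for i, j in combinations(range(len(alignment)), 2)
    }


# Hypothetical three-row toy alignment.
msa = ["ACGT-A", "ACGTTA", "AC-TTG"]
print(pairwise_pids(msa))  # {(0, 1): 100.0, (0, 2): 75.0, (1, 2): 80.0}
```

Under this convention, a randomized alignment that "preserves PIDs" would yield a dictionary close to the original one for every pair (i, j). Comparing these per-pair values between an input alignment and its randomized versions is one simple way to check that property.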

Date Issued
2011-05-31
Keywords
motif discovery • sequence analysis • computational biology
Committee Chair
Keich, Uri
Committee Member
Booth, James
Friedman, Eric J.
Degree Discipline
Computer Science
Degree Name
Ph.D., Computer Science
Degree Level
Doctor of Philosophy
Type
dissertation or thesis
