Confounding Factor Correction For Accurate Expression Quantitative Trait Loci Discovery

Expression quantitative trait loci (eQTL) have become an attractive research topic over the past decade, aided by technical advances in next-generation sequencing (NGS) and high-throughput gene expression measurement. eQTL discoveries have given researchers new insights into genetic regulatory mechanisms and are crucial for establishing functional links in genome-wide association study (GWAS) results. A powerful aspect of these studies is that simultaneous genome-wide measurements of gene expression and sequence variants make it possible to detect associations without relying on prior knowledge. However, the high dimensionality of the data also creates multiple challenges in the analysis. Population structure in genotype data can substantially inflate association statistics, leading to false-positive findings. Confounding factors in gene expression measurements, such as technical batch effects and environmental differences, can reduce the power to detect small genetic effects.
This thesis focuses on the challenges of analyzing high-dimensional gene expression data to increase the accuracy of eQTL discovery. A central problem in developing confounding factor correction methods for eQTL analysis is accounting for non-genetic confounding factors while preventing broad-impact genetic effects from being modeled as non-genetic variation. To address this issue, we developed a novel method, CONFETI (CONfounding Factor Estimation Through Independent component analysis). CONFETI is based on a linear mixed model framework and uses independent component analysis (ICA) to estimate statistically independent generative sources from the observed gene expression profiles. Using the estimated independent components, candidate genetic effects are excluded from the correction to maximize the discovery of broad-impact eQTL.
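The correction strategy summarized above can be illustrated with a minimal NumPy sketch. It is not the CONFETI implementation. Several choices are assumptions made for illustration only:

- the orientation of ICA (samples as observations, giving one source value per sample);
- the number of components;
- the component-variant association test (Pearson correlation, Fisher z p-value, Bonferroni threshold);
- building a sample covariance matrix from the retained components.

All helper names are hypothetical. ICA is done with a small symmetric FastICA (logcosh contrast) written in NumPy rather than a library call, and the toy data are simulated.

```python
import math

import numpy as np


def whiten(Y, n_components):
    """Center each gene, then keep the top principal components of the samples,
    scaled so the returned (n_samples, n_components) matrix has identity covariance."""
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :n_components] * math.sqrt(Y.shape[0])


def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W keeps the unmixing rows orthonormal.
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(Z, max_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA on whitened data Z (rows are samples).
    Returns sample-level source estimates, one column per component."""
    n, m = Z.shape
    W = _sym_decorrelate(np.random.default_rng(seed).standard_normal((m, m)))
    for _ in range(max_iter):
        g = np.tanh(Z @ W.T)
        W_new = _sym_decorrelate(g.T @ Z / n - np.diag((1.0 - g**2).mean(axis=0)) @ W)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def association_pvalues(S, G):
    """Two-sided p-values for the Pearson correlation between every component
    (columns of S) and every variant (columns of G), via Fisher's z transform."""
    n = S.shape[0]
    Ss = (S - S.mean(axis=0)) / S.std(axis=0)
    Gs = (G - G.mean(axis=0)) / G.std(axis=0)  # assumes no monomorphic variants
    R = np.clip(Ss.T @ Gs / n, -0.999999, 0.999999)
    z = np.arctanh(R) * math.sqrt(n - 3)
    return R, np.vectorize(lambda v: math.erfc(abs(v) / math.sqrt(2.0)))(z)


def confounder_covariance(Y, G, n_components, alpha=0.05, seed=0):
    """Estimate components, flag those associated with any variant
    (Bonferroni over all component-variant tests) as candidate genetic effects,
    and build a sample covariance matrix from the remaining components."""
    S = fast_ica(whiten(Y, n_components), seed=seed)
    _, P = association_pvalues(S, G)
    genetic = (P < alpha / P.size).any(axis=1)
    S_keep = S[:, ~genetic]
    K = S_keep @ S_keep.T / max(S_keep.shape[1], 1)
    return K, genetic, S


# Toy data: three non-genetic confounders plus one variant with broad trans effects.
rng = np.random.default_rng(1)
n, n_genes, n_snps = 200, 500, 50
maf = rng.uniform(0.1, 0.5, n_snps)
maf[0] = 0.5  # variant 0 is the simulated broad-impact eQTL
G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
sources = np.column_stack([
    rng.integers(0, 2, n),   # batch indicator
    rng.uniform(-1, 1, n),   # continuous environmental factor
    rng.laplace(size=n),     # heavy-tailed hidden factor
    G[:, 0],                 # genetic source
]).astype(float)
sources = (sources - sources.mean(axis=0)) / sources.std(axis=0)
A = rng.standard_normal((4, n_genes))
A[3] *= rng.random(n_genes) < 0.3  # the variant affects roughly 30% of genes
Y = sources @ A + 0.5 * rng.standard_normal((n, n_genes))

K, genetic, S = confounder_covariance(Y, G, n_components=4)
print("components flagged as genetic:", np.flatnonzero(genetic))
```

The returned matrix `K` would play the role of the random-effect covariance in a linear mixed model association test. Fitting that model is not shown. Components that correlate with any variant are dropped before `K` is built. As a result, the simulated broad-impact variant is not absorbed into the confounder term, which is the behavior the abstract motivates.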
We evaluated our framework by comparing its performance with other published confounding factor correction methods on both simulated and real human data. In simulated data, CONFETI recovered the simulated eQTL most accurately in the presence of confounding factors, because it distinguished genetic effects from non-genetic variance. We then analyzed matched twin-pair datasets from the Multiple Tissue Human Expression Resource (MuTHER) consortium and datasets of similar tissue pairs from the Genotype-Tissue Expression (GTEx) consortium. To assess the performance of each method on human data, we examined the replication of cis- and trans-eQTL identified in each dataset. Accounting for confounding factors greatly increased both the number of cis-eQTL identified in each dataset and the number of cis-eQTL replicating between twin pairs and between similar tissue types. The number of identified trans-eQTL also increased. However, most of these findings were specific to each dataset, and their replication rate remained significantly lower than that of cis-eQTL. Confounding factor correction increased the power of the analysis. In contrast, removing candidate genetic effects prior to correction made little difference in identifying replicating cis- and trans-eQTL in human data.
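The abstract does not spell out the replication criterion. The sketch below assumes a simple overlap definition: the fraction of gene-variant pairs called in one dataset that are also called in the other. It also assumes the common 1 Mb window for classifying an association as cis. Neither choice is taken from the thesis. The `EQTL` record, the helper names, and the toy calls are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EQTL:
    gene: str
    variant: str
    gene_chrom: str
    gene_tss: int
    variant_chrom: str
    variant_pos: int

    def is_cis(self, window: int = 1_000_000) -> bool:
        # Assumed convention: cis = same chromosome and within `window` bp of the TSS.
        return (self.gene_chrom == self.variant_chrom
                and abs(self.gene_tss - self.variant_pos) <= window)


def replication_rates(discovery, replication, window=1_000_000):
    """Fraction of discovery eQTL (gene-variant pairs) also called in the
    replication dataset, reported separately for cis and trans associations.
    Returns NaN for a class with no discoveries."""
    replicated = {(q.gene, q.variant) for q in replication}
    rates = {}
    for label, want_cis in (("cis", True), ("trans", False)):
        subset = [q for q in discovery if q.is_cis(window) == want_cis]
        hits = sum((q.gene, q.variant) in replicated for q in subset)
        rates[label] = hits / len(subset) if subset else math.nan
    return rates


# Hypothetical toy calls for two matched datasets (e.g., the two members of a twin pair).
twin_a = [
    EQTL("GENE1", "rs1", "1", 1_000_000, "1", 1_050_000),
    EQTL("GENE2", "rs2", "2", 5_000_000, "2", 5_200_000),
    EQTL("GENE3", "rs3", "3", 9_000_000, "3", 8_500_000),
    EQTL("GENE4", "rs4", "4", 2_000_000, "7", 3_000_000),
    EQTL("GENE5", "rs5", "5", 4_000_000, "5", 40_000_000),
]
twin_b = [
    EQTL("GENE1", "rs1", "1", 1_000_000, "1", 1_050_000),
    EQTL("GENE3", "rs3", "3", 9_000_000, "3", 8_500_000),
    EQTL("GENE6", "rs6", "6", 7_000_000, "6", 7_010_000),
]
print(replication_rates(twin_a, twin_b))
```

Swapping the arguments measures replication in the other direction, and the two rates generally differ. Other criteria would change the numbers, for example requiring only nominal significance in the second dataset, or estimating the fraction of true associations from the replication p-value distribution.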

Keywords

confounding factor; eQTL; independent component analysis; linear mixed model


Degree Discipline

Physiology, Biophysics & Systems Biology

Degree Level

Doctor of Philosophy

Attribution-NonCommercial-NoDerivatives 4.0 International


dissertation or thesis
