Precision Medicine In The Age Of Big Data: Leveraging Machine Learning And Genomics For Drug Discovery

Targeted therapies, which are designed to act on specific molecules involved in carcinogenesis, have achieved remarkable antitumor efficacy. However, resistance inevitably develops, and many cancer patients are not candidates for these therapies. Furthermore, the clinical attrition rate continues to rise and remains a barrier to the development of novel targeted therapies. Integrating extensive genomics datasets with large drug databases allows us to begin to tackle questions about target discovery and drug toxicity, with the ultimate goal of accelerating personalized anticancer drug discovery. The purpose of this dissertation was to address these problems through the development of drug repurposing, toxicity prediction, and drug synergy prediction models. First, to target the role of transcription factors as drivers of oncogenic activity, we developed a computational drug repositioning approach (CRAFTT) that predicts drugs that specifically disrupt transcription factor activity. To do this, CRAFTT integrates transcription factor binding site information with drug-induced expression profiling. We found that CRAFTT recovered a significant number of known drug-transcription factor interactions and identified a novel interaction that we subsequently validated. Our work in drug discovery led us to ask what makes a drug safe. We developed a data-driven approach (PrOCTOR) that integrates the properties of a compound’s targets and its structure to directly predict the likelihood of toxicity in clinical trials; it accurately classified known safe and toxic drugs. Finally, to address the problem of drug resistance, we developed a machine learning approach that identifies synergistic and effective drug combinations based on single-drug efficacy information and limited drug combination testing. When applied to mutant BRAF melanoma, this approach exhibited significant predictive power, both in cross-validation and in further experimental testing of previously untested drug combinations in cell lines independent of the training set. Altogether, this work demonstrates how the integration of orthogonal datasets gives us the power to address difficult questions that are critical for precision medicine and drug discovery. Approaches such as these have the potential to make a direct impact on how patients are treated, as well as to help prioritize and guide additional focused studies.

Genomics; Machine Learning


Degree Discipline

Computational Biology and Medicine

Degree Level

Doctor of Philosophy

Attribution-NonCommercial-NoDerivatives 4.0 International


dissertation or thesis
