Machine Learning For Drug Development: Integrating Genomic, Chemical, And Clinical Data To Identify Drug Targets, Efficacies, Adverse Events, And Combinations
Despite recent technological advances, drug development has remained a challenging and inefficient process. Machine learning methods have the potential to accelerate this process by using information from past drug successes and failures to decipher the mechanisms and activities of new compounds. This will become even more crucial in the age of “precision medicine,” where thorough mechanistic knowledge will be needed to properly position compounds. The purpose of this dissertation is to address this need through the development of methods for drug target identification, biomarker identification, indication selection, and adverse event prediction. First, we introduce BANDIT to accelerate the process of drug target identification and deconvolution. BANDIT integrates multiple data types within a Bayesian network to predict the targets of both new and approved small molecules. We found that BANDIT was able to accurately recover a large number of known drug-target interactions, identify new drugs for a common cancer target, and identify DRD2 as the target of ONC201, a first-in-class molecule in clinical development. Our work on ONC201 led us to ask how we could integrate known information on DRD2 with gene expression profiling and BANDIT to better select analogs and indications for ONC201. We found that we could accurately rank analogs based on measured efficacy, select new cancer types where ONC201 was likely to be efficacious, and identify DRD5 and cancer stem cell genes as biomarkers for ONC201 activity. Following our work on ONC201 and drug target identification, we asked whether these methods could be applied to predict specific adverse events for a given drug.
Building on previous work published by our lab, we developed MAESTER, a data-driven machine learning approach that integrates properties of a compound’s structure and targets with tissue-wide gene expression profiling and known biological networks to calculate the probability of a compound presenting with a set of tissue-specific adverse events in the clinic. We found that MAESTER could accurately identify known side effects of approved drugs and could even pinpoint the adverse events of drugs that were approved and later withdrawn for tissue-specific toxicities. Altogether, this work demonstrates how challenging problems in drug development can be addressed through the integration of diverse datasets. These approaches have the potential to transform the current drug development pipeline by focusing experimental efforts, identifying new compounds with therapeutic potential, and choosing optimal indications and patient populations, all of which could have a direct impact on patient care.
Computational Biology and Medicine
Doctor of Philosophy
Attribution-NonCommercial-NoDerivatives 4.0 International
dissertation or thesis