Show simple item record

dc.contributor.advisor: Rätsch, Gunnar
dc.contributor.author: Hyland, Stephanie
dc.description.abstract: The way we do medicine is undergoing a revolution driven by technology. As the modern drive to record, share, and analyse data sweeps across society, healthcare lies squarely in its path. Data generated by everyday clinical practice presents an invaluable view of health and disease at a scale previously unimaginable. However, to benefit from it, we need computational tools to extract meaning, clinical insight, and actionable predictions. This new digital era of medicine is an opportunity not only for healthcare providers, but also for machine learning researchers to develop new methods tailored to the unique demands of this complex domain. The work described here sits in this sphere. Firstly, we explore representation learning for medical language. With its long-tailed distribution of technical terms, medical language necessitates the development of methods that mitigate data scarcity by exploiting prior information encoded in knowledge graphs. Obtaining semantically meaningful representations of medical concepts and their relationships is vital, and we describe a probabilistic model to learn such representations. Secondly, we address learning from and implicitly representing long time series using recurrent neural networks. These long sequences are commonplace in medicine, where one's health history is necessarily lengthy, but early events nonetheless provide crucial context. To address vanishing and exploding gradients in the training of these networks, we propose a novel parametrisation exploiting the correspondence between the Lie group of unitary matrices and its Lie algebra. Next, a method for generating synthetic ICU time series data is described in the framework of adversarial networks. A core challenge for researchers in healthcare is the scarcity of shareable datasets on which to benchmark. Realistic synthetic data is therefore key. Novel methods for evaluating the quality of this synthetic data are proposed, and the model's privacy and memorisation properties are analysed, both heuristically and in terms of differential privacy. Finally, an ensemble of gradient-boosted decision trees is employed to identify circulatory system deterioration in Swiss ICU patients. As this system has been developed for deployment, we carefully detail the data processing steps, task specification, and evaluation considerations necessary for a real-world, real-time early warning system driven by machine learning.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject: generative models
dc.subject: intensive care
dc.subject: language embedding
dc.subject: machine learning
dc.subject: recurrent neural networks
dc.title: Representation Learning On Sequential Medical Data
dc.type: dissertation or thesis; Biology and Medicine; Cornell Graduate School of Medical Sciences; of Philosophy


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International