Representation Learning On Sequential Medical Data

Rätsch, Gunnar
Hyland, Stephanie
2019-03-26
2019-08-08
2019
https://hdl.handle.net/1813/64821

The way we do medicine is undergoing a revolution driven by technology. As the modern drive to record, share, and analyse data sweeps across society, healthcare lies squarely in its path. Data generated by everyday clinical practice presents an invaluable view of health and disease at a scale previously unimaginable. However, to benefit from it, we need computational tools to extract meaning, clinical insight, and actionable predictions. This new digital era of medicine is an opportunity not only for healthcare providers, but also for machine learning researchers to develop new methods tailored to the unique demands of this complex domain. The work described here sits in this sphere.

Firstly, we explore representation learning for medical language. With its long-tailed distribution of technical terms, medical language calls for methods that mitigate data scarcity by exploiting prior information encoded in knowledge graphs. Obtaining semantically meaningful representations of medical concepts and their relationships is vital, and we describe a probabilistic model to learn such representations.

Secondly, we address learning from and implicitly representing long time series using recurrent neural networks. These long sequences are commonplace in medicine, where one's health history is necessarily lengthy, but early events nonetheless provide crucial context. To address vanishing and exploding gradients in training these networks, we propose a novel parametrisation exploiting the correspondence between the Lie group of unitary matrices and its Lie algebra.

Next, a method for generating synthetic ICU time series data is described in the framework of adversarial networks. A core challenge for researchers in healthcare is the scarcity of shareable datasets on which to benchmark. Realistic synthetic data is therefore key. Novel methods for evaluating the quality of this synthetic data are proposed, and the model's privacy and memorisation properties are analysed, both heuristically and in terms of differential privacy.

Finally, an ensemble of gradient-boosted decision trees is employed to identify circulatory system deterioration in Swiss ICU patients. As this system has been developed for deployment, we carefully detail the data processing steps, task specification, and evaluation considerations necessary for a real-world, real-time early warning system driven by machine learning.

en-US
Attribution-NonCommercial-NoDerivatives 4.0 International
generative models; intensive care; language embedding; machine learning; recurrent neural networks
dissertation or thesis