Cornell University
Library
Cornell UniversityLibrary

eCommons

Help
Log In(current)
  1. Home
  2. Weill Cornell Medicine
  3. Weill Cornell Theses and Dissertations
  4. Weill Cornell Theses and Dissertations
  5. Representation Learning On Sequential Medical Data

Representation Learning On Sequential Medical Data

File(s)
2019-HYLAND-REPRESENTATION_LEARNING_ON_SEQUENTIAL_MEDICAL_DATA.pdf (10.37 MB)
Permanent Link(s)
https://hdl.handle.net/1813/64821
Collections
Weill Cornell Theses and Dissertations
Author
Hyland, Stephanie
Abstract

The way we do medicine is undergoing a revolution driven by technology. As the modern drive to record, share, and analyse data sweeps across society, healthcare lies squarely in its path. Data generated by every-day clinical practice presents an invaluable view of health and disease at a scale previously unimaginable. However, to benefit it, we need computational tools to extract meaning, clinical insight, and actionable predictions. This new digital era of medicine is an opportunity not only for healthcare providers, but also for machine learning researchers to develop new methods tailored to the unique demands of this complex domain. The work described here sits in this sphere.Firstly, we explore representation learning for medical language. With its long-tailed distribution of technical terms, medical language necessitates development of methods to augment data-scarcity by exploiting prior information encoded in knowledge graphs. Obtaining semantically meaningful representations of medical concepts and their relationships is vital, and we describe a probabilistic model to learn such representations.Secondly, we address learning from and implicitly representing long time series using recurrent neural networks. These long sequences are commonplace in medicine, where one's health history is necessarily lengthy, but early events nonetheless provide crucial context. To address vanishing and exploding gradients in the training these networks, we propose a novel parametrisation exploiting the correspondence between the Lie group of unitary matrices and its Lie algebra.Next, a method for generating synthetic ICU time series data is described in the framework of adversarial networks. A core challenge for researchers in healthcare is the scarcity of shareable datasets on which to benchmark. Realistic synthetic data is therefore key. Novel methods for evaluating the quality of this synthetic data are proposed, and the model's privacy and memorisation properties are analysed, both heuristically and in terms of differential privacy.Finally, an ensemble of gradient-boosted decision trees are employed to identify circulatory system deterioration in Swiss ICU patients. As this system has been developed for deployment, we carefully detail the data processing steps, task specification, and evaluation considerations necessary for a real-world, real-time early warning system driven by machine learning.

Date Issued
2019
Keywords
generative models
•
intensive care
•
language embedding
•
machine learning
•
recurrent neural networks
Degree Discipline
Computational Biology and Medicine
Degree Level
Doctor of Philosophy
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights URI
https://creativecommons.org/licenses/by-nc-nd/4.0/
Type
dissertation or thesis

Site Statistics | Help

About eCommons | Policies | Terms of use | Contact Us

copyright © 2002-2026 Cornell University Library | Privacy | Web Accessibility Assistance