Representation Learning On Sequential Medical Data

Hyland, Stephanie

Files

2019-HYLAND-REPRESENTATION_LEARNING_ON_SEQUENTIAL_MEDICAL_DATA.pdf (10.37 MB)

Permanent Link(s)

https://hdl.handle.net/1813/64821

Collections

Weill Cornell Theses and Dissertations

Author(s)

Hyland, Stephanie

Abstract

The way we do medicine is undergoing a revolution driven by technology. As the modern drive to record, share, and analyse data sweeps across society, healthcare lies squarely in its path. Data generated by everyday clinical practice presents an invaluable view of health and disease at a scale previously unimaginable. However, to benefit from it, we need computational tools to extract meaning, clinical insight, and actionable predictions. This new digital era of medicine is an opportunity not only for healthcare providers, but also for machine learning researchers to develop new methods tailored to the unique demands of this complex domain. The work described here sits in this sphere.

Firstly, we explore representation learning for medical language. With its long-tailed distribution of technical terms, medical language necessitates the development of methods that mitigate data scarcity by exploiting prior information encoded in knowledge graphs. Obtaining semantically meaningful representations of medical concepts and their relationships is vital, and we describe a probabilistic model to learn such representations.

Secondly, we address learning from and implicitly representing long time series using recurrent neural networks. These long sequences are commonplace in medicine, where one's health history is necessarily lengthy, but early events nonetheless provide crucial context. To address vanishing and exploding gradients in the training of these networks, we propose a novel parametrisation exploiting the correspondence between the Lie group of unitary matrices and its Lie algebra.

Next, a method for generating synthetic ICU time series data is described in the framework of adversarial networks. A core challenge for researchers in healthcare is the scarcity of shareable datasets on which to benchmark. Realistic synthetic data is therefore key. Novel methods for evaluating the quality of this synthetic data are proposed, and the model's privacy and memorisation properties are analysed, both heuristically and in terms of differential privacy.

Finally, an ensemble of gradient-boosted decision trees is employed to identify circulatory system deterioration in Swiss ICU patients. As this system has been developed for deployment, we carefully detail the data processing steps, task specification, and evaluation considerations necessary for a real-world, real-time early warning system driven by machine learning.

Date Issued

2019

Keywords

generative models; intensive care; language embedding; machine learning; recurrent neural networks

Degree Discipline

Computational Biology and Medicine

Degree Level

Doctor of Philosophy

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Rights URI

https://creativecommons.org/licenses/by-nc-nd/4.0/

Types

dissertation or thesis
