Identification of Problem Opioid Use Risk from Electronic Health Records

A study cohort of 2733 chronic opioid therapy patients was extracted from the electronic health record (EHR) system of Weill Cornell Medicine over the period 2000 to 2018. The case group of 422 patients in this cohort who had developed problem opioid use (POU), including opioid dependence, opioid misuse, and opioid abuse, was then identified by parsing recorded ICD-9 and ICD-10 codes. We extracted 31420 potential risk factors from the structured EHR data and encoded them as a binary feature vector for each patient. Pearson’s chi-square test was conducted on each potential risk factor between the case and control groups to identify a subset of 2860 potential risk factors most likely to be correlated with an increased risk of developing POU. A logistic regression model predicting the development of POU, fitted on this subset of risk factors, achieved an area under the receiver operating characteristic curve (AUC) of 0.796. We then applied Recursive Feature Elimination to further reduce this set to an optimal subset of 1150 features. A logistic regression model fitted on this optimal subset achieved an AUC of 0.793. The features with the greatest positive coefficients in this regression model were mapped back to their respective concept domains, as specified by the Observational Medical Outcomes Partnership (OMOP) standardized vocabularies. The distribution of features among the concept domains indicates that the medical conditions suggested by a patient’s symptoms, the drugs prescribed to a patient, and the medical procedures ordered for a patient, as captured by the EHR, can be leveraged to detect an increased risk of developing POU.
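The chi-square screening step described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names `chi_square_2x2` and `screen_features`, the toy data, and the significance threshold `alpha=0.05` are all assumptions made for illustration, since the abstract does not state how the 2860-feature subset was cut off. The sketch applies Pearson's chi-square test (one degree of freedom, no continuity correction) to each binary feature against the case/control label and keeps the features whose p-value falls below the threshold.

```python
import math


def chi_square_2x2(feature, label):
    """Pearson chi-square test for a binary feature vs. a binary label.

    Returns (statistic, p_value) for the 2x2 contingency table,
    with 1 degree of freedom and no continuity correction.
    """
    n = len(feature)
    # Observed cell counts of the 2x2 table.
    a = sum(1 for f, y in zip(feature, label) if f and y)          # feature, case
    b = sum(1 for f, y in zip(feature, label) if f and not y)      # feature, control
    c = sum(1 for f, y in zip(feature, label) if not f and y)      # no feature, case
    d = n - a - b - c                                              # no feature, control
    row1, row0 = a + b, c + d
    col1, col0 = a + c, b + d
    if 0 in (row1, row0, col1, col0):
        # A constant feature (or label) carries no information.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / (row1 * row0 * col1 * col0)
    # Survival function of chi-square with 1 dof: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def screen_features(X, y, alpha=0.05):
    """Return indices of columns of X whose chi-square p-value is below alpha.

    alpha is a hypothetical cutoff chosen for this sketch.
    """
    kept = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        _, p_value = chi_square_2x2(column, y)
        if p_value < alpha:
            kept.append(j)
    return kept


# Toy data (invented): 10 cases followed by 10 controls, three binary features.
y = [1] * 10 + [0] * 10
# Feature 0 mirrors the label, feature 1 alternates independently of it,
# feature 2 is always absent.
X = [[y_i, i % 2, 0] for i, y_i in enumerate(y)]

selected = screen_features(X, y)
print(selected)
```

On this toy data only the first feature survives screening. In a pipeline like the one in the abstract, the surviving columns would then be passed to a logistic regression model and, optionally, to recursive feature elimination.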

29 pages


Committee Chair

Estrin, Deborah

Committee Member

Azenkot, Shiri

Degree Discipline

Information Science

Degree Name

M.S., Information Science

Degree Level

Master of Science

dissertation or thesis
