Identification of Problem Opioid Use Risk from Electronic Health Records
A study cohort of 2,733 patients on chronic opioid therapy was extracted from the electronic health record (EHR) system of Weill Cornell Medicine for the period 2000 to 2018. Within this cohort, a case group of 422 patients who had developed problem opioid use (POU), including opioid dependence, misuse, and abuse, was identified by parsing recorded ICD-9 and ICD-10 codes. We then extracted 31,420 potential risk factors from the structured EHR data and encoded them as a binary feature vector for each patient. Pearson’s chi-square test was applied to each potential risk factor to compare the case and control groups, yielding a subset of 2,860 potential risk factors most strongly associated with the development of POU. A logistic regression model trained on this subset to predict the development of POU achieved an area under the receiver operating characteristic curve (AUC) of 0.796. We then applied Recursive Feature Elimination (RFE) to reduce this set further to an optimal subset of 1,150 features, and a logistic regression model trained on this subset achieved an AUC of 0.793. The features with the largest positive coefficients in this model were mapped back to their respective concept domains in the Observational Medical Outcomes Partnership (OMOP) standardized vocabularies. The distribution of these features across concept domains indicates that the medical conditions suggested by a patient’s symptoms, the drugs prescribed to the patient, and the procedures ordered for the patient, as recorded in the EHR, can be leveraged to detect an increased risk of developing POU.
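The screening-and-modeling pipeline summarized above can be illustrated with a minimal, dependency-free sketch. It assumes each patient is a list of 0/1 features and a 0/1 label (1 = POU case). It computes Pearson's chi-square statistic on each feature's 2×2 contingency table (no continuity correction), keeps the top `k` features (`k` is a hypothetical parameter, since the abstract does not state the exact cutoff that produced 2,860 features), fits an unregularized logistic regression by batch gradient descent, and scores it with AUC computed as the probability that a random case outscores a random control. RFE and the OMOP domain mapping are omitted. All function names and the toy data are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

def chi_square_2x2(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-square statistic (no continuity correction) for one
    binary feature against binary case (1) / control (0) labels."""
    # table[x][y] counts patients with feature value x and label y.
    table = [[0, 0], [0, 0]]
    for x, y in zip(feature, labels):
        table[x][y] += 1
    n = len(labels)
    row_tot = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    col_tot = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # empty margin: feature or label is constant
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def chi_square_p_value(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def select_features(X: List[List[int]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k columns with the largest chi-square statistics."""
    stats = [chi_square_2x2([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(stats)), key=lambda j: stats[j], reverse=True)[:k]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X: List[List[int]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Unregularized logistic regression fitted by batch gradient descent
    on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict(X: List[List[int]], w: List[float], b: float) -> List[float]:
    """Predicted probability of POU for each patient."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random case scores higher than random control), ties
    counted as one half; O(cases * controls)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy cohort: column 0 tracks the label, column 1 is noise,
# column 2 is present for every patient.
X_toy = [[1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1],
         [0, 0, 1], [0, 1, 1], [0, 0, 1], [0, 1, 1]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

keep = select_features(X_toy, y_toy, k=1)          # [0]
X_sel = [[row[j] for j in keep] for row in X_toy]
w, b = fit_logistic(X_sel, y_toy)
toy_auc = auc(predict(X_sel, w, b), y_toy)          # 1.0 (training-set AUC)
```

For brevity the sketch scores the same patients it was trained on. A real evaluation would compute AUC on held-out data, for example with cross-validation. The omitted RFE step would repeatedly refit the model and drop the features with the smallest absolute coefficients. scikit-learn provides this as `sklearn.feature_selection.RFE`.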