Cornell University
Library

eCommons

Estimation methods in the presence of outcome and mediator misclassification

File(s)
Webb_cornellgrad_0058F_14390.pdf (1.47 MB)
Permanent Link(s)
https://doi.org/10.7298/jqee-jf96
https://hdl.handle.net/1813/116612
Collections
Cornell Theses and Dissertations
Author
Webb, Kimberly
Abstract

In health and social science association studies, binary variables may be subject to misclassification, which can introduce substantial bias into effect estimates. Existing work in this area largely corrects for misclassification bias using validation studies. I instead consider the problem in settings where no gold standard measure is available, which makes validation studies impossible. In this dissertation, I propose statistical methods to recover unbiased parameter estimates in three settings: association studies with binary outcome misclassification, multi-stage decision-making frameworks with noisy labels, and mediation analyses with misclassified binary mediator variables.

In the first project, I develop a Markov chain Monte Carlo (MCMC) algorithm and an Expectation-Maximization (EM) algorithm for association studies with misclassified binary outcomes. These methods estimate both (1) the unbiased association between the predictor and the true outcome of interest and (2) the rate at which the observed outcomes were misclassified. In addition, I develop a "label switching correction" algorithm that selects the appropriate parameter set from the two that maximize the likelihood, relying only on the assumption that the sum of the outcome sensitivity and specificity is greater than one. I create an R software package, COMBO, to implement the proposed methods. Through simulation studies, I show that the estimates from my proposed MCMC and EM algorithm methods are less biased than those from models that ignore outcome misclassification or assume perfect sensitivity or specificity. In an example using data from the 2020 Medical Expenditure Panel Survey (MEPS), I apply the proposed methods to show that misdiagnosis of heart attacks can be modeled as a function of patient gender.

In the second project, I extend the misclassified binary outcome model to study a specialized decision-making structure within the Virginia pretrial system. This system is characterized by a two-stage, sequential, and dependent decision-making framework. First, a pretrial risk assessment algorithm (the Virginia Pretrial Risk Assessment Instrument, or VPRAI) assesses each defendant's likelihood of "pretrial failure," the event that a defendant either fails to appear for court or reoffends. Judicial officers, in turn, use these assessments to decide whether to release or detain defendants before trial. There is concern that both the risk assessment algorithm's recommendations and judges' pretrial decisions are biased against minority groups. I develop Bayesian and frequentist methods to investigate the association between various risk factors and pretrial failure, while simultaneously estimating the misclassification rates of pretrial risk assessments and of judicial decisions as a function of defendant race. Using data from the Virginia Department of Criminal Justice Services, I estimate that the VPRAI has near-perfect specificity, but that its sensitivity differs by defendant race. Judicial decisions also display evidence of bias: I estimate wrongful detention rates of 39.7% and 51.4% among white and Black defendants, respectively.

In the third project, I consider mediation analysis settings where a binary mediator is misclassified. I develop a suite of analysis techniques, including an ordinary least squares bias correction, a predictive value weighting method, and an EM algorithm, to recover unbiased parameter estimates and to estimate misclassification rates for the mediator variable. Through simulation studies, I show that as misclassification rates increase, the proposed methods outperform methods that ignore misclassification in terms of root mean squared error. I apply these methods to evaluate the role of gestational hypertension as a mediator in the association between maternal age and preterm delivery.
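The first project's combination of an EM algorithm and a label switching correction can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that do not come from the dissertation. First, the sensitivity (Se) and specificity (Sp) of the observed outcome are treated as constants, whereas the dissertation models misclassification as a function of covariates. Second, each M-step takes a single Newton step for the outcome coefficients, which makes this a generalized EM variant. Third, the helper names are hypothetical and are not part of the COMBO package's API. The correction relies on a symmetry: the observed-data likelihood is unchanged when (beta, Se, Sp) is replaced by (-beta, 1 - Sp, 1 - Se). The two mirrored solutions have Se + Sp equal to s and 2 - s, so the assumption Se + Sp > 1 singles out one of them.

```python
import numpy as np


def expit(z):
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + np.exp(-z))


def observed_loglik(beta, se, sp, X, y_obs):
    """Log-likelihood of the observed, possibly misclassified, binary outcomes."""
    p = expit(X @ beta)                  # P(true Y = 1 | X)
    q = se * p + (1.0 - sp) * (1.0 - p)  # P(observed Y* = 1 | X)
    return np.sum(y_obs * np.log(q) + (1 - y_obs) * np.log(1.0 - q))


def em_misclassified_outcome(X, y_obs, n_iter=500, se=0.8, sp=0.8):
    """Generalized EM for logistic regression with constant Se/Sp (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: posterior probability that the true outcome equals 1.
        p = expit(X @ beta)
        lik1 = p * np.where(y_obs == 1, se, 1.0 - se)
        lik0 = (1.0 - p) * np.where(y_obs == 1, 1.0 - sp, sp)
        w = lik1 / (lik1 + lik0)
        # M-step, outcome model: one Newton step of logistic regression
        # with fractional responses w.
        grad = X.T @ (w - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
        # M-step, measurement model: closed-form updates for Se and Sp.
        se = np.sum(w * y_obs) / np.sum(w)
        sp = np.sum((1.0 - w) * (1 - y_obs)) / np.sum(1.0 - w)
    return beta, se, sp


def label_switch_correct(beta, se, sp):
    """Return the mirrored solution when Se + Sp < 1 (hypothetical helper)."""
    if se + sp < 1.0:
        return -beta, 1.0 - sp, 1.0 - se
    return beta, se, sp


# Simulated example with illustrative (made-up) parameter values.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([-0.5, 2.0])
y_true = rng.binomial(1, expit(X @ beta_true))
se_true, sp_true = 0.9, 0.95
y_obs = np.where(y_true == 1,
                 rng.binomial(1, se_true, size=n),
                 rng.binomial(1, 1.0 - sp_true, size=n))

beta_hat, se_hat, sp_hat = label_switch_correct(*em_misclassified_outcome(X, y_obs))
print(beta_hat, se_hat, sp_hat)
```

Starting values with Se + Sp > 1 usually keep EM in the correctly labeled mode. The correction step guards against convergence to the mirrored solution, which attains exactly the same likelihood.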
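Why a misclassified binary mediator biases a naive regression can be seen in a standard textbook identity, not necessarily the dissertation's own ordinary least squares correction. Under nondifferential misclassification, E[M* | X] = (1 - Sp) + (Se + Sp - 1) P(M = 1 | X). If P(M = 1 | X) is assumed to follow a linear probability model, the slope from regressing the observed mediator M* on X is therefore attenuated by the factor Se + Sp - 1 and can be rescaled. The sketch below treats Se and Sp as known for simplicity; in the dissertation's setting, without validation data, they would have to be estimated. The helper name and simulated values are hypothetical.

```python
import numpy as np


def corrected_mediator_ols(x, m_obs, se, sp):
    """Rescale OLS coefficients for a misclassified binary mediator (hypothetical helper).

    Assumes nondifferential misclassification with known sensitivity `se` and
    specificity `sp`, and a linear probability model P(M = 1 | x) = a0 + a1 * x.
    """
    X = np.column_stack([np.ones_like(x), x])
    naive, _, _, _ = np.linalg.lstsq(X, m_obs.astype(float), rcond=None)  # regress M* on x
    k = se + sp - 1.0  # attenuation factor; positive when Se + Sp > 1
    corrected = np.array([(naive[0] - (1.0 - sp)) / k, naive[1] / k])
    return corrected, naive


# Simulated example with illustrative (made-up) values.
rng = np.random.default_rng(1)
n = 200_000
x = rng.uniform(size=n)
m_true = rng.binomial(1, 0.2 + 0.5 * x)  # true P(M = 1 | x) = 0.2 + 0.5 x
se, sp = 0.85, 0.90
m_obs = np.where(m_true == 1,
                 rng.binomial(1, se, size=n),
                 rng.binomial(1, 1.0 - sp, size=n))

corrected, naive = corrected_mediator_ols(x, m_obs, se, sp)
print("naive:", naive, "corrected:", corrected)
```

Here the naive slope should land near 0.75 × 0.5 = 0.375, and the rescaled slope near the true 0.5. The same condition Se + Sp > 1 that drives the label switching correction also keeps the attenuation factor positive.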

Description
224 pages
Date Issued
2024-08
Keywords
Algorithmic bias
Association studies
Bias correction
Label switching
Misclassification
Noisy labels
Committee Chair
Wells, Martin
Committee Member
Booth, James
Thoemmes, Felix
Degree Discipline
Statistics
Degree Name
Ph. D., Statistics
Degree Level
Doctor of Philosophy
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights URI
https://creativecommons.org/licenses/by-nc-nd/4.0/
Type
dissertation or thesis
Link(s) to Catalog Record
https://newcatalog.library.cornell.edu/catalog/16611671