Information Recovery With Missing Data When Outcomes Are Right Censored.
This dissertation uses the semiparametric efficiency theory developed in Robins et al. (1994) to make more efficient use of the available information in several settings where some observations are right-censored. Chapter 2 focuses on estimating the regression parameter in the semiparametric accelerated failure time model when the data are collected using a case-cohort design. Previously proposed estimation methods use some form of Horvitz-Thompson estimator, which is known to be inefficient; the main aim of Chapter 2 is to improve the efficiency of estimation of this regression parameter in case-cohort studies. We derive the semiparametric information bound and propose a more practical class of augmented estimators motivated by the augmentation theory developed in Robins et al. (1994). We develop large-sample properties, identify the most efficient estimator within the class of augmented estimators, and give practical guidance on how to calculate the estimator.
Regression trees are nonparametric methods that use reduction in loss to partition the covariate space through binary splits, creating a prediction model that is easily interpreted and visualized. When some observations are censored, the full-data loss function is not a function of the observed data, and Molinaro et al. (2004) used inverse probability weighted estimators to extend the loss functions to right-censored outcomes. Motivated by semiparametric efficiency theory, Chapter 3 extends the approach in Molinaro et al. (2004) by using doubly robust loss functions that make better use of the information in censored observations and are more robust to the modeling choices that need to be made.
Regression trees are known to suffer from instability, with minor changes in the data sometimes resulting in very different trees. Ensemble-based methods that average several trees have been shown to lead to prediction models that usually have smaller prediction error. One such ensemble-based method is random forests (Breiman, 2001), and in Chapter 4 we use the regression tree methodology developed in Chapter 3 as the building blocks of random forests.
Missing Data; Semiparametric Theory; Censored Data
Wells, Martin Timothy; Ruppert, David
Ph.D. of Statistics
Doctor of Philosophy
dissertation or thesis