Topics In Linear Models: Methods For Clustered, Censored Data And Two-Stage Sampling Designs
In this dissertation, we consider the use of linear models in the presence of clustered, right-censored failure time data. The semiparametric accelerated failure time (AFT) model is a log-linear model that provides a useful, easy-to-interpret method for characterizing the relationship between failure time and covariates. Clustered failure time data can be handled in the context of the AFT model by using marginal estimation methods or by incorporating a random cluster-level frailty term. However, regression parameter estimation for both approaches requires the optimization of a non-smooth objective function. We use an extension of the induced smoothing procedure of Brown and Wang (2006) to construct a marginal estimation procedure that permits fast and accurate computation of regression parameter estimates and standard errors using widely available numerical methods. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, the asymptotic distribution of the smoothed estimator is shown to coincide with that obtained without the use of smoothing. In the case of the AFT frailty model, we use an extension of the induced smoothing procedure in conjunction with an EM-type algorithm to construct a procedure that permits simultaneous estimation of the regression parameters, the baseline cumulative hazard, and the parameter indexing a general frailty distribution.
We also consider two-stage sampling designs for linear models. Epidemiological studies frequently involve an important risk factor that is difficult or expensive to measure. When the response variable and a collection of covariates are easy to obtain on a large sample of the population, two-stage sampling designs provide a natural framework for using the easily obtained data to identify an informative subsample on which to collect the harder-to-measure covariate. We review traditional two-stage outcome-dependent sampling designs and develop a novel residual-dependent sampling design for this setting. Inverse probability weighted estimators for the sampling designs are presented, and asymptotic properties of the estimators are discussed. The proposed residual-dependent sampling design is easy to implement and results in more efficient estimators than the outcome-dependent sampling design in many situations.
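As a rough illustration of the two-stage idea, the sketch below simulates a linear model, selects a second-stage subsample with probabilities that depend on first-stage residuals, and fits an inverse probability weighted least squares estimator. It is a minimal sketch under stated assumptions, not the design or estimator developed in the dissertation: the function names, the simulated model, the 10%/90% residual cutoffs, and the selection probabilities of 0.9 and 0.1 are all hypothetical choices made for illustration.

```python
import numpy as np


def ipw_least_squares(X, y, pi):
    """Solve the weighted normal equations sum_i (1/pi_i) x_i (y_i - x_i'b) = 0."""
    w = 1.0 / pi
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def simulate_two_stage(n=20000, seed=0):
    """Hypothetical simulation; true coefficients are (1, 2, -1)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)              # cheap covariate, observed for everyone
    x = 0.5 * z + rng.normal(size=n)    # expensive covariate, measured at stage two
    y = 1.0 + 2.0 * x - 1.0 * z + rng.normal(size=n)

    # Stage one: regress y on the cheap covariate only and keep the residuals.
    Z1 = np.column_stack([np.ones(n), z])
    gamma = np.linalg.lstsq(Z1, y, rcond=None)[0]
    resid = y - Z1 @ gamma

    # Stage two: oversample subjects in the tails of the residual distribution
    # (assumed cutoffs and probabilities, chosen only for this example).
    lo, hi = np.quantile(resid, [0.1, 0.9])
    pi = np.where((resid < lo) | (resid > hi), 0.9, 0.1)
    selected = rng.random(n) < pi

    # Fit on the subsample, with and without inverse probability weights.
    X2 = np.column_stack([np.ones(n), x, z])[selected]
    beta_ipw = ipw_least_squares(X2, y[selected], pi[selected])
    beta_naive = np.linalg.lstsq(X2, y[selected], rcond=None)[0]
    return beta_ipw, beta_naive


if __name__ == "__main__":
    beta_ipw, beta_naive = simulate_two_stage()
    print("IPW estimate:       ", beta_ipw)
    print("Unweighted estimate:", beta_naive)
```

Because selection here depends on the outcome through the residuals, the unweighted fit on the subsample is generally biased, while weighting each selected subject by the inverse of its known selection probability restores consistency in this simulated setting.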
AFT models; clustered data; two-stage sampling designs
Strawderman, Robert Lee
Booth, James; Turnbull, Bruce William
Ph. D., Statistics
Doctor of Philosophy
dissertation or thesis