Generalized Wavelet Thresholding: Estimation and Hypothesis Testing with Applications to Array Comparative Genomic Hybridization
Schifano, Elizabeth Danielle
Wavelets have gained considerable popularity within the statistical arena in the context of nonparametric regression. When modeling data of the form y = f + \epsilon, the objective is to estimate the unknown "true" function f with small risk, based on sampled data y contaminated with random (usually Gaussian) noise \epsilon. Wavelet shrinkage and thresholding techniques have proved to be quite effective in recovering the true function f, particularly when f is spatially inhomogeneous. Recently, Johnstone and Silverman (2005b) proposed using empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. Using the posterior median estimator, their approach amounts to a random thresholding procedure with impressive mean squared error (MSE) results. At each level, their approach considers a two-component mixture prior for each of the wavelet coefficients independently. This mixture prior inherently assumes that the wavelet coefficients are symmetrically distributed about zero. Depending on the choice of wavelet filter and the interesting attributes of the true function, it may be the case that neither the magnitudes nor the number of the positive coefficients match those of the negative coefficients. Inspired by the work of Zhang (2005) and Zhang et al. (2007), this thesis introduces a random generalized thresholding procedure in the wavelet domain that does not require the symmetry assumption; it uses a three-component mixture prior that handles the positive and negative coefficients separately. It is demonstrated that the proposed generalized wavelet thresholding procedure performs quite well when estimating f from a single sampled realization y. As in Johnstone and Silverman (2005b), the performance of the Maximal Overlap Discrete Wavelet Transform (MODWT) is substantially better than that of the standard Discrete Wavelet Transform (DWT) in terms of MSE and visual quality.
An additional advantage of the MODWT is that it is well-defined for any number of sampled points N, i.e., N need not be a power of two. The proposed procedure also performs well when estimating f from multiple noisy realizations y_i, i = 1,...,n. In most, if not all, of the shrinkage and generalized shrinkage techniques considered, the noise standard deviation is assumed to be known and constant across the length of the function. In reality, it is typically unknown and must be estimated. In the single realization setting, the estimate is usually taken to be a constant based on the median absolute deviation of the empirical wavelet coefficients at the finest decomposition level. With multiple realizations, more estimation options are available. Various estimation options for a constant variance are examined via simulation. The results indicate that three of the six estimates considered are reasonable choices. The case of heterogeneous variances across the length of the function is also briefly explored via simulation. Finally, an inferential procedure is proposed that first removes noise from individual observations via the generalized wavelet thresholding procedure, and then uses recently proposed F-like statistics (Cui et al., 2005; Hwang and Liu, 2006; Zhou, 2007) to compare populations of sampled observations. To demonstrate its applicability, the aforementioned statistical work is applied to datasets generated from Array Comparative Genomic Hybridization (aCGH) experiments.
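The single-realization noise estimate mentioned above (a constant based on the median absolute deviation of the finest-level empirical wavelet coefficients) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a Haar filter, an even number of samples, and the conventional Gaussian scaling constant 0.6745; the function names `haar_finest_detail` and `mad_sigma` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def haar_finest_detail(y):
    """Finest-level Haar DWT detail coefficients of an even-length signal.

    Assumption: the Haar filter is used purely for illustration; the
    abstract does not say which wavelet filter the thesis uses.
    """
    y = np.asarray(y, dtype=float)
    if y.size % 2 != 0:
        raise ValueError("this sketch needs an even number of samples")
    # Scaled differences of adjacent, non-overlapping pairs.
    return (y[1::2] - y[0::2]) / np.sqrt(2.0)


def mad_sigma(detail):
    """Noise standard deviation estimate: median(|d|) / 0.6745.

    0.6745 is approximately the 0.75 quantile of the standard normal,
    so the estimate is consistent for sigma under Gaussian noise.
    """
    detail = np.asarray(detail, dtype=float)
    return np.median(np.abs(detail)) / 0.6745


if __name__ == "__main__":
    # Hypothetical test signal: a smooth function plus Gaussian noise.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 4096, endpoint=False)
    y = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 2.0, t.size)
    print("estimated sigma:", mad_sigma(haar_finest_detail(y)))
```

The median is used rather than the sample standard deviation because the finest-level coefficients of a mostly smooth f are dominated by noise, with only a few large coefficients carrying signal, and the median is insensitive to those few.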
Wavelets; aCGH; Generalized Thresholding; F-like tests
dissertation or thesis