Spatial and Temporal Approaches to Analyzing Big Data
Gelsinger, Megan Lynne
Modern technology allows researchers across disciplines to capture vast quantities of data ("big" data) to help answer some of their most pressing scientific questions. The analysis of big data, though, is complicated by its computational cost and by the limited availability of appropriate statistical techniques. These problems only grow in complexity when working with dependent data, such as data recorded across time (temporal dependence) or across space (spatial dependence). In this work, we offer tools to aid the analysis of big data in both domains. In the temporal domain, we study a classification approach for high-dimensional time series data. The motivating dataset for this work consists of biological time series measured simultaneously across many electrical frequencies. Our results suggest that a variety of cell lines can plausibly be classified accurately, providing researchers with a means of "checking" the cell types under study before reporting any results. In the spatial domain, we present work on fitting statistical models to large spatial point pattern data. In particular, we combine spectral and Laplace approximations, among others, in an EM algorithm, denoted SLEM, to approximately fit the widely popular log-Gaussian Cox process to these big spatial datasets. Using SLEM, we conduct a large-scale study of lightning dynamics across the contiguous United States and make observations about the relationship between lightning occurrence and several environmental covariates.
Guinness, Joe; Booth, James; Basu, Sumanta
Ph. D., Statistics
Doctor of Philosophy
dissertation or thesis