Gaussian copula for mixed data with missing values: model estimation and imputation
No Access Until
Permanent Link(s)
Collections
Other Titles
Author(s)
Abstract
Missing data imputation forms the first critical step of many data analysis pipelines. For practical applications, imputation algorithms should produce imputations that match the true data distribution and handle data of mixed types. This dissertation develops new imputation algorithms for data with many different variable types, including continuous, binary, ordinal, and truncated and categorical values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. This dissertation also develops specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation).
Journal / Series
Volume & Issue
Description
Sponsorship
Date Issued
Publisher
Keywords
Location
Effective Date
Expiration Date
Sector
Employer
Union
Union Local
NAICS
Number of Workers
Committee Chair
Committee Co-Chair
Committee Member
Ning, Yang