eCommons

 

Gaussian copula for mixed data with missing values: model estimation and imputation

Abstract

Missing data imputation forms the first critical step of many data analysis pipelines. For practical applications, imputation algorithms should produce imputations that match the true data distribution and handle data of mixed types. This dissertation develops new imputation algorithms for data with many different variable types, including continuous, binary, ordinal, truncated, and categorical values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. This dissertation also develops specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation).
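
To make the idea in the abstract concrete, the following is a minimal sketch of Gaussian copula imputation for two continuous variables only. It is not the dissertation's estimation procedure: the function names (`normal_scores`, `empirical_quantile`, `correlation`, `impute_y`), the rank / (n + 1) scaling, the plain Pearson correlation on complete rows, and the toy data are all assumptions made for illustration. It shows the two ingredients the abstract describes: each marginal is matched to its empirical distribution, and the dependence between variables is carried by a joint Gaussian on latent normal scores.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_scores(values):
    """Map observed values to latent normal scores via the scaled empirical CDF.

    Missing entries (None) stay None. Ranks are divided by (n + 1) so that
    every score is finite (illustrative choice, not taken from the source).
    """
    observed = sorted(v for v in values if v is not None)
    n = len(observed)

    def score(v):
        rank = sum(1 for o in observed if o <= v)  # number of observed values <= v
        return STD_NORMAL.inv_cdf(rank / (n + 1))

    return [None if v is None else score(v) for v in values]


def empirical_quantile(observed_sorted, p):
    """Linearly interpolated empirical quantile at level p in (0, 1)."""
    n = len(observed_sorted)
    pos = p * (n + 1) - 1            # undo the rank / (n + 1) scaling
    pos = min(max(pos, 0.0), n - 1.0)  # clamp to the observed range
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return observed_sorted[lo] * (1 - frac) + observed_sorted[hi] * frac


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def impute_y(x, y):
    """Fill None entries of y, assuming x is fully observed (hypothetical helper)."""
    zx, zy = normal_scores(x), normal_scores(y)
    # Estimate the latent correlation from rows where y is observed.
    pairs = [(a, b) for a, b in zip(zx, zy) if b is not None]
    rho = correlation([a for a, _ in pairs], [b for _, b in pairs])
    y_sorted = sorted(v for v in y if v is not None)
    filled = []
    for x_score, y_value in zip(zx, y):
        if y_value is not None:
            filled.append(y_value)
            continue
        # Conditional mean of z_y given z_x for a standard bivariate normal.
        latent_mean = rho * x_score
        # Map the latent value back to the data scale through the empirical quantile.
        filled.append(empirical_quantile(y_sorted, STD_NORMAL.cdf(latent_mean)))
    return filled


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 4.0, 9.0, None, 25.0, 36.0]  # monotone in x, one entry missing
y_filled = impute_y(x, y)
```

Because the imputed value is read off the empirical quantiles of the observed `y`, it always lies within the observed range of that variable. Handling ordinal, binary, truncated, or categorical variables, confidence intervals, and multiple imputation requires more machinery than this sketch includes.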

Description

188 pages

Date Issued

2022-05

Keywords

Gaussian copula; imputation; missing data; mixed data; ordinal data

Committee Chair

Udell, Madeleine Richards

Committee Member

Joachims, Thorsten
Ning, Yang

Degree Discipline

Statistics

Degree Name

Ph. D., Statistics

Degree Level

Doctor of Philosophy

Rights

Attribution-NonCommercial-ShareAlike 4.0 International

Types

dissertation or thesis