Gaussian copula for mixed data with missing values: model estimation and imputation
dc.contributor.author | Zhao, Yuxuan | |
dc.contributor.chair | Udell, Madeleine Richards | |
dc.contributor.committeeMember | Joachims, Thorsten | |
dc.contributor.committeeMember | Ning, Yang | |
dc.date.accessioned | 2022-09-15T15:51:46Z | |
dc.date.available | 2022-09-15T15:51:46Z | |
dc.date.issued | 2022-05 | |
dc.description | 188 pages | |
dc.description.abstract | Missing data imputation forms the first critical step of many data analysis pipelines. For practical applications, imputation algorithms should produce imputations that match the true data distribution and handle data of mixed types. This dissertation develops new imputation algorithms for data with many different variable types, including continuous, binary, ordinal, truncated, and categorical values, by modeling data as samples from a Gaussian copula model. This semiparametric model learns the marginal distribution of each variable to match the empirical distribution, yet describes the interactions between variables with a joint Gaussian that enables fast inference, imputation with confidence intervals, and multiple imputation. This dissertation also develops specialized extensions to handle large datasets (with complexity linear in the number of observations) and streaming datasets (with online imputation). | |
dc.identifier.doi | https://doi.org/10.7298/611b-8239 | |
dc.identifier.other | Zhao_cornellgrad_0058F_13067 | |
dc.identifier.other | http://dissertations.umi.com/cornellgrad:13067 | |
dc.identifier.uri | https://hdl.handle.net/1813/111829 | |
dc.language.iso | en | |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0/ | |
dc.subject | Gaussian copula | |
dc.subject | imputation | |
dc.subject | missing data | |
dc.subject | mixed data | |
dc.subject | ordinal data | |
dc.title | Gaussian copula for mixed data with missing values: model estimation and imputation | |
dc.type | dissertation or thesis | |
dcterms.license | https://hdl.handle.net/1813/59810.2 | |
thesis.degree.discipline | Statistics | |
thesis.degree.grantor | Cornell University | |
thesis.degree.level | Doctor of Philosophy | |
thesis.degree.name | Ph. D., Statistics |
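The abstract describes a model that matches each marginal to its empirical distribution and couples variables through a joint Gaussian. The following is a minimal sketch of that idea for continuous columns only, not the dissertation's algorithm: it assumes the latent correlation can be estimated from complete rows (a simplification chosen here for brevity), and the names `to_latent` and `copula_impute` are hypothetical. Ordinal, binary, truncated, and categorical variables are not handled.

```python
import numpy as np
from scipy.stats import norm, rankdata


def to_latent(col):
    """Map the observed entries of one column to normal scores via the empirical CDF."""
    z = np.full(col.shape, np.nan)
    obs = ~np.isnan(col)
    # Scaled ranks lie strictly inside (0, 1), so norm.ppf stays finite.
    z[obs] = norm.ppf(rankdata(col[obs]) / (obs.sum() + 1))
    return z


def copula_impute(X):
    """Single imputation using the conditional mean in the latent Gaussian space."""
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([to_latent(X[:, j]) for j in range(X.shape[1])])
    # Assumption: estimate the latent correlation from complete rows only.
    complete = ~np.isnan(Z).any(axis=1)
    Sigma = np.corrcoef(Z[complete], rowvar=False)
    X_imp = X.copy()
    for i in np.where(~complete)[0]:
        m = np.isnan(Z[i])
        o = ~m
        if not o.any():
            continue  # nothing observed in this row; leave it missing
        # Conditional mean of the missing latents given the observed latents.
        z_m = Sigma[np.ix_(m, o)] @ np.linalg.solve(Sigma[np.ix_(o, o)], Z[i, o])
        for j, z in zip(np.where(m)[0], z_m):
            observed = X[~np.isnan(X[:, j]), j]
            # Invert the empirical marginal with the empirical quantile function.
            X_imp[i, j] = np.quantile(observed, norm.cdf(z))
    return X_imp


# Synthetic demo: correlated latent Gaussians pushed through non-Gaussian marginals.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8, 0.5], [0.8, 1.0, 0.5], [0.5, 0.5, 1.0]]
latent = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=500)
X_true = np.column_stack([np.exp(latent[:, 0]), latent[:, 1] ** 3, latent[:, 2]])

# Hide roughly 30% of the first column, then impute it.
mask = rng.random(500) < 0.3
X = X_true.copy()
X[mask, 0] = np.nan
X_imp = copula_impute(X)
```

Because imputed values are read off the empirical quantile function, they always fall within the observed range of their column. That is one way to read the abstract's requirement that imputations match the data distribution.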
Files
Original bundle
- Name: Zhao_cornellgrad_0058F_13067.pdf
- Size: 1.86 MB
- Format: Adobe Portable Document Format