Cornell University
Library

eCommons

DigitalCollections@ILR
ILR School

Incorporating Conditionally Representative Auxiliary Information in Data Fusion

File(s)
Maria de Yoreo - 20151007 - Fusion-Talk.pdf (1.65 MB)
Permanent Link(s)
https://hdl.handle.net/1813/50063
Collections
Presentations of the NCRN Coordinating Office
Author
De Yoreo, Maria
Abstract

In data fusion, analysts seek to combine information from two databases composed of disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions, which can lead to unreliable inferences when those assumptions are not satisfied. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information (glue) on the dependence structure of variables not observed jointly. Using simulations, we illustrate the benefits of leveraging the information in glue. We also perform a data fusion experiment with the goal of fusing two surveys from the book publisher HarperCollins, using glue obtained from the Internet polling company CivicScience. We find that, because the auxiliary online survey is a convenience sample, the glue is not representative of the population sampled by HarperCollins. This scenario is very likely to be encountered in practice, and it points to the more general problem of combining information from multiple data sources that are not all probability samples of the same population. We discuss current work in this direction.
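
The role of the conditional independence assumption and of glue can be illustrated with a small simulation. The sketch below is not the method proposed in the talk, which the abstract does not specify; it is a minimal linear-Gaussian toy under stated assumptions. The variable names (`x` for the common variable, `a` and `b` for the variables seen in only one database), the data-generating process, the sample sizes, and the residual-regression use of glue are all hypothetical. The glue here is representative of the population by construction, which is exactly the condition the HarperCollins and CivicScience experiment found not to hold.

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols(x, y):
    """Least squares of y on a single predictor x.

    Returns (intercept, slope, residual standard deviation).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return intercept, slope, sd


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sum((p - mu) ** 2 for p in u) ** 0.5
    sv = sum((q - mv) ** 2 for q in v) ** 0.5
    return cov / (su * sv)


def draw(n, rng):
    """Hypothetical population: a and b share a component s beyond x,
    so they are NOT conditionally independent given x."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        s = rng.gauss(0, 1)
        a = 0.5 * x + s + rng.gauss(0, 0.5)
        b = 0.5 * x + s + rng.gauss(0, 0.5)
        rows.append((x, a, b))
    return rows


rng = random.Random(0)
db1 = draw(5000, rng)   # the analyst observes only (x, a) here
db2 = draw(5000, rng)   # the analyst observes only (x, b) here
glue = draw(1000, rng)  # auxiliary sample with (x, a, b) observed jointly

x1 = [r[0] for r in db1]
a1 = [r[1] for r in db1]
x2 = [r[0] for r in db2]
b2 = [r[2] for r in db2]

# Fit a | x on database 1 and b | x on database 2.
ia, sa, _ = ols(x1, a1)
ib, sb, sd_b = ols(x2, b2)

# Fusion under conditional independence: impute b from x alone.
b_ci = [ib + sb * x + rng.gauss(0, sd_b) for x in x1]

# Fusion with glue: estimate how the residual of b depends on the
# residual of a, using the jointly observed auxiliary sample.
ra_glue = [a - ia - sa * x for x, a, _ in glue]
rb_glue = [b - ib - sb * x for x, _, b in glue]
ig, gamma, sd_g = ols(ra_glue, rb_glue)
b_glue = [
    ib + sb * x + ig + gamma * (a - ia - sa * x) + rng.gauss(0, sd_g)
    for x, a in zip(x1, a1)
]

# The true b in database 1 is available only because this is a simulation.
corr_true = corr(a1, [r[2] for r in db1])
corr_ci = corr(a1, b_ci)
corr_glue = corr(a1, b_glue)
print(f"true={corr_true:.2f}  CI fusion={corr_ci:.2f}  glue fusion={corr_glue:.2f}")
```

In this toy setup the fusion based on conditional independence can only recover the part of the association between `a` and `b` that passes through `x`, while the glue-based imputation also captures the residual dependence. If the glue sample were drawn from a different population, the estimated `gamma` would carry that mismatch into the fused data.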

Description
Thanks to
-Coauthors: Bailey Fosdick (CSU) and Jerry Reiter (Duke)
-Working group members from SAMSI program on Computational Methods in Social Sciences, 2013-2014
-HarperCollins Publishers
-CivicScience
Sponsorship
Research supported by the National Science Foundation under award SES-11-31897.
Date Issued
2015-10-07
Type
presentation
