Utility of two synthetic data sets mediated through a validation server: Experience with the Cornell Synthetic Data Server
The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products by the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment, prior to a more generalized release. Over the past 7 years, 4 synthetic datasets have been made available on the server, and over 120 users have accessed it during that period. This paper reports on the outcomes of that activity: the results of validation requests from a user perspective, the functioning of the feedback loop created by validation and user input, and the role of the SDS as an access gateway to, and educational tool for, other mechanisms of accessing detailed person, household, establishment, and firm statistics.
Presentation made at the Conference on Current Trends in Survey Statistics 2019 at the Institute for Mathematical Sciences, National University of Singapore, Singapore, 13 - 16 August, 2019
Vilhuber acknowledges funding through NSF Grants SES-1131848 and SES-1042181, and a grant from the Alfred P. Sloan Foundation (G-2015-13903).
confidentiality protection; synthetic data; longitudinal business microdata; validation server
Expanded version of https://hdl.handle.net/1813/43883
Attribution-NonCommercial 4.0 International
bookmarks; tagged PDF
Except where otherwise noted, this item's license is described as Attribution-NonCommercial 4.0 International