Presentation: Synthetic Data Generation for Firm Links
In most countries, national statistical agencies do not release establishment-level business microdata, because doing so poses too great a risk to establishments' confidentiality. Agencies can potentially manage these risks by releasing synthetic microdata, i.e., individual establishment records simulated from statistical models designed to mimic the joint distribution of the underlying observed data. Previously, we used this approach to generate a public-use version of the U.S. Census Bureau's Longitudinal Business Database (LBD), a longitudinal census of establishments dating back to 1976. This synthetic LBD (SynLBD) is now available for public use. While the SynLBD has proven to be a useful product, we now seek to improve and expand it by using new synthesis models and adding new features. This paper describes our efforts to create the second generation of the SynLBD, including synthesis procedures that we believe could be replicated in other contexts.
This presentation is part of the NCRN's Virtual Seminar series.
Presentation sponsored by the NCRN Coordinating Office (NSF Grant 1507241). Kinney's work sponsored by the Triangle Census Research Network (TCRN; NSF Grant 1131897).