Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics
We describe and analyze a method that blends records from observed and synthetic microdata into public-use tabulations of establishment statistics. The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms and present preliminary results from applying them to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting the accuracy and protection the method affords relative to existing public-use tabulations (with suppressions).
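As a rough illustration of the blending idea described above, the following minimal sketch builds a table from observed records and substitutes the synthetic-data tabulation only in cells flagged as sensitive. The sensitivity rule (fewer than `min_count` establishments in a cell), the record layout `(cell, employment)`, and the function names are assumptions made for this example; the paper's actual algorithms and sensitivity criteria are not specified here and may differ.

```python
from collections import defaultdict


def tabulate(records):
    """Count establishments and sum employment per cell.

    `records` is an iterable of (cell, employment) pairs (hypothetical layout).
    """
    totals = defaultdict(lambda: {"n": 0, "emp": 0})
    for cell, emp in records:
        totals[cell]["n"] += 1
        totals[cell]["emp"] += emp
    return dict(totals)


def blend_tables(observed, synthetic, min_count=3):
    """Publish observed totals, except in cells flagged as sensitive.

    Assumption: a cell is sensitive when it has fewer than `min_count`
    observed establishments. Sensitive cells take their value from the
    synthetic tabulation; if no synthetic value exists, the cell is
    left as None (i.e. suppressed).
    """
    obs = tabulate(observed)
    syn = tabulate(synthetic)
    table = {}
    for cell, stats in obs.items():
        if stats["n"] < min_count:
            emp = syn[cell]["emp"] if cell in syn else None
            source = "synthetic" if cell in syn else "suppressed"
        else:
            emp = stats["emp"]
            source = "observed"
        table[cell] = {"emp": emp, "source": source}
    return table


# Toy data, invented for illustration only.
observed = [("A", 10), ("A", 20), ("A", 30), ("B", 500)]
synthetic = [("A", 11), ("A", 19), ("A", 33), ("B", 470)]
result = blend_tables(observed, synthetic)
```

In this toy example, cell `A` has three observed establishments and is published from observed data, while cell `B` has a single establishment and is filled from the synthetic tabulation instead of being suppressed.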
Presented at World Statistical Congress 2015 and Joint Statistical Meetings 2015. Vilhuber acknowledges support through NSF Grants SES-1042181 and BCS-0941226. All authors were affiliated with the U.S. Census Bureau, Center for Economic Studies, when originally contributing to the contents of this paper. This document reports the results of research and analysis undertaken by U.S. Census Bureau staff. It has undergone a Census Bureau review more limited in scope than that given to official Census Bureau publications. This document is released to inform interested parties of ongoing research and to encourage discussion of work in progress. All results have been reviewed to ensure that no confidential information is disclosed. The views expressed herein are attributable only to the authors and do not represent the views of the U.S. Census Bureau. The data used in this paper are restricted-access and can be accessed either through the Federal Statistical Research Data Centers (LBD) or through the Synthetic Data Server at Cornell University (Synthetic LBD). Data and code used for the final version of this paper will be archived at the U.S. Census Bureau and made available upon request.
synthetic data; statistical disclosure limitation; time-series; local labor markets; gross job flows; confidentiality protection