A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data

File(s)
Data_Appendix.zip (4.66 MB)
Data appendix (.zip)
schneider-abowd-2014-JRSS-A-revision.pdf (225.88 KB)
PDF Preprint
Permanent Link(s)
https://hdl.handle.net/1813/40828
Collections
Cornell University NCRN node
Author
Schneider, Matthew J.
Abowd, John M.
Abstract

Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on magnitudes or the number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau’s Quarterly Workforce Indicators.
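
The trade-off described in the abstract can be illustrated with a much simpler model than the zero-inflated BGLMM used in the paper. The sketch below is not the authors' method: it assumes a toy Gaussian random-effects model, y ~ N(theta, sigma^2) with theta ~ N(0, tau^2), and fixes tau as a stand-in for a prior on the variance component that is concentrated near zero. The names shrinkage_weight and synthesize are hypothetical. As tau shrinks, the synthetic draw depends less on the confidential value (more protection) and carries less of its signal (poorer inference).

```python
import random


def shrinkage_weight(tau, sigma):
    """Weight B = tau^2 / (tau^2 + sigma^2) that the posterior mean puts on y.

    Toy assumption: tau is fixed rather than given a prior distribution.
    """
    return tau ** 2 / (tau ** 2 + sigma ** 2)


def synthesize(y, tau, sigma, rng):
    """Draw one synthetic value from the posterior predictive of the toy model.

    Posterior: theta | y ~ N(B * y, B * sigma^2).
    Predictive: y_new | y ~ N(B * y, B * sigma^2 + sigma^2).
    """
    b = shrinkage_weight(tau, sigma)
    predictive_sd = (b * sigma ** 2 + sigma ** 2) ** 0.5
    return rng.gauss(b * y, predictive_sd)


if __name__ == "__main__":
    rng = random.Random(0)
    confidential_value = 5.0  # hypothetical confidential observation
    for tau in (10.0, 1.0, 0.1):
        b = shrinkage_weight(tau, 1.0)
        draw = synthesize(confidential_value, tau, 1.0, rng)
        print(f"tau={tau}: weight on confidential value={b:.3f}, synthetic draw={draw:.3f}")
```

With sigma held at 1, the weight on the confidential value falls toward zero as tau decreases, so the released draw is centered ever closer to the prior mean instead of the observed value.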

Sponsorship
The authors wish to acknowledge funding received through NSF grants BCS 0941226 (CDI), SES 9978093, ITR 0427889, SES 0922005, and SES 1131848 (NCRN).
Date Issued
2015
Keywords
synthetic data
zero-inflated mixed models
informative prior distributions
administrative data
statistical disclosure limitation (SDL)
empirical differential privacy
Related DOI
https://doi.org/10.1111/rssa.12100
Previously Published as
Schneider, Matthew J. and John M. Abowd, “A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data,” Journal of the Royal Statistical Society, Series A (2015). DOI: 10.1111/rssa.12100.
ISSN
1467-985X
Type
preprint
