Disclosure Limitation and Confidentiality Protection in Linked Data
Abowd, John M.; Schmutte, Ian M.; Vilhuber, Lars
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion, and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkage of the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
The authors acknowledge the support of Alfred P. Sloan Foundation Grant G-2015-13903. Abowd acknowledges direct support from the U.S. Census Bureau (before and during his appointment as Associate Director). Abowd and Vilhuber acknowledge support through NSF Grants BCS-0941226 and TC-1012593. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the Census Bureau, NSF, or the Sloan Foundation. Most of the data referenced in the book chapter are not original to this publication. One graph is based on confidential data, and the data underlying it cannot easily be made public. All other data underlying figures and tables can be found at https://doi.org/10.5281/zenodo.1116994.
Confidentiality protection; synthetic data; input noise infusion; formal privacy; LEHD; SIPP; HRS