eCommons

DigitalCollections@ILR
ILR School
 

Stepping-up: The Census Bureau Tries to Be a Good Data Steward in the 21st Century

Abstract

The Fundamental Law of Information Reconstruction, a.k.a. the Database Reconstruction Theorem, exposes a vulnerability in the way statistical agencies have traditionally published data. But it also exposes the same vulnerability for the way Amazon, Apple, Facebook, Google, Microsoft, Netflix, and other Internet giants publish data. We are all in this data-rich world together. And we all need to find solutions to the problem of how to publish information from these data while still providing meaningful privacy and confidentiality protections to the providers.

Fortunately for the American public, the Census Bureau's curation of their data is already regulated by a very strict law that mandates publication for statistical purposes only and in a manner that does not expose the data of any respondent--person, household or business--in a way that identifies that respondent as the source of specific data items. The Census Bureau has consistently interpreted that stricture on publishing identifiable data as governed by the laws of probability. An external user of Census Bureau publications should not be able to assert with reasonable certainty that particular data values were directly supplied by an identified respondent. Traditional methods of disclosure avoidance now fail because they are not able to formalize and quantify that risk. Moreover, when traditional methods are assessed using current tools, the relative certainty with which specific values can be associated with identifiable individuals turns out to be orders of magnitude greater than anticipated at the time the data were released.

In light of these developments, the Census Bureau has committed to an open and transparent modernization of its data publishing systems using formal methods like differential privacy. The intention is to demonstrate that statistical data, fit for their intended uses, can be produced when the entire publication system is subject to a formal privacy-loss budget.

To date, the team developing these systems--many of whom are in this room--has demonstrated that bounded ε-differential privacy can be implemented for the data publications from the 2020 Census used to re-draw every legislative district in the nation (PL94-171 tables). That team has also developed methods for quantifying and displaying the system-wide trade-offs between the accuracy of those data and the privacy-loss budget assigned to the tabulations. Considering that work began in mid-2016 and that no organization anywhere in the world has yet deployed a full, central differential privacy system, this is already a monumental achievement.
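The trade-off between accuracy and the privacy-loss budget can be illustrated with a minimal sketch. It is not the Census Bureau's production system. It assumes a single counting query protected by the textbook Laplace mechanism, with a sensitivity of 1 (under the bounded definition, replacing one record changes a single count by at most 1). The function names and the example count of 1,000 are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def expected_abs_error(epsilon, sensitivity=1.0):
    """Theoretical mean absolute error of the mechanism above."""
    return sensitivity / epsilon


def empirical_abs_error(true_count, epsilon, trials, rng):
    """Average absolute error over repeated simulated releases."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the simulation is repeatable
    for eps in (0.1, 0.5, 1.0, 2.0):
        simulated = empirical_abs_error(1000, eps, 20000, rng)
        print(f"epsilon={eps}: expected error {expected_abs_error(eps):.2f}, "
              f"simulated {simulated:.2f}")
```

In this sketch, halving the privacy-loss budget doubles the expected error of the published count, which is the kind of relationship a system-wide accuracy versus privacy-loss display has to summarize across many tabulations at once.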

But it is only the tip of the iceberg in terms of the statistical products historically produced from a decennial census. Demographic profiles, based on the detailed tables traditionally published in summary files following the publication of redistricting data, have far more diverse uses than the redistricting data. Summarizing those use cases in a set of queries that can be answered with a reasonable privacy-loss budget is the next challenge. Internet giants, businesses and statistical agencies around the world should also step up to these challenges. We can learn from, and help, each other enormously.

Description

Presented at the Simons Institute Workshop "Data Privacy: From Foundations to Applications." Program available here: https://simons.berkeley.edu/workshops/schedule/6281

Date Issued

2019-03-04

Keywords

statistics; privacy; stewardship

Related Version

Video recording available at https://www.youtube.com/watch?v=yUyCYC6rb_4

Rights

Attribution-NonCommercial-ShareAlike 4.0 International

Types

presentation
