ESSAYS ON DATA PRIVACY CHALLENGES THAT FEDERAL STATISTICAL AGENCIES CONFRONT IN A DATA-RICH WORLD

Abstract
With vast databases at their disposal, private technology companies can compete with public statistical agencies to provide population statistics. However, private companies face different incentives to provide high-quality statistics and to protect the privacy of the people whose data are used. When both privacy protection and statistical accuracy are public goods, private providers tend to produce at least one suboptimally, but it is not clear which. In the first paper, we model a firm that publishes statistics under a guarantee of differential privacy. We prove that, in this framework, provision by the private firm results in inefficiently low data quality.

When Google or the U.S. Census Bureau publishes detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everyone in the data even as public information is supplied. In the second paper, we argue that, to date, economists have not focused on the privacy loss inherent in data publication. Instead, these issues have been advanced almost exclusively by computer scientists, who are primarily interested in the technical problems of protecting privacy. Economists should join the discussion, first, to determine how to balance privacy protection against data quality, which is a social choice problem, and second, to ensure that new privacy models preserve the validity of public data for economic research.

Differential privacy is a mathematical tool for protecting the confidentiality of records belonging to individuals. One of its key premises is that any measurement based on the confidential data must be altered with carefully chosen random noise before publication. In the third paper, we consider settings in which official statistical agencies deploy differentially private disclosure limitation technologies under less-than-ideal conditions. For instance, internal decisions or external requirements (e.g., legal or contractual obligations) may stipulate that certain statistics be published exactly, or overlapping datasets may already have been published. We explain (1) the semantics of algorithms that satisfy differential privacy, (2) how those semantics are affected by the release of exact statistics computed directly from the confidential data, (3) how to attribute responsibility for any resulting information leakage, and (4) how to provide privacy semantics for the combined information leakage.
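As a generic illustration of the noise-addition premise (standard notation, not the dissertation's own formulation): a randomized mechanism $M$ satisfies $\varepsilon$-differential privacy if, for all datasets $D$ and $D'$ differing in a single record and all sets $S$ of possible outputs,

\[ \Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]. \]

The canonical Laplace mechanism achieves this guarantee for a numeric query $f$ with sensitivity $\Delta f = \max_{D \sim D'} |f(D) - f(D')|$ by publishing

\[ M(D) = f(D) + \mathrm{Lap}(\Delta f / \varepsilon), \]

where $\mathrm{Lap}(b)$ denotes a draw of Laplace noise with scale $b$. For a simple population count, $\Delta f = 1$, so releasing the true count plus $\mathrm{Lap}(1/\varepsilon)$ noise suffices; smaller $\varepsilon$ means stronger privacy and noisier published statistics.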
Description
133 pages
Date Issued
2020-05
Keywords
Differential Privacy; Public Goods; Semantics; Social Choice
Committee Chair
Abowd, John
Committee Member
Easley, David
Shmatikov, Vitaly
Schmutte, Ian
Degree Discipline
Economics
Degree Name
Ph. D., Economics
Degree Level
Doctor of Philosophy
Types
dissertation or thesis