ESSAYS ON DATA PRIVACY CHALLENGES THAT FEDERAL STATISTICAL AGENCIES CONFRONT IN A DATA-RICH WORLD
Sexton, William Nelson
With vast databases at their disposal, private tech companies can compete with public statistical agencies to provide population statistics. However, private companies face different incentives to provide high-quality statistics and to protect the privacy of the people whose data are used. When both privacy protection and statistical accuracy are public goods, private providers tend to produce at least one suboptimally, but it is not clear which. In the first paper, we model a firm that publishes statistics under a guarantee of differential privacy. We prove that provision by the private firm results in inefficiently low data quality in this framework. When Google or the U.S. Census Bureau publishes detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody even as public information is supplied. In the second paper, we assert that, to date, economists have not focused on the privacy loss inherent in data publication. Instead, these issues have been advanced almost exclusively by computer scientists, who are primarily interested in the technical problems associated with protecting privacy. Economists should join the discussion, first, to determine how to balance privacy protection against data quality, which is a social choice problem, and second, to ensure that new privacy models preserve the validity of public data for economic research. Differential privacy is a mathematical tool for protecting the confidentiality of records belonging to individuals. One of the key premises of differential privacy is that any measurement based on the confidential data must be altered with carefully chosen random noise before publication. In the third paper, we consider a scenario where the deployment of differentially private disclosure limitation technologies by official statistical agencies may not always occur under ideal conditions.
For instance, internal decisions or external requirements (e.g., legal or contractual obligations) may stipulate that certain statistics must be published exactly. Additionally, overlapping datasets may have already been published. In this paper, we explain (1) the semantics of algorithms that satisfy differential privacy, (2) how the semantics are affected by the release of exact statistics (computed directly from the confidential data), (3) how to attribute responsibility for any resulting information leakage, and (4) how to provide privacy semantics for the combined information leakage.
Differential Privacy; Public Goods; Semantics; Social Choice
Easley, David; Shmatikov, Vitaly; Schmutte, Ian
Ph.D., Economics
Doctor of Philosophy
dissertation or thesis