Labor Dynamics Institute Publications

The Labor Dynamics Institute's mission is to create and make accessible novel data on the dynamics of labor markets. We work with research networks and statistical agencies, developing appropriate statistics to inform policymakers, researchers, and anyone seeking knowledge. For more information, visit our website.

Recent Submissions

Now showing 1 - 10 of 37
  • Item
    Protecting Confidential Data through Non-Statistical Methods
    Vilhuber, Lars (Chapman and Hall/CRC, 2024-10-09)
    This chapter will rely on and update previous overviews of how researchers, citizens, and administrators can reliably and securely access confidential data, i.e., data that cannot simply be published as “open data”. I will discuss various legal, technical, and practical ways of securing access to data that is needed for computations. This obviously depends on the type and complexity of the computations, but also on who needs access, and how and where.
  • Item
    Reproducibility and Transparency versus Privacy and Confidentiality: Reflections from a Data Editor
    Vilhuber, Lars (Journal of Econometrics, 2023)
    Transparency and reproducibility are often seen in opposition to privacy and confidentiality. Data that need to be kept confidential are seen as an impediment to reproducibility, and privacy would seem to inhibit transparency. I bring a more nuanced view to the discussion, and show, using examples from over 1,000 reproducibility assessments, that confidential data can very well be used in reproducible and transparent research. The key insight is that access to most confidential data, while tedious, is open to hundreds if not thousands of researchers. In cases where few researchers can consider accessing such data in the future, reproducibility services, such as those provided by some journals, can provide some evidence for effective reproducibility.
  • Item
    Opportunities for Enhanced Employer Administrative Records to Improve Research, Statistics, and Evaluation
    Groshen, Erica L.; Nightingale, Demetra; Reamer, Andrew; Magdy, Youstina; Raju, Madison (2022-11)
    The Jobs and Employment Data Exchange (JEDx) is an initiative of the US Chamber of Commerce Foundation to enhance the administrative data systems of employers’ earnings and employment records to the mutual gain of business and government. Through this report, the JEDx Research Enhancement Project (JEDx-REP) provides perspectives on the potential benefits of enhanced earnings and employment records for social science research, official statistics, and evaluation. The JEDx-REP team conducted literature reviews, interviews, and advisor forums to prepare findings and recommendations regarding research, statistical, and evaluation use cases; priority data enhancements (such as occupation, hours worked, and primary work location); and models and options for data access systems to enable these applications while protecting privacy and confidentiality.
  • Item
    An Interview with John M. Abowd
    Schmutte, Ian; Vilhuber, Lars (Wiley, 2022-02-20)
    John M. Abowd is the Chief Scientist and Associate Director for Research and Methodology, U.S. Census Bureau. He completed his A.B. in Economics at Notre Dame in 1973 and his Ph.D. in Economics at the University of Chicago in 1977 under Arnold Zellner. During his academic career, John has held faculty positions at Princeton, the University of Chicago, and, since 1987, Cornell University, where he is the Edmund Ezra Day Professor Emeritus of Economics, Statistics and Data Science. John was trained as a statistician and labor economist, and his economic research has focused on the rigorous empirical evaluation of labor market institutions. In the late 1990s, he began working with the Census Bureau on projects that would end up leveraging administrative and survey records into official statistical products. Through that work, he has developed a research agenda focused on issues necessary to generate those products, including data privacy, synthetic data, total error analysis, data linkage, and missing data problems, among others.
  • Item
    Why the Economics Profession Must Actively Participate in the Privacy Protection Debate
    Abowd, John M.; Schmutte, Ian M.; Sexton, William; Vilhuber, Lars (2019-05-01)
    When Google or the U.S. Census Bureau publishes detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody while supplying public information. To date, economists have not focused on the privacy loss inherent in data publication. Instead, these issues have been advanced almost exclusively by computer scientists, who are primarily interested in technical problems associated with protecting privacy. Economists should join the discussion, first, to determine where to balance privacy protection against data quality, which is a social choice problem. Furthermore, economists must ensure that new privacy models preserve the validity of public data for economic research.
  • Item
    metajelo: A Metadata Package for Journals to Support External Linked Objects
    Lagoze, Carl; Vilhuber, Lars (2019-04-11)
    We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of other relevant vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public) data by treating it symmetrically from the point of view of the journal, thereby increasing the transparency of what up until now has been very opaque.
  • Item
    The U.S. Census Bureau Adopts Differential Privacy
    Abowd, John M. (2018-08-01)
    The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and redraw every legislative district in the country. Systems that perform successfully in the E2E test are then used in the production of the 2020 Census.
    Motivation: The Census Bureau conducted internal research confirming that the statistical disclosure limitation systems used for the 2000 and 2010 Censuses had serious vulnerabilities exposed by the Dinur and Nissim (2003) database reconstruction theorem. We designed a differentially private publication system that directly addressed these vulnerabilities while preserving the fitness for use of the core statistical products.
    Problem statement: Designing and engineering production differential privacy systems requires two primary components: (1) inventing and constructing algorithms that deliver maximum accuracy for a given privacy-loss budget and (2) ensuring that the privacy-loss budget can be directly controlled by the policy-makers who must choose an appropriate point on the accuracy-privacy-loss tradeoff. The first problem lies in the domain of computer science. The second lies in the domain of economics.
    Approach: The algorithms under development for the 2020 Census focus on the data used to draw legislative districts and to enforce the 1965 Voting Rights Act (VRA). These algorithms efficiently distribute the noise injected by differential privacy. The Data Stewardship Executive Policy Committee selects the privacy-loss parameter after reviewing accuracy-privacy-loss graphs.
  • Item
    Understanding Database Reconstruction Attacks on Public Data
    Garfinkel, Simson L.; Abowd, John M.; Martindale, Christian (2018-01-01)
    In 2020 the U.S. Census Bureau will conduct the Constitutionally mandated decennial Census of Population and Housing. Because a census involves collecting large amounts of private data under the promise of confidentiality, traditionally statistics are published only at high levels of aggregation. Published statistical tables are vulnerable to DRAs (database reconstruction attacks), in which the underlying microdata is recovered merely by finding a set of microdata that is consistent with the published statistical tabulations. A DRA can be performed by using the tables to create a set of mathematical constraints and then solving the resulting set of simultaneous equations. This article shows how such an attack can be addressed by adding noise to the published tabulations, so that the reconstruction no longer results in the original data.
  • Item
    Disclosure Limitation and Confidentiality Protection in Linked Data
    Abowd, John M.; Schmutte, Ian M.; Vilhuber, Lars (2018-01-01)
    Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration. The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.
  • Item
    Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap
    McKinney, Kevin L.; Green, Andrew; Vilhuber, Lars; Abowd, John (2017-12-16)
    We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES. The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.
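
The database reconstruction attack summarized in "Understanding Database Reconstruction Attacks on Public Data" can be illustrated with a toy sketch. Everything here is hypothetical: a three-person block with ages 0 to 99, and five made-up published statistics (count, mean, median, number under 18, mean age of adults). Instead of the constraint solver the abstract describes, the sketch simply enumerates every candidate block and keeps those consistent with the tables; a fixed +1 shift to one statistic stands in for added noise.

```python
from itertools import combinations_with_replacement
from statistics import median

AGES = range(100)  # assumed age domain 0..99 (hypothetical)

def tabulate(block):
    """Compute the hypothetical published tables for one block of ages."""
    adults = [a for a in block if a >= 18]
    return {
        "count": len(block),
        "mean": sum(block) / len(block),
        "median": median(block),
        "under_18": len(block) - len(adults),
        "adult_mean": sum(adults) / len(adults) if adults else None,
    }

def reconstruct(published):
    """Return every sorted microdata set whose tables match the published ones."""
    n = published["count"]
    return [
        cand
        for cand in combinations_with_replacement(AGES, n)
        if tabulate(cand) == published
    ]

true_block = (10, 30, 50)  # hypothetical confidential ages
published = tabulate(true_block)
print(reconstruct(published))  # [(10, 30, 50)]: exact microdata recovered

# A fixed perturbation stands in for noise: one table is shifted by +1.
noisy = dict(published, adult_mean=published["adult_mean"] + 1)
print(reconstruct(noisy))      # [(8, 30, 52)]: consistent with the tables, but wrong
```

With exact tables, only one block fits; once a single statistic is perturbed, the attacker still finds a consistent block, but it is no longer the confidential one.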
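
The accuracy-privacy-loss tradeoff described in "The U.S. Census Bureau Adopts Differential Privacy" can be made concrete with the textbook Laplace mechanism for a single count. This is a generic sketch, not the Census Bureau's production algorithm: it assumes one counting query with sensitivity 1, so the noise scale is 1/epsilon and the expected absolute error is also 1/epsilon. The function names are hypothetical.

```python
import random

def laplace_count(true_count, epsilon, rng):
    """Release a count under epsilon-differential privacy (sensitivity assumed 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two iid Exponential(rate=epsilon) draws is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise

def expected_abs_error(epsilon):
    """E|Laplace(0, 1/epsilon)| = 1/epsilon: the accuracy cost of a privacy-loss budget."""
    return 1.0 / epsilon

def empirical_abs_error(true_count, epsilon, trials=20000, seed=0):
    """Average absolute error over repeated noisy releases (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(
        abs(laplace_count(true_count, epsilon, rng) - true_count) for _ in range(trials)
    )
    return total / trials

# One row per candidate privacy-loss budget: a crude accuracy-privacy-loss "graph".
for eps in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(
        f"epsilon={eps:<5} expected |error|={expected_abs_error(eps):.2f} "
        f"simulated={empirical_abs_error(1000, eps):.2f}"
    )
```

Smaller epsilon (less privacy loss) yields larger errors; picking a row of such a table is the kind of policy choice the abstract assigns to the Data Stewardship Executive Policy Committee.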
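
The Rubin (1987) total variability used in the QWI and LODES evaluation combines the variance within each implicate with the variance across implicates. The sketch below applies only the standard textbook combining rule, T = U_bar + (1 + 1/m) * B, to hypothetical numbers; it does not reproduce the paper's special formulas that account for the disclosure avoidance system.

```python
from statistics import mean, variance

def rubin_total_variance(estimates, within_variances):
    """Combine m implicates with Rubin's rule: T = U_bar + (1 + 1/m) * B."""
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two implicates with matching variances")
    q_bar = mean(estimates)          # combined point estimate
    u_bar = mean(within_variances)   # average within-implicate variance
    b = variance(estimates)          # between-implicate variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, u_bar, b, t

# Hypothetical: five implicates of one employment tabulation.
estimates = [100, 102, 98, 101, 99]
within = [4.0, 4.0, 4.0, 4.0, 4.0]
q_bar, u_bar, b, t = rubin_total_variance(estimates, within)
print(q_bar, u_bar, b, round(t, 6))
```

Here B = 2.5 and T = 4 + 1.2 × 2.5 = 7, so reporting only the within-implicate variance U_bar = 4 would understate the total variability.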