Labor Dynamics Institute Publications
The Labor Dynamics Institute's mission is to create and make accessible novel data on the dynamics of the labor markets. We work with research networks and statistical agencies, developing appropriate statistics to inform policy makers, researchers, and simply people seeking knowledge. For more information, visit our website.
Recent Submissions
Reproducibility and Transparency versus Privacy and Confidentiality: Reflections from a Data Editor
Vilhuber, Lars (Journal of Econometrics, 2023)
Transparency and reproducibility are often seen in opposition to privacy and confidentiality. Data that need to be kept confidential are seen as an impediment to reproducibility, and privacy would seem to inhibit transparency. I bring a more nuanced view to the discussion, and show, using examples from over 1,000 reproducibility assessments, that confidential data can very well be used in reproducible and transparent research. The key insight is that access to most confidential data, while tedious, is open to hundreds if not thousands of researchers. In cases where few researchers can consider accessing such data in the future, reproducibility services, such as those provided by some journals, can provide some evidence for effective reproducibility.

Opportunities for Enhanced Employer Administrative Records to Improve Research, Statistics, and Evaluation
Groshen, Erica L.; Nightingale, Demetra; Reamer, Andrew; Magdy, Youstina; Raju, Madison (2022-11)
The Jobs and Employment Data Exchange (JEDx) is an initiative of the US Chamber of Commerce Foundation to enhance the administrative data systems of employers’ earnings and employment records to the mutual gain of business and government. Through this report, the JEDx Research Enhancement Project (JEDx-REP) provides perspectives on the potential benefits of enhanced earnings and employment records for social science research, official statistics, and evaluation.
The JEDx-REP team conducted literature reviews, interviews, and advisor forums to prepare findings and recommendations regarding research, statistical, and evaluation use cases; priority data enhancements (such as occupation, hours worked, and primary work location); and models and options for data access systems to enable these applications while protecting privacy and confidentiality.

An Interview with John M. Abowd
Schmutte, Ian; Vilhuber, Lars (Wiley, 2022-02-20)
John M. Abowd is the Chief Scientist and Associate Director for Research and Methodology, U.S. Census Bureau. He completed his A.B. in Economics at Notre Dame in 1973 and his Ph.D. in Economics at University of Chicago in 1977 under Arnold Zellner. During his academic career, John has held faculty positions at Princeton, the University of Chicago, and, since 1987, at Cornell University, where he is the Edmund Ezra Day Professor Emeritus of Economics, Statistics and Data Science. John was trained as a statistician and labor economist, and his economic research has focused on the rigorous empirical evaluation of labor market institutions. In the late 1990s, he began working with the Census Bureau on projects that would end up leveraging administrative and survey records into official statistical products. Through that work, he has developed a research agenda focused on issues necessary to generate those products, including data privacy, synthetic data, total error analysis, data linkage, and missing data problems, among others.

Why the Economics Profession Must Actively Participate in the Privacy Protection Debate
Abowd, John M.; Schmutte, Ian M.; Sexton, William; Vilhuber, Lars (2019-05-01)
When Google or the U.S. Census Bureau publish detailed statistics on browsing habits or neighborhood characteristics, some privacy is lost for everybody while supplying public information. To date, economists have not focused on the privacy loss inherent in data publication.
In their stead, these issues have been advanced almost exclusively by computer scientists who are primarily interested in technical problems associated with protecting privacy. Economists should join the discussion, first, to determine where to balance privacy protection against data quality, a social choice problem. Furthermore, economists must ensure new privacy models preserve the validity of public data for economic research.

metajelo: A Metadata Package for Journals to Support External Linked Objects
Lagoze, Carl; Vilhuber, Lars (2019-04-11)
We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of relevant other vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made.
It also allows for better documentation of less accessible (non-public) data, by treating it symmetrically from the point of view of the journal, therefore increasing the transparency of what up until now has been very opaque.

The U.S. Census Bureau Adopts Differential Privacy
Abowd, John M. (2018-08-01)
The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and redraw every legislative district in the country. Systems that perform successfully in the E2E test are then used in the production of the 2020 Census. Motivation: The Census Bureau conducted internal research that confirmed that the statistical disclosure limitation systems used for the 2000 and 2010 Censuses had serious vulnerabilities that were exposed by the Dinur and Nissim (2003) database reconstruction theorem. We designed a differentially private publication system that directly addressed these vulnerabilities while preserving the fitness for use of the core statistical products. Problem statement: Designing and engineering production differential privacy systems requires two primary components: (1) inventing and constructing algorithms that deliver maximum accuracy for a given privacy-loss budget and (2) ensuring that the privacy-loss budget can be directly controlled by the policy-makers who must choose an appropriate point on the accuracy-privacy-loss tradeoff. The first problem lies in the domain of computer science. The second lies in the domain of economics. Approach: The algorithms under development for the 2020 Census focus on the data used to draw legislative districts and to enforce the 1965 Voting Rights Act (VRA). These algorithms efficiently distribute the noise injected by differential privacy.
The Data Stewardship Executive Policy Committee selects the privacy-loss parameter after reviewing accuracy-privacy-loss graphs.

Understanding Database Reconstruction Attacks on Public Data
Garfinkel, Simson L.; Abowd, John M.; Martindale, Christian (2018-01-01)
In 2020 the U.S. Census Bureau will conduct the Constitutionally mandated decennial Census of Population and Housing. Because a census involves collecting large amounts of private data under the promise of confidentiality, traditionally statistics are published only at high levels of aggregation. Published statistical tables are vulnerable to DRAs (database reconstruction attacks), in which the underlying microdata is recovered merely by finding a set of microdata that is consistent with the published statistical tabulations. A DRA can be performed by using the tables to create a set of mathematical constraints and then solving the resulting set of simultaneous equations. This article shows how such an attack can be addressed by adding noise to the published tabulations, so that the reconstruction no longer results in the original data.

Disclosure Limitation and Confidentiality Protection in Linked Data
Abowd, John M.; Schmutte, Ian M.; Vilhuber, Lars (2018-01-01)
Confidentiality protection for linked administrative data is a combination of access modalities and statistical disclosure limitation. We review traditional statistical disclosure limitation methods and newer methods based on synthetic data, input noise infusion and formal privacy. We discuss how these methods are integrated with access modalities by providing three detailed examples. The first example is the linkages in the Health and Retirement Study to Social Security Administration data. The second example is the linkage of the Survey of Income and Program Participation to administrative data from the Internal Revenue Service and the Social Security Administration.
The third example is the Longitudinal Employer-Household Dynamics data, which links state unemployment insurance records for workers and firms to a wide variety of censuses and surveys at the U.S. Census Bureau. For each example, we discuss access modalities, disclosure limitation methods, the effectiveness of those methods, and the resulting analytical validity. The final sections discuss recent advances in access modalities for linked administrative data.

Total Error and Variability Measures with Integrated Disclosure Limitation for Quarterly Workforce Indicators and LEHD Origin Destination Employment Statistics in OnTheMap
McKinney, Kevin L.; Green, Andrew; Vilhuber, Lars; Abowd, John (2017-12-16)
We report results from the first comprehensive total quality evaluation of five major indicators in the U.S. Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators (QWI): total employment, beginning-of-quarter employment, full-quarter employment, total payroll, and average monthly earnings of full-quarter employees. Beginning-of-quarter employment is also the main tabulation variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace reports as displayed in OnTheMap (OTM). The evaluation is conducted by generating multiple threads of the edit and imputation models used in the LEHD Infrastructure File System. These threads conform to the Rubin (1987) multiple imputation model, with each thread or implicate being the output of formal probability models that address coverage, edit, and imputation errors. Design-based sampling variability and finite population corrections are also included in the evaluation. We derive special formulas for the Rubin total variability and its components that are consistent with the disclosure avoidance system used for QWI and LODES/OTM workplace reports. These formulas allow us to publish the complete set of detailed total quality measures for QWI and LODES.
The analysis reveals that the five publication variables under study are estimated very accurately for tabulations involving at least 10 jobs. Tabulations involving three to nine jobs have quality in the range generally deemed acceptable. Tabulations involving zero, one or two jobs, which are generally suppressed in the QWI and synthesized in LODES, have substantial total variability but their publication in LODES allows the formation of larger custom aggregations, which will in general have the accuracy estimated for tabulations in the QWI based on a similar number of workers.

Proceedings from the 2017 Cornell-Census-NSF-Sloan Workshop on Practical Privacy
Vilhuber, Lars; Schmutte, Ian M. (2017-09-20)
These proceedings report on a workshop hosted at the U.S. Census Bureau on May 8, 2017. Our purpose was to gather experts from various backgrounds together to continue discussing the development of formal privacy systems for Census Bureau data products. This workshop was a successor to a previous workshop held in October 2016 (Vilhuber & Schmutte 2017). At our prior workshop, we hosted computer scientists, survey statisticians, and economists, all of whom were experts in data privacy. At that time we discussed the practical implementation of cutting-edge methods for publishing data with formal, provable privacy guarantees, with a focus on applications to Census Bureau data products. The teams developing those applications were just starting out when our first workshop took place, and we spent our time brainstorming solutions to the various problems researchers were encountering, or anticipated encountering. For these cutting-edge formal privacy models, there had been very little effort in the academic literature to apply those methods in real-world settings with large, messy data.
We therefore brought together an expanded group of specialists from academia and government who could shed light on technical challenges and subject matter challenges, and address how data users might react to changes in data availability and publishing standards. In May 2017, we organized a follow-up workshop, which these proceedings report on. We reviewed progress made in four different areas. The four topics discussed as part of the workshop were 1. the 2020 Decennial Census; 2. the American Community Survey (ACS); 3. the 2017 Economic Census; 4. measuring the demand for privacy and for data quality. As in our earlier workshop, our goals were to 1. Discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2. Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas.

Proceedings from the 2016 NSF–Sloan Workshop on Practical Privacy
Vilhuber, Lars; Schmutte, Ian M. (2017-01-24)
On October 14, 2016, we hosted a workshop that brought together economists, survey statisticians, and computer scientists with expertise in the field of privacy preserving methods: Census Bureau staff working on implementing cutting-edge methods in the Bureau’s flagship public-use products mingled with academic researchers from a variety of universities. The four products discussed as part of the workshop were 1. the American Community Survey (ACS); 2. Longitudinal Employer-Household Data (LEHD), in particular the LEHD Origin-Destination Employment Statistics (LODES); 3. the 2020 Decennial Census; and 4. the 2017 Economic Census. The goal of the workshop was to 1. Discuss the specific challenges that have arisen in ongoing efforts to apply formal privacy models to Census data products by drawing together expertise of academic and governmental researchers; 2.
Produce short written memos that summarize concrete suggestions for practical applications to specific Census Bureau priority areas.

Modeling Endogenous Mobility in Earnings Determination
Abowd, John M.; McKinney, Kevin L.; Schmutte, Ian M. (2017-01-01)
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility. We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates.

Understanding the Effect of Procedural Justice on Psychological Distress
Cloutier, Julie; Vilhuber, Lars; Harrison, Denis; Béland-Ouellette, Vanessa (2017-01-01)
Studies on the effect of procedural justice on psychological distress present conflicting results. Drawing on instrumental and relational perspectives of justice, we test the hypothesis that the perception of procedural justice influences the level of workers’ psychological distress. Using a number of validated instruments to collect data from 659 workers in three call centers, we use OLS regressions and Hayes’ PROCESS tool to show that the perception of procedural justice has a direct, unique, and independent effect on psychological distress.
The perception of procedural justice has no instrumental role, the key mechanism being the relational role, suggesting that perceived injustice influences psychological distress because it threatens self-esteem. Distributive justice perceptions (recognition, promotions, job security) are not associated with psychological distress, calling into question Siegrist’s model. Our findings suggest that perceived procedural justice provides workers better evidence of the extent to which they are valued and appreciated members of their organizations than do perceptions of distributive justice. The results highlight the greater need for workers to be valued and appreciated for who they are (consideration and esteem), rather than for what they do for their organization (distributive justice of rewards).

Estimating Compensating Wage Differentials with Endogenous Job Mobility
Lavetti, Kurt; Schmutte, Ian M. (2016-08-12)
We demonstrate a strategy for using matched employer-employee data to correct endogenous job mobility bias when estimating compensating wage differentials. Applied to fatality rates in the census of formal-sector jobs in Brazil between 2003 and 2010, we show why common approaches to eliminating ability bias can greatly amplify endogenous job mobility bias. By extending the search-theoretic hedonic wage framework, we establish conditions necessary to interpret our estimates as preferences. We present empirical analyses supporting the predictions of the model and identifying conditions, demonstrating that the standard models are misspecified, and that our proposed model eliminates latent ability and endogenous mobility biases.

Using Partially Synthetic Microdata to Protect Sensitive Cells in Business Statistics
Miranda, Javier; Vilhuber, Lars (2015-12-21)
We describe and analyze a method that blends records from both observed and synthetic microdata into public-use tabulations on establishment statistics.
The resulting tables use synthetic data only in potentially sensitive cells. We describe different algorithms, and present preliminary results when applied to the Census Bureau's Business Dynamics Statistics and Synthetic Longitudinal Business Database, highlighting accuracy and protection afforded by the method when compared to existing public-use tabulations (with suppressions).

Sorting between and within Industries: A Testable Model of Assortative Matching
Abowd, John M.; Kramarz, Francis; Perez-Duarte, Sebastien; Schmutte, Ian M. (2014-07-31)
We test Shimer's (2005) theory of the sorting of workers between and within industrial sectors based on directed search with coordination frictions, deliberately maintaining its static general equilibrium framework. We fit the model to sector-specific wage, vacancy and output data, including publicly-available statistics that characterize the distribution of worker and employer wage heterogeneity across sectors. Our empirical method is general and can be applied to a broad class of assignment models. The results indicate that industries are the loci of sorting: more productive workers are employed in more productive industries. The evidence confirms that strong assortative matching can be present even when worker and employer components of wage heterogeneity are weakly correlated.

Modeling Endogenous Mobility in Wage Determination
Abowd, John M.; McKinney, Kevin L.; Schmutte, Ian M. (2015-05-01)
We evaluate the bias from endogenous job mobility in fixed-effects estimates of worker- and firm-specific earnings heterogeneity using longitudinally linked employer-employee data from the LEHD infrastructure file system of the U.S. Census Bureau. First, we propose two new residual diagnostic tests of the assumption that mobility is exogenous to unmodeled determinants of earnings. Both tests reject exogenous mobility.
We relax the exogenous mobility assumptions by modeling the evolution of the matched data as an evolving bipartite graph using a Bayesian latent class framework. Our results suggest that endogenous mobility biases estimated firm effects toward zero. To assess validity, we match our estimates of the wage components to out-of-sample estimates of revenue per worker. The corrected estimates attribute much more of the variation in revenue per worker to variation in match quality and worker quality than the uncorrected estimates.

ROC-Based Model Estimation for Forecasting Large Changes in Demand
Schneider, Matthew J.; Gorr, Wilpen L. (2013-10-14)
Forecasting for large changes in demand should benefit from different estimation than that used for estimating mean behavior. We develop a multivariate forecasting model designed for detecting the largest changes across many time series. The model is fit based upon a penalty function that maximizes true positive rates along a relevant false positive rate range and can be used by managers wishing to take action on a small percentage of products likely to change the most in the next time period. We apply the model to a crime dataset and compare results to OLS as the baseline, as well as to models that are promising for exceptional demand forecasting, such as quantile regression, synthetic data from a Bayesian model, and a power loss model. Using the Partial Area Under the Curve (PAUC) metric, our results show statistical significance, a 35 percent improvement over OLS, and at least a 20 percent improvement over competing methods.
We suggest that managers with an increasing number of products use our method for forecasting large changes in conjunction with typical magnitude-based methods for forecasting expected demand.

Replicating the Synthetic LBD with German Establishment Data
Drechsler, Jörg; Vilhuber, Lars (2013-04-15)
One major criticism against the use of synthetic data has been that the efforts necessary to generate useful synthetic data are so intense that many statistical agencies cannot afford them. However, we argue in this paper that the field is still evolving and many lessons that have been learned in the early years of synthetic data generation can now be used in the development of new synthetic data products, considerably reducing the required investments. We evaluate whether synthetic data algorithms that have been developed in the U.S. to generate a synthetic version of the Longitudinal Business Database (LBD) can easily be transferred to generate a similar data product for other countries. We construct a German data product with information comparable to the LBD - the German Longitudinal Business Database (GLBD) - that is generated from different administrative sources at the Institute for Employment Research, Germany. In a second stage, the algorithms developed for the synthesis of the LBD will be applied to the GLBD. Extensive evaluations will illustrate whether the algorithms provide useful synthetic data without further adjustment. The ultimate goal of the project is to provide access to multiple synthetic datasets similar to the SynLBD at Cornell to enable comparative studies between countries. The Synthetic GLBD is a first step towards that goal.

Presentation: Improved Research Access to Census Bureau Linked Administrative Data via Public-use Products
Abowd, John; Vilhuber, Lars (2013-05-03)
Professor John M. Abowd, Director of the Labor Dynamics Institute, presented on "Improved Research Access to Census Bureau Linked Administrative Data via Public-use Products" at the 18th Annual Meetings of the Society of Labor Economists in Boston, MA, on Friday, May 3, 2013.