Presentations by Cornell University NCRN node
This collection contains presentations made by Cornell NCRN node members at various conferences and meetings.
Recent Submissions
Reproducibility Confidentiality Data Access
Vilhuber, Lars (2018-05-01)
Talk on Replicability given by Lars Vilhuber.

Large-scale Data Linkage from Multiple Sources: Methodology and Research Challenges
Abowd, John M. (2017-10-27)
Presentation on methods for record linkage.

Confidentiality Protection and Physical Safeguards (LatAm version)
Vilhuber, Lars (2017-06-07)
Confidentiality protection is a multi-layered concept, involving statistical (cryptographic) methods and physical safeguards. When providing access to researchers (both internal to the agency and external academic), a tension arises between the level of trust vis-à-vis the researcher, the statistical disclosure limitation applied to the data visible to the researcher, and the physical access mechanisms used by the researcher. In this presentation, I (attempt to) review systems used by national and private research organizations around the world, putting them into the relevant legal and societal context.

Two Perspectives on Commuting: A Comparison of Home to Work Flows Across Job-Linked Survey and Administrative Files
Green, Andrew S.; Kutzbach, Mark J.; Vilhuber, Lars (2017-04)
Commuting flows and workplace employment data have a wide constituency of users including urban and regional planners, social science and transportation researchers, and businesses. The U.S. Census Bureau releases two national data products that give the magnitude and characteristics of home to work flows. The American Community Survey (ACS) tabulates households’ responses on employment, workplace, and commuting behavior. The Longitudinal Employer-Household Dynamics (LEHD) program tabulates administrative records on jobs in the LEHD Origin-Destination Employment Statistics (LODES). Design differences across the datasets lead to divergence in a comparable statistic: county-to-county aggregate commute flows.
To understand differences in the public use data, this study compares ACS and LEHD source files, using identifying information and probabilistic matching to join person and job records. In our assessment, we compare commuting statistics for job frames linked on person, employment status, employer, and workplace, and we identify person and job characteristics as well as design features of the data frames that explain aggregate differences. We find a lower rate of within-county commuting and farther commutes in LODES. We attribute these greater distances to differences in workplace reporting and to uncertainty of establishment assignments in LEHD for workers at multi-unit employers. Minor contributing factors include differences in residence location and ACS workplace edits. The results of this analysis and the data infrastructure developed will support further work to understand and enhance commuting statistics in both datasets.

Excerpt: Usage and outcomes of the Synthetic Data Server
Vilhuber, Lars; Abowd, John M. (2017-05-09)
This is an excerpt from a prior presentation at the Society of Labor Economists (2016). The Synthetic Data Server (SDS) at Cornell University was set up to provide early access to new synthetic data products by the U.S. Census Bureau. These datasets are made available to interested researchers in a controlled environment, prior to a more generalized release. Over the past 5 years, 4 synthetic datasets were made available on the server, and over 100 users have accessed the server over that time period.
This paper reports on interim outcomes of the activity: results of validation requests from a user perspective, functioning of the feedback loop due to validation and user input, and the role of the SDS as an access gateway to and educational tool for other mechanisms of accessing detailed person, household, establishment, and firm statistics.

Confidentiality of the SynLBD
Vilhuber, Lars; Kinney, Saki (2017-05-09)
We describe the confidentiality protection provided by the SynLBD. The presentation was originally prepared by Saki Kinney for the World Statistics Congress 2013.

SynLBD Inputs: Structure, Example
Vilhuber, Lars; Drechsler, Jörg (2017-05-09)
We describe the structure of inputs for the SynLBD, and discuss challenges in preparing them.

Overview: Synthetic Longitudinal Business Data International User Seminar
Vilhuber, Lars; Kinney, Saki (2017-05-09)
An overview of the content of the Synthetic Longitudinal Business Data International User Seminar, based in part on a presentation prepared by Saki Kinney for the 2013 World Statistics Congress (WSC2013).

Synthetic Longitudinal Business Data International User Seminar
Vilhuber, Lars (2017-05-09)

Confidentiality Protection and Physical Safeguards
Vilhuber, Lars (2017-02-09)
Confidentiality protection is a multi-layered concept, involving statistical (cryptographic) methods and physical safeguards. When providing access to researchers (both internal to the agency and external academic), a tension arises between the level of trust vis-à-vis the researcher, the statistical disclosure limitation applied to the data visible to the researcher, and the physical access mechanisms used by the researcher. In this presentation, I (attempt to) review systems used by national and private research organizations around the world, putting them into the relevant legal and societal context.