Course overview
- Instructors:
- Prof. John M. Abowd, Cornell University (john.abowd@cornell.edu)
- Lars Vilhuber, Cornell University (lars.vilhuber@cornell.edu)
- Date:
- Spring 2011, Tuesdays, 4:00-6:30 pm EST (first class January 18, 2011). The course's Google calendar reflects any changes. Participants in other time zones need to adjust the times accordingly.
Site-specific class information:
| Participating institutions | Course numbers | Locations |
| --- | --- | --- |
| Cornell University/ILR School | INFO7470/ILRLE7400 | Ives 109 |
| Georgia State University (contact Prof. Barry Hirsch) | ECON 9520 | AYS Seminar Room 750 |
| University of Georgia (contact Prof. Ian Schmutte) | ECON 8850 | Conner 307 |
| Clark University (contact Prof. Wayne Gray) | Econ 399 | Jonas Clark Hall room 117 |
| Joint Program on Survey Methodology (JPSM), University of Maryland | SURV699T | LeFrak Hall 2208 |
| University of Minnesota, Minnesota Population Center (contact Prof. J Michael Oakes) | PUBH 7391 | Peik Hall 165 |
| University of California, Berkeley (contact Prof. Trond Petersen) | SOC 292 | Anna Head complex, rm C108 |
| University of California, Los Angeles (contact Prof. David Rigby) | (none) | Rm 4240 Public Affairs Bldg |
- Sponsor:
- Previous versions of this course were sponsored by the National Science Foundation Information Technologies Research Program under grant SES #0427889.
The course is designed to teach students the basics required to acquire and transform raw information into social and economic data. The current version is particularly aimed at American Ph.D. students who are interested in using confidential U.S. Census Bureau data, and the confidential data of other American statistical agencies that cooperate with the Census Bureau. The legal, statistical, computing, and social science aspects of the data "production" process will be treated.
Major emphasis will be placed on U.S. Census data that are accessible from the Census Bureau's Research Data Center (RDC) network. Graduate students and faculty who are planning to use RDC-based data, or are seriously considering it, should choose the RDC-project option for the final exam.
RDC-based data products covered include the internal files used to manage the Census Bureau's household and establishment frames; the Longitudinal Employer-Household Dynamics (LEHD) microdata; the Longitudinal Business Database (LBD) and its predecessor, the Longitudinal Research Database (LRD); internal versions of the Survey of Income and Program Participation (SIPP), Current Population Survey (CPS), American Community Survey (ACS), American Housing Survey (AHS), and the 1990, 2000, and 2010 Decennial Censuses of Population and Housing; the Employer and Non-employer Business Registers (BR and SSEL); the Censuses and Annual Surveys of Manufactures, Mining, Services, Retail Trade, Wholesale Trade, Construction, Transportation, Communications, and Utilities; the Business Expenditures Survey; the Characteristics of Business Owners; and others. Students will also be introduced to the NSF-sponsored Virtual Research Data Center and the Social Science Gateway to TeraGrid.
Core topics include:
- Basic statistical principles of populations and sampling frames
- Acquiring data via samples, censuses, administrative records, and transaction logging
- Law, economics, and statistics of data privacy and confidentiality protection
- Data linking and integration techniques (probabilistic record linking; multivariate statistical matching)
- Data imputation techniques
- Analytic methods for complex linked data sets