Course overview

Instructors:
Date:
Spring 2011, Tuesdays 4:00 - 6:30pm (EST) (first class January 18, 2011) (Google calendar link here, reflecting any changes). Participants in other time zones need to adjust the times accordingly.

Site-specific class information:

Participating institution | Course number | Location
Cornell University/ILR School | INFO7470/ILRLE7400 | Ives 109
Georgia State University (contact Prof. Barry Hirsch) | ECON 9520 | AYS Seminar Room 750
University of Georgia (contact Prof. Ian Schmutte) | ECON 8850 | Conner 307
Clark University (contact Prof. Wayne Gray) | Econ 399 | Jonas Clark Hall room 117
Joint Program on Survey Methodology (JPSM), University of Maryland | SURV699T | LeFrak Hall 2208
University of Minnesota, Minnesota Population Center (contact Prof. J Michael Oakes) | PUBH 7391 | Peik Hall 165
University of California, Berkeley (contact Prof. Trond Petersen) | SOC 292 | Anna Head complex, rm C108
University of California, Los Angeles (contact Prof. David Rigby) | (none) | Rm 4240 Public Affairs Bldg

Sponsor:
Previous versions of this course were sponsored by the National Science Foundation Information Technologies Research Program under grant SES #0427889.

The course is designed to teach students the basics required to acquire and transform raw information into social and economic data. The current version is aimed particularly at American Ph.D. students who are interested in using confidential U.S. Census Bureau data and the confidential data of other American statistical agencies that cooperate with the Census Bureau. The course covers the legal, statistical, computing, and social science aspects of the data "production" process. Major emphasis is placed on U.S. Census data that are accessible through the Census Bureau's Research Data Center (RDC) network. Graduate students and faculty who plan to use RDC-based data, or are seriously considering it, should choose the RDC-project option for the final exam.

RDC-based data products covered include the internal files used to manage the Census Bureau's household and establishment frames; the Longitudinal Employer-Household Dynamics (LEHD) micro data; the Longitudinal Business Database (LBD) and its predecessor, the Longitudinal Research Database (LRD); internal versions of the Survey of Income and Program Participation (SIPP), Current Population Survey (CPS), American Community Survey (ACS), American Housing Survey (AHS), and the 1990, 2000, and 2010 Decennial Censuses of Population and Housing; the Employer and Non-employer Business Registers (BR and SSEL); the Censuses and Annual Surveys of Manufactures, Mining, Services, Retail Trade, Wholesale Trade, Construction, Transportation, Communications, and Utilities; the Business Expenditures Survey; the Characteristics of Business Owners; and others.

Students will also be introduced to the NSF-sponsored Virtual Research Data Center and the Social Science Gateway to TeraGrid.


Core topics include:

  • Basic statistical principles of populations and sampling frames
  • Acquiring data via samples, censuses, administrative records, and transaction logging
  • Law, economics and statistics of data privacy and confidentiality protection
  • Data linking and integration techniques (probabilistic record linking; multivariate statistical matching)
  • Data imputation techniques
  • Analytic methods for complex linked data sets