Lab 8: Data Imputation Methods
- The object of this lab is to create and use a probabilistic crosswalk.
- The lab and all of its files are also accessible on the SSG at /ssgprojects/courses/info7470. SAS or Stata is required to complete the lab; both are accessible on the SSG.
- Data:
  - Input data: naicsmiss.sas7bdat and naicsmiss.dta.
  - The variables in naicsmiss are:
    - sic (sometimes incomplete; i.e., expressed to only 2 or 3 digits);
    - naics (always missing).
  - Cross walk: sic_naics.sas7bdat and/or sic_naics.dta.
  - The variables in sic_naics.sas7bdat are:
    - es_sic = the 4-digit 1987 SIC code;
    - naics_impute = the 6-digit 2002 NAICS code;
    - emp = employment in the indicated (SIC, NAICS) pair;
    - sum_sic = employment in the indicated SIC;
    - pct_emp = emp/sum_sic;
    - low_limit = lower limit for random comparison to pct_emp in imputation;
    - up_limit = upper limit for random comparison to pct_emp in imputation.
    (Note: incomplete employment data in the cross walk is indicated by a value of 1 for sum_sic and fractions for emp. Do not worry about this.)
- The exercise: Write a SAS (Stata) program to do a single probabilistic imputation of
naics from the data in sic_naics. This is a straightforward application of the information in the sic_naics cross walk. Be careful how you handle the incomplete SIC codes. For these cases you will have to build the correct conditional probability model for the imputation. Provide the program code, documenting at each step what you are doing and why (use SAS/Stata comments). The program code, when executed, should run without errors.
- If you run your program a second time, you should not get the same answer. Explain why not. Your answer should be uploaded as a separate text document.
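
The logic of the exercise can be sketched outside SAS/Stata. The Python fragment below is only an illustration under stated assumptions, not the required program and not necessarily the intended solution. The toy cross walk rows are invented, the function names build_intervals and impute_naics are hypothetical, and the choice to weight by emp among all cross walk rows whose SIC begins with the known digits is one possible reading of "the correct conditional probability model". It lays consecutive (low, up) intervals over [0, 1), in the spirit of low_limit and up_limit, and assigns the NAICS code whose interval contains a uniform random draw.

```python
import random
from collections import defaultdict

# Toy cross walk rows (es_sic, naics_impute, emp); all values are invented.
CROSSWALK = [
    ("2011", "311611", 60.0),
    ("2011", "311612", 40.0),
    ("2013", "311613", 50.0),
    ("2099", "311999", 100.0),
]


def build_intervals(rows, sic_prefix):
    """Return [(naics, low, up)] for rows whose SIC starts with sic_prefix.

    Assumed model: P(naics | known digits) is the employment in matching
    rows for that NAICS divided by total employment in all matching rows.
    A complete 4-digit SIC is simply the special case of a 4-digit prefix.
    """
    emp_by_naics = defaultdict(float)
    for sic, naics, emp in rows:
        if sic.startswith(sic_prefix):
            emp_by_naics[naics] += emp
    total = sum(emp_by_naics.values())
    intervals, low = [], 0.0
    for naics in sorted(emp_by_naics):
        up = low + emp_by_naics[naics] / total
        intervals.append((naics, low, up))
        low = up
    return intervals


def impute_naics(sic_prefix, rows, rng):
    """Draw one NAICS code for a (possibly incomplete) SIC code."""
    intervals = build_intervals(rows, sic_prefix)
    if not intervals:
        return None  # no cross walk row matches the known digits
    u = rng.random()  # uniform draw on [0, 1)
    for naics, low, up in intervals:
        if low <= u < up:
            return naics
    return intervals[-1][0]  # guard against rounding at the upper edge
```

For example, impute_naics("20", CROSSWALK, random.Random()) draws among all four toy NAICS codes with probabilities proportional to employment, while impute_naics("2011", CROSSWALK, random.Random()) draws only between the two codes listed for SIC 2011. The rng argument is passed in explicitly so that the source of randomness is visible to the caller.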
Submitting labs
Maximum group size: for programs, up to 3 students, if so declared. Otherwise: individual submissions only. Each student should still submit all required elements individually. Submissions are made on the Course Management Site. Due date: April 5, 2011, 3:59PM. (Note: the site is used only for submissions, not for the other functionality you will find there.)