Lab 9: Synthetic Data Analysis Using Multiple Imputation

Data are on the Virtual RDC at /space/courses/info747/lab8.

Start from the 2000 PUMS for Alaska (a small state, to save time and space), using the sample data creation programs (01.pums2000.sas) and the data selector example (02.pums2000-missing.sas):

1. Using the model you built for lab 8, synthesize wage and salary income and recoded education for all observations. Make 5 synthetic data sets. A synthetic data set consists of all of the data from the input data set except for wage and salary income and recoded education; for those two variables, every value is replaced by a draw from the posterior predictive distribution. Use the same posterior predictive distribution that you built for lab 8.

2. Compare the completed data that you built in lab 8 with the synthetic data that you just built. The correct combining formula for these synthetic data is T = U + B/5, where T is the total variance, U is the average of the within-implicate variances across the 5 synthetic implicates, and B is the between-implicate variance of the point estimates. Recall that the correct formula for the completed data files from multiple imputation of only the missing data is T = U + (1 + 1/5)*B.
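
The two combining formulas in item 2 differ only in the weight placed on B, so it can help to compute both side by side. The following is a minimal sketch in Python rather than SAS, not part of the lab programs. The function name combine_implicates and the example numbers are hypothetical, chosen only to illustrate the arithmetic. It assumes you have already computed, for each of the 5 implicates, a point estimate and its estimated variance, and that B is the sample variance of the point estimates (divisor m - 1).

```python
from statistics import mean, variance


def combine_implicates(estimates, variances, kind):
    """Combine per-implicate results into a total variance T.

    estimates: point estimate from each implicate
    variances: estimated variance of that point estimate, per implicate
    kind: "synthetic" -> T = U + B/m          (synthetic data, as in this lab)
          "missing"   -> T = U + (1 + 1/m)*B  (imputation of missing data only)

    Hypothetical helper for illustration; not part of the lab programs.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need one estimate and one variance per implicate, m >= 2")

    q_bar = mean(estimates)   # combined point estimate
    u = mean(variances)       # U: average within-implicate variance
    b = variance(estimates)   # B: between-implicate variance (divisor m - 1)

    if kind == "synthetic":
        t = u + b / m
    elif kind == "missing":
        t = u + (1 + 1 / m) * b
    else:
        raise ValueError("kind must be 'synthetic' or 'missing'")
    return q_bar, u, b, t


# Made-up numbers for m = 5 implicates, for illustration only.
example_estimates = [10.0, 12.0, 11.0, 9.0, 13.0]
example_variances = [4.0, 4.0, 4.0, 4.0, 4.0]

synthetic_result = combine_implicates(example_estimates, example_variances, "synthetic")
missing_result = combine_implicates(example_estimates, example_variances, "missing")
print("synthetic:", synthetic_result)
print("missing:  ", missing_result)
```

With these made-up inputs, U = 4 and B = 2.5, so the synthetic-data formula gives T = 4 + 2.5/5 = 4.5, while the missing-data formula gives T = 4 + 1.2*2.5 = 7. The same B contributes much less to T under the synthetic-data rule, which is the difference to keep in mind when comparing the two sets of results.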