Your answers should be prepared in a text document using LaTeX, Word, or a plain text editor (e.g., Kwrite on the VRDC). If you cite any literature, make sure the citation is complete. Use the American Psychological Association standard for citing a web source. Submit your exam directly to John.Abowd@cornell.edu. Any computer listings that are part of your answer should be submitted to the Lab 14 submissions folder on the VRDC. All questions count equally. The final grade will be based on a maximum score of 100 from this exam.
Answer 3 of the following questions. Each answer should not exceed 2 typewritten, double-spaced pages.
1. Explain the relationship among the Office of Management and Budget, the US Census Bureau, and the Internal Revenue Service in the US statistical system. Pay attention to where standards are set. In order for the Census Bureau and the IRS to exchange data, what needs to occur? Reference the relevant statutes and describe the process.
2. For either the Decennial Census of Population or the quinquennial Economic Census, answer the following questions:
a. What is the in-scope population?
b. Describe the frame development and maintenance procedures.
c. Describe the general methods for missing data edit and imputation.
d. List representative products from the Census Research Data Centers based on this census.
e. Describe the differences between the public-use products and the RDC-based data files for one of the products listed in part d.
3. Consider the following problem in record linking. You have a master list of business addresses that represent the physical locations for a population frame of establishments in the retail trade sector. You acquire a list of newly opened retail trade establishments. How would you use record linking to update your master list using the list of new establishments? Pay attention to how you would assess the parameters needed to do the matching. Include a discussion of your blocking and matching variable strategies. Your first step should be to unduplicate the list of new establishments. Since that is also a record linking application, describe the differences between unduplication applications and two-list applications. How would you assess the false match and false nonmatch rates?
4. You have developed a project proposal for the RDC that uses the Survey of Income and Program Participation linked to the lifetime Social Security taxable income (FICA-taxable) of the respondents. The purpose of your study is to improve the usefulness of the SIPP data in assessing retirement income. Write a Title 13 benefit statement for this project. Reference the correct parts of the Criterion document. Be sure to relate your proposed analysis directly to the benefits.
Answer 1 of the following questions.
5. Revisit lab 8 (Missing Data Analysis using Multiple Imputation). What would be an appropriate procedure for imputing missing household income (as compared to individual wage and salary income)? How would you handle different-sized households in the imputation? Construct a multiple imputer for household income and compare the result to the existing imputed values (allocated values). Why are they different?
6. Revisit lab 11 (Modeling Integrated Data). Compare the appropriate fixed effects and mixed effects models of the geographic heterogeneity in household income. Use household size as a fixed control variable. Explain the reasons for the differences in the results.