To:	Arnie Reznek
From:	John M Abowd
re:	Disclosure Avoidance Review for Ian Schmutte
Date:	November 12, 2008
CC:	Jeremy Wu, Ian Schmutte

I have reviewed the files listed below. They conform to the DRB-approved Disclosure Avoidance Protocol 
for LEHD Program data (original dated July 1, 2003; revised August 14, 2007) on file in the office of 
the Assistant Division Chief for LEHD.

In my opinion, these files are safe to release to Ian Schmutte.

File location: //rdc10/rdcprojects/co00538/disclosure/schmutte-20081102/

Complete List of Files Approved for Release (DETAILS BELOW):
============================================================
/rdcprojects/co00538/disclosure/schmutte-20081102/schmutte-disclosure-avoidance-review-memo-20081112.txt
/rdcprojects/co00538/disclosure/schmutte-20081102/fuzz_moments_20081102.xls



Supporting Documentation: 
============================================================================================
The proposed release files are tables of first and second moments and summary statistics, 
including fuzzed counts, for the samples described below, all prepared using the approved
dynamic noise infusion fuzz-factors from the LEHD infrastructure file system as designed and
implemented for establishment-level statistics as in the Quarterly Workforce Indicators. 
There are also percentiles (1, 5, 10, 15, ..., 95, 99) from a kernel density estimate of the
employer sizes (not fuzzed). 
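
For the reviewer's reference, the following is a minimal sketch of how percentiles
can be read off a kernel density estimate. It is illustrative only and is not the
program used for the release: the Gaussian kernel, the bandwidth h, the function
names, and the sample values are all assumptions, since this memo does not specify
them.

```python
from statistics import NormalDist

def kde_cdf(x, data, h):
    """CDF of a Gaussian KDE: the average of the kernel CDFs centered on each point."""
    return sum(NormalDist(xi, h).cdf(x) for xi in data) / len(data)

def kde_percentile(p, data, h, tol=1e-9):
    """Invert the KDE CDF by bisection to find the p-th percentile."""
    lo, hi = min(data) - 10 * h, max(data) + 10 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kde_cdf(mid, data, h) < p / 100:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical employer sizes and bandwidth (assumptions for illustration).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
h = 1.0

# The percentile grid listed in the memo: 1, 5, 10, 15, ..., 95, 99.
grid = [1] + list(range(5, 100, 5)) + [99]
percentiles = [kde_percentile(p, sizes, h) for p in grid]
```

Because each percentile is a functional of the smoothed density rather than an
order statistic, it does not correspond to the size of any single employer.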

Because the core variables use the QWI sampling frame and variable definitions, we have 
prepared the statistics based on these files using the approved noise-infusion method and 
the official noise factors. I supervised the preparation of the proposed release files to 
ensure that the correct QWI procedure was applied. We applied the SEIN-specific fuzz factor 
approved for use in public-release QWI tables to each variable prior to using it in
any statistical calculations. To ensure that counts were also properly fuzzed, the proposed 
release counts in the summary statistics are based on the same approved fuzz factors as the 
other statistics. Thus, every statistic proposed for release has been protected in 
the approved manner.
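
The order of operations described above can be sketched as follows. This is an
illustration under stated assumptions, not the production code: the fuzz factors
and values are invented, the factor is assumed to be applied multiplicatively,
and the fuzzed count is assumed to be the sum of the factors over contributing
observations.

```python
# Hypothetical SEIN-level fuzz factors (the real factors are confidential).
fuzz = {"A": 1.08, "B": 0.93}

# Hypothetical work-history observations: (SEIN, value of some variable).
records = [("A", 10.0), ("A", 12.0), ("B", 20.0)]

# Step 1: fuzz every value with its employer's factor BEFORE any statistic
# is computed (assumed multiplicative).
fuzzed_values = [fuzz[sein] * value for sein, value in records]

# Step 2: compute statistics from the fuzzed values only.
fuzzed_mean = sum(fuzzed_values) / len(fuzzed_values)

# Step 3: fuzz the count with the same factors (assumed: each observation
# contributes its employer's factor instead of 1).
fuzzed_count = sum(fuzz[sein] for sein, _ in records)
```

The point of the sketch is the ordering: no unfuzzed value enters any released
statistic, and the count is protected by the same factors as the moments.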

The main programs used to generate this output are included for the reviewer's reference 
in ./support

Note that the methodology and input data used to generate these data are identical to those 
used to prepare the data released in two previous disclosure avoidance reviews.
See schmutte-disclosure-avoidance-review-memo-20080914.txt and
schmutte-disclosure-avoidance-review-memo-20080127.txt.
The data proposed for release should be equally free of disclosure problems.

Description of Samples:
======================
All statistics are computed on a 28-state sample of LEHD data (CO and MT not included). 
The sampling frame is the set of annualized employment histories
reported in the Human Capital Estimates (vintage: vin_bls version 2.4 on
NSF01). 

This frame was drawn from the EHF sampling frame for 1990-2003 and
annualized. The frame data were supplemented with individual
characteristics (sex, age, race, imputed education) from the ICF and with
the characteristics of each SEIN (NAICS 2002 Major Sector and estimated
employment, as recorded from the ES-202) from the ECF. The Human Capital
Estimates also provide data on imputed annual hours of work and the variance 
components of earnings due to unobservable person- and firm-specific 
heterogeneity.

From this frame, we selected for analysis all annualized work history
observations where age is between 18 and 70 and the imputed weekly hours of work 
is strictly greater than 34. 
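
The selection rule above amounts to a simple filter. In the sketch below the
function and argument names are hypothetical, and treating the age bounds as
inclusive is an assumption; only the strict inequality on hours is stated in
this memo.

```python
def in_analysis_sample(age, weekly_hours):
    """Keep observations with age 18-70 (assumed inclusive) and hours > 34."""
    return 18 <= age <= 70 and weekly_hours > 34

# Hypothetical observations: (age, imputed weekly hours).
observations = [(17, 40.0), (18, 35.0), (40, 34.0), (70, 34.5), (71, 45.0)]
selected = [obs for obs in observations if in_analysis_sample(*obs)]
```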


Description of File: ./schmutte-disclosure-avoidance-review-memo-20081112.txt
=======================================================================================
This memo.

Description of File: ./fuzz_moments_20081102.xls
=======================================================================================
This file contains estimated means and variances of, and covariances between, person-
and firm-specific wage components as recorded in the human capital
estimates. To prevent disclosure, we have prepared these tables using
the approved QWI method of noise-infusion. We apply the SEIN-specific fuzz
factor approved for use in public-release QWI tables to each variable prior to
including it in statistical calculations.

These empirical moments are computed within cells defined by the twenty major
NAICS sectors, coded here with indices "01" -- "20". They are further stratified within 
each sector across 10 size classes. The size classes are defined by deciles of 
sein-level employment across all annualized work-history observations. 
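
A minimal sketch of the cell definition follows, assuming one record per
annualized work-history observation so that the deciles are observation-weighted.
The data, the names, and the treatment of ties at a cut point are assumptions
for illustration only.

```python
import bisect
import statistics
from collections import Counter

# Hypothetical observation-level data: (NAICS sector index, SEIN employment).
# Large employers appear once per work-history observation.
observations = [(1 + i % 20, emp) for i, emp in enumerate(range(1, 101))]

# Nine decile cut points of SEIN employment across all observations.
employment = [emp for _, emp in observations]
cuts = statistics.quantiles(employment, n=10)

def size_class(emp):
    """Map SEIN employment to a size class 1..10 using the decile cut points."""
    return bisect.bisect_left(cuts, emp) + 1

# Cells are (sector, size class) pairs; sectors are coded "01" .. "20".
cells = Counter((f"{sector:02d}", size_class(emp)) for sector, emp in observations)
```

Because the deciles are taken over observations rather than over SEINs, each
size class holds roughly one tenth of the work-history observations overall.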

The table also includes an estimate of the sample covariance matrix for the empirical 
first and second moments. For each size class, this matrix is computed using a bootstrap 
resampling of the fuzzed data. The fuzzed data are resampled 1000 times. The estimate 
of the covariance matrix is the sample covariance of the estimated first and 
second moments across the bootstrap samples.
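
The bootstrap calculation can be sketched as follows. This is an illustration
only, not the production program: the data are simulated, the function names are
hypothetical, and the sketch uses five moments (two means, two variances, one
covariance), whereas the released matrix is 6 x 6.

```python
import random
import statistics

def moments(sample):
    """First and second moments of (theta, psi) pairs in one (re)sample."""
    n = len(sample)
    thetas = [t for t, _ in sample]
    psis = [p for _, p in sample]
    mt, mp = statistics.fmean(thetas), statistics.fmean(psis)
    var_t = sum((t - mt) ** 2 for t in thetas) / (n - 1)
    var_p = sum((p - mp) ** 2 for p in psis) / (n - 1)
    cov_tp = sum((t - mt) * (p - mp) for t, p in sample) / (n - 1)
    return [mt, mp, var_t, var_p, cov_tp]

def bootstrap_cov(sample, reps=1000, seed=0):
    """Sample covariance of the moment vector across bootstrap resamples."""
    rng = random.Random(seed)
    # Resample the (already fuzzed) data with replacement, reps times.
    draws = [moments(rng.choices(sample, k=len(sample))) for _ in range(reps)]
    k = len(draws[0])
    means = [statistics.fmean(d[i] for d in draws) for i in range(k)]
    return [[sum((d[i] - means[i]) * (d[j] - means[j]) for d in draws) / (reps - 1)
             for j in range(k)] for i in range(k)]

# Simulated stand-in for the fuzzed (theta, psi) pairs in one size class.
rng = random.Random(1)
fuzzed = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
S = bootstrap_cov(fuzzed)
```

Entry S[i][j] corresponds to the variable s_(i+1)_(j+1) in the naming scheme
used by the release file.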

Included variables are:

	nacis_sector - the NAICS major sector of the cell, coded "01" -- "20"
	COUNT - the fuzzed number of work-history observations contributing to the cell
	size_class - defined from deciles of firm-size distribution
	Etheta - the mean person-effect in the cell
	Epsi - the mean firm-effect in the cell
	vartheta - the variance of person-effects
	varpsi - the variance of firm effects
	covthetapsi - the covariance between theta and psi
	frac - the fraction of observations for the sector in the cell
	stdtheta - the standard deviation of the person-effects
	stdpsi - the standard deviation of the firm effects
	s_1_1--s_6_6 - these are 36 variables that hold the sample covariance matrix of
the empirical moments





