README (W. E. Winkler, 000112, 3p)
(modified by Abowd and Vilhuber, 2005-03-29 and 2011-03-04)


This example contains executables (programs) that can be run on a variety of Linux and Windows systems. The original version was for DOS/Windows and is still available; differences between the versions are pointed out below. To run the example on other computers (e.g., Mac OS X), you will need to recompile the programs; the source code is available in matcher-source.tgz and matcher-aux-source.tgz. Documentation for the Census software is also available.

To use this example, you will need to know how to copy and rename files, use a text editor, and run programs in the Linux shell or in the MS-DOS shell under Windows.

See the Tips and Tricks page for help on working on Linux systems.


A matching example

(More details on the programs are given at the end of this file.)

Input data

Two input files, sorted by cluster and by first character of surname. The inputs must be sorted by the keys appropriate for the first matching pass.
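To make "sorted by cluster and first character of surname" concrete, here is a minimal Python sketch of that ordering. It assumes each record has already been parsed into a dict with hypothetical fields `cluster` and `surname`; the record layout of the example files is not described here, and the sketch only illustrates the sort order, it is not a tool used by the exercise.

```python
from typing import Dict, List, Tuple

# Hypothetical record representation: field name -> value.
Record = Dict[str, str]


def blocking_key(rec: Record) -> Tuple[str, str]:
    """Return (cluster, first character of surname) for sorting.

    The field names 'cluster' and 'surname' are assumptions of this sketch.
    """
    surname = rec["surname"].strip()
    return (rec["cluster"], surname[:1])  # "" if the surname is blank


def sort_for_first_pass(records: List[Record]) -> List[Record]:
    # sorted() is stable: records with equal keys keep their input order.
    return sorted(records, key=blocking_key)


if __name__ == "__main__":
    file_a = [
        {"cluster": "02", "surname": "Smith"},
        {"cluster": "01", "surname": "Jones"},
        {"cluster": "01", "surname": "Adams"},
    ]
    for rec in sort_for_first_pass(file_a):
        print(rec["cluster"], rec["surname"])
```

This prints Adams and Jones (cluster 01) followed by Smith (cluster 02). Clusters are compared as strings here, so they sort correctly only if they are zero-padded to a common width.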


Running the code

First pass

  1. Create frequencies (counts of combinations of values of the blocking variables)
    1. copy parmn1.txt to parmn.txt
    2. run cb1 to produce counter file cntcb.dat.
  2. Create probabilities
    1. rename/copy/move cntcb.dat to cntcb3.dat.
    2. replace lines 2-11 of cntcb3.dat with the 10 lines in c_parm.txt
    3. run eci to produce file of probabilities initi.dat
  3. Run first match pass
    1. rename/copy/move initi.dat as init.dat.
    2. rename/copy/move parmmf1.txt as parmmf.txt
    3. run matcher to get matching outputs and pointer file
  4. Create formatted output file
    1. run runprt to get printout print.dat
    2. rename print.dat as pr1.dat
  5. Determine high and low cutoffs
    1. Inspect pr1.dat
    2. You should find that 8.0 (high) and 1.19 (low) are good values
    3. enter the two numbers as the first line of cutoff.dat
  6. Identify links, non-links, and clericals
    1. run resid2 to get r_sorta.dat & r_sortb.dat
    2. (If you prefer SAS to resid2.exe, run the SAS program instead to get pub1ab.dat & pub1bb.dat.)
  7. Save the parameter files
    1. rename/move/copy pointer file pntmf.dat as pntmf1.dat
    2. rename/move/copy parmn.txt as parmn1.txt
    3. rename/move/copy parmmf.txt as parmmf1.txt
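Steps 5 and 6 rely on the usual record-linkage decision rule: pairs whose match weight is at or above the high cutoff are links, pairs at or below the low cutoff are non-links, and pairs in between go to clerical review. The Python sketch below illustrates that rule with the cutoffs suggested above (8.0 and 1.19). It is not resid2's code; the exact format of cutoff.dat beyond "two numbers on the first line", whether the boundaries are inclusive, and the example pair weights are all assumptions.

```python
import re
from typing import Tuple


def parse_cutoffs(first_line: str) -> Tuple[float, float]:
    """Parse 'high low' from the first line of cutoff.dat.

    Assumes the two numbers are separated by blanks and/or a comma,
    e.g. "8.0 1.19" or "8.0, 1.19" (the exact format is an assumption).
    """
    high, low = (float(tok) for tok in re.split(r"[\s,]+", first_line.strip()))
    if high < low:
        raise ValueError("high cutoff must not be below low cutoff")
    return high, low


def classify(weight: float, high: float, low: float) -> str:
    """Classify a candidate pair by its match weight.

    Treating both boundaries as inclusive is an assumption of this sketch.
    """
    if weight >= high:
        return "link"
    if weight <= low:
        return "non-link"
    return "clerical"


if __name__ == "__main__":
    high, low = parse_cutoffs("8.0 1.19")
    # Hypothetical weights for three candidate pairs.
    for w in (12.3, 4.5, -2.0):
        print(w, classify(w, high, low))
```

Moving the cutoffs apart sends more pairs to clerical review; moving them together forces more automatic decisions.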

Second pass

  1. Prepare the data files:
  2. Prepare parameter files
    1. rename/move/copy parmmf2.txt as parmmf.txt
    2. rename/move/copy parmn2.txt as parmn.txt
  3. Run second match pass
    1. run matcher to get matching outputs and pointer file
  4. Create formatted output file
    1. run runprt to get printout print.dat
    2. rename/move/copy print.dat as pr2.dat
  5. Save the parameter files
    1. rename/move/copy pointer file pntmf.dat as pntmf2.dat
    2. rename/move/copy parmn.txt as parmn2.txt
    3. rename/move/copy parmmf.txt as parmmf2.txt
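The parameter-file handling in both passes follows one pattern: copy the pass-specific files (parmnN.txt, parmmfN.txt) to the generic names the programs read (parmn.txt, parmmf.txt), run the programs, then save the pointer file, the printout, and the parameter files under pass-specific names. A minimal Python sketch of that bookkeeping follows. The function names are hypothetical, it only copies files (it does not run matcher or runprt), it assumes everything is in one directory, and it stages both parameter files at once, whereas the first pass above stages them at different steps.

```python
import shutil
from pathlib import Path

# Generic parameter-file stems used by the steps above.
PARAM_FILES = ("parmn", "parmmf")


def stage_pass(n: int, workdir: Path = Path(".")) -> None:
    """Before pass n: copy parmn{n}.txt and parmmf{n}.txt to the generic names."""
    for stem in PARAM_FILES:
        shutil.copyfile(workdir / f"{stem}{n}.txt", workdir / f"{stem}.txt")


def save_pass(n: int, workdir: Path = Path(".")) -> None:
    """After pass n: keep the outputs and parameters under pass-specific names."""
    shutil.copyfile(workdir / "pntmf.dat", workdir / f"pntmf{n}.dat")
    shutil.copyfile(workdir / "print.dat", workdir / f"pr{n}.dat")
    for stem in PARAM_FILES:
        shutil.copyfile(workdir / f"{stem}.txt", workdir / f"{stem}{n}.txt")
```

For example, stage_pass(2) would precede running matcher for the second pass, and save_pass(2) would follow runprt.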

Submitting files

  1. Look at pr2.dat to determine whether any additional matches are found. If new matches are found, why were they not found on the first pass? Answer this question in a separate document, and submit it as "lab10_match.(suffix)" in the CMS system.
  2. Submit pr2.dat as-is in the CMS system.

More details on programs

Look at read.cnt, read.em, & …

The following programs and scripts are provided:

Program         Function                                          Platform notes
cb1[.exe]       Create counts                                     C source code available, all platforms
eci[.exe]       EM algorithm                                      Fortran source available, all platforms
matcher[.exe]   SRD Matcher                                       C source code available, all platforms
prt1[.exe]      Create output from matcher                        C source code available, all platforms
resid2[.exe]    Extract records from match output and input data  C++ source code available, all platforms
runprt[.bat]    Combine sort and prt1 to produce output           Script file
sort.exe        GNU sort                                          Binary only, DOS platform; compiled from GNU sources


The program resid2[.exe] uses the inputs parmf.txt, pntmf.dat, parmn.txt, pub1a.dat, and pub1b.dat to produce the outputs r_sorta.dat and r_sortb.dat; the output names are taken from parmn.txt.

The SAS program performs the same functions, but outputs to pub1ab.dat and pub1bb.dat, which are already sorted.


The sort.exe program is provided only for DOS and Windows. On Linux, use the native 'sort' command; see 'man sort' for options. Both are derived from the same GNU sources, although the DOS version provided here is much older.

The GNU sort program sort.exe works like the Unix sort. Only a few options are described below.

  sort [-cmuV] [-t c] [-o file] [-T dir] [-bdfiMnr] [+n [-m] ...] [file ...]

-o file    Send output to file (overwriting it).

-r         Sort in reverse.

How to specify the sort keys

Keys are zero-based: the first field is number 0, the first character of a field is number 0, and so on.

+num1[.num2]    Start a new key at character num2 of field num1.

-num1[.num2]    Extend the key up to (but not including) character num2 of field num1.
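Current GNU sort (the Linux version) prefers the one-based `-k POS1[,POS2]` form to the zero-based `+POS1 -POS2` form described here. The GNU coreutils manual gives the translation: `+a.x -b.y` is equivalent to `-k a+1.x+1,b` when y is 0 or absent, and to `-k a+1.x+1,b+1.y` otherwise. The Python helper below (a hypothetical name, not part of this package) applies that rule, which may help when adapting old sort commands for Linux. It does not handle ordering letters such as `n` or `r` attached to a position.

```python
import re
from typing import Optional, Tuple

# A position is "field" or "field.char", both zero-based in the old syntax.
_POS = re.compile(r"(\d+)(?:\.(\d+))?")


def _parse(pos: str, sign: str) -> Tuple[int, Optional[int]]:
    if not pos.startswith(sign):
        raise ValueError(f"expected {sign!r} prefix: {pos!r}")
    m = _POS.fullmatch(pos[1:])
    if m is None:
        raise ValueError(f"bad key position: {pos!r}")
    char = int(m.group(2)) if m.group(2) is not None else None
    return int(m.group(1)), char


def obsolete_key_to_k(pos1: str, pos2: Optional[str] = None) -> str:
    """Translate zero-based '+a.x [-b.y]' into a one-based '-k' argument."""
    a, x = _parse(pos1, "+")
    # Character 0 (or no character) of a field is the start of that field.
    start = f"{a + 1}" if not x else f"{a + 1}.{x + 1}"
    if pos2 is None:
        return f"-k {start}"
    b, y = _parse(pos2, "-")
    # '-b' or '-b.0' ends the key just before field b, i.e. at the end of
    # one-based field b; '-b.y' ends just before zero-based character y.
    end = f"{b}" if not y else f"{b + 1}.{y}"
    return f"-k {start},{end}"
```

For example, `sort +0.0 -0.1 file` uses only the first character of the first field as the key; obsolete_key_to_k("+0.0", "-0.1") returns "-k 1,1.1", the equivalent modern option.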
