CPS Public Use Microdata
The Census Bureau and BLS publish microdata files from the CPS. The data are available for download and online access, through portals such as MDAT.
Variable Names
Variable names follow a scheme:
The first letter indicates an analysis level.
Letter |
Meaning |
P |
Individual |
H |
Household |
G |
Geographical unit |
The second letter indicates a source.
Letter |
Meaning |
U |
Original, unedited |
E |
Edited |
R |
Recoded |
T |
Topcoded (i.e., any value over a threshold is recoded into the threshold) |
W |
Weighting (see Weights section) |
X |
Allocation (see Allocation Flags section) |
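The two-letter prefix scheme above can be applied mechanically. The following is a minimal sketch; the helper name `describe_variable` is hypothetical, and the lookup tables are simply transcribed from the two tables above.

```python
# Lookup tables transcribed from the naming scheme described above.
LEVELS = {"P": "Individual", "H": "Household", "G": "Geographical unit"}
SOURCES = {
    "U": "Original, unedited",
    "E": "Edited",
    "R": "Recoded",
    "T": "Topcoded",
    "W": "Weighting",
    "X": "Allocation",
}


def describe_variable(name):
    """Return (level, source) for a variable name (hypothetical helper)."""
    name = name.upper()
    if len(name) < 2:
        raise ValueError(f"name too short to carry a prefix: {name!r}")
    level = LEVELS.get(name[0])
    source = SOURCES.get(name[1])
    if level is None or source is None:
        raise ValueError(f"prefix not covered by the scheme: {name!r}")
    return level, source


print(describe_variable("PESEX"))    # individual-level, edited
print(describe_variable("HWHHWGT"))  # household-level, weighting
```

Names whose prefix falls outside the tables raise an error rather than guessing.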
Missing Values
The following values indicate a type of missing value:
Value |
Meaning |
-1 |
Blank (often out of universe) |
-2 |
Don't know |
-3 |
Refused |
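When tabulating, it is usually convenient to separate these codes from real values. A minimal sketch follows; the helper name `split_missing` is hypothetical, and it assumes the variable in question has no legitimate values of -1, -2, or -3.

```python
# Missing-value codes transcribed from the table above.
MISSING_CODES = {-1: "blank", -2: "don't know", -3: "refused"}


def split_missing(value):
    """Return (value, None) for a real value, or (None, reason) for a missing code.

    Assumes -1, -2, and -3 are never legitimate values of the variable.
    """
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


print(split_missing(40))  # a real value passes through unchanged
print(split_missing(-3))  # a refusal is reported as missing, with its reason
```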
Weights
Weights are calculated for and assigned to each record to produce more accurate estimates. The Census Bureau provides several weights, and each use case should be matched to the most appropriate one.
Weight |
Variable Name |
Usage |
Family weight |
PWFMWGT |
Used for estimates of families |
Longitudinal weight |
PWLGWGT |
Used for records that are matched month-to-month |
Outgoing rotation weight |
PWORWGT |
Used for estimates making use of only outgoing rotation groups (i.e., month in sample 4 or 8) |
Second stage weight |
PWSSWGT |
Used for calibration |
Veterans weight |
PWVETWGT |
Used for estimates of veterans and nonveterans |
Composited weight |
PWCMPWGT |
Used for estimates of individuals |
Household weight |
HWHHWGT |
Used for estimates of households |
Weight variables are stored with 4 implied decimal places; they should be divided by 10,000 before using.
For most estimations, the final composited weights are recommended.
When pulling CPS targets for use in weighting, the second stage weights should be used.
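The scaling rule above can be sketched as follows. The record values are made up for illustration, and the helper names `scale_weight` and `weighted_total` are hypothetical; only the variable name PWCMPWGT and the division by 10,000 come from this page.

```python
WEIGHT_SCALE = 10_000  # weights carry 4 implied decimal places


def scale_weight(raw):
    """Convert a raw weight as stored in the file to its actual value."""
    return raw / WEIGHT_SCALE


def weighted_total(records, weight_var="PWCMPWGT"):
    """Sum scaled weights, i.e. the estimated number of people the records represent."""
    return sum(scale_weight(r[weight_var]) for r in records)


# Two made-up person records with raw composited weights.
records = [{"PWCMPWGT": 15_000_000}, {"PWCMPWGT": 22_500_000}]
print(weighted_total(records))  # 3750.0
```

Passing a different `weight_var` (for example PWSSWGT when pulling targets) reuses the same scaling.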
Allocation Flags
For some interviews, it is necessary to impute values. These edits are evident in changes from the unedited variable ("U") to the edited variable ("E"). An allocation flag variable ("X") is provided to make the imputation more evident.
Most individual- ("P") and household-level ("H") variables have a corresponding allocation flag. Some recoded ("R") and topcoded ("T") variables do as well.
All allocation variables follow the same scheme:
The first digit indicates how the value was changed.
Digit |
Meaning |
0 |
No change |
1 |
Changed to some value |
2 |
Changed to an unedited value from a prior interview |
3 |
Changed to an edited value from a prior interview |
4 |
Changed to an allocated value |
5 |
Changed to be blank |
The second digit indicates why the value was changed, and largely corresponds to the missing values detailed above.
Digit |
Meaning |
0 |
Unedited variable was set to a value |
1 |
Unedited variable was blank |
2 |
Unedited variable indicated "don't know" |
3 |
Unedited variable indicated refusal |
As an example, if PXSEX=21, then the unedited sex variable was blank (second digit 1) and PESEX was set to an unedited value from a prior interview (first digit 2).
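The two-digit scheme can be decoded by splitting the flag into its tens and ones digits. This is a minimal sketch; the helper name `decode_allocation_flag` is hypothetical, and the lookup tables are transcribed from the two tables above.

```python
# First digit: how the value was changed.
HOW = {
    0: "No change",
    1: "Changed to some value",
    2: "Changed to an unedited value from a prior interview",
    3: "Changed to an edited value from a prior interview",
    4: "Changed to an allocated value",
    5: "Changed to be blank",
}
# Second digit: why the value was changed.
WHY = {
    0: "Unedited variable was set to a value",
    1: "Unedited variable was blank",
    2: "Unedited variable indicated don't know",
    3: "Unedited variable indicated refusal",
}


def decode_allocation_flag(flag):
    """Split a two-digit allocation flag into (how, why) descriptions."""
    how_digit, why_digit = divmod(flag, 10)
    if how_digit not in HOW or why_digit not in WHY:
        raise ValueError(f"unrecognized allocation flag: {flag}")
    return HOW[how_digit], WHY[why_digit]


print(decode_allocation_flag(21))  # the PXSEX=21 case from the example above
```

Flags outside the documented digits raise an error instead of being silently mapped.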
Edited Universe
Some questions are intended for a subpopulation of respondents. Missingness (i.e., values of -1) is then enforced through allocation recodes. For example, PEEDUCA has an edited universe of PRPERTYP = 2 or 3; it will be set to -1 for all cases where PRPERTYP = 1.
The edited universe for any variable is noted in the data dictionary.
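In practice, records are usually filtered to a variable's edited universe before tabulating it. The sketch below uses the PEEDUCA example from this section; the helper names are hypothetical and the record values are made up for illustration.

```python
def in_peeduca_universe(record):
    """PEEDUCA's edited universe, per the example above: PRPERTYP = 2 or 3."""
    return record["PRPERTYP"] in (2, 3)


def out_of_universe_is_blank(record):
    """True unless an out-of-universe record carries a non-blank PEEDUCA."""
    return in_peeduca_universe(record) or record["PEEDUCA"] == -1


# Made-up records; the PEEDUCA codes are illustrative only.
records = [
    {"PRPERTYP": 1, "PEEDUCA": -1},
    {"PRPERTYP": 2, "PEEDUCA": 39},
    {"PRPERTYP": 3, "PEEDUCA": 43},
]

# Keep only in-universe records before computing shares or counts.
universe = [r for r in records if in_peeduca_universe(r)]
print(len(universe))
```

The second helper is a sanity check on a loaded file: any record failing it contradicts the universe stated in the data dictionary.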
Data
The most commonly used variables, and recommended uses of them, are:
Variable |
Usage |
PRPERTYP |
Identifies record as child, adult, or member of US Armed Forces |
PRTAGE |
Age; topcoded at 85 |
PESEX |
Sex |
PTDTRACE |
Race |
PEHSPNON |
Hispanic ethnicity |
PEEDUCA |
Educational attainment; edited universe is PRPERTYP=2 or 3 |
PEAFEVER |
U.S. Armed Forces service history; edited universe is PRTAGE>=17 |
PEMARITL |
Marital status; edited universe is PRTAGE>=15 |
PENATVTY |
Foreign-born status |
PEMLR |
Employment status; edited universe is PRPERTYP=2 |
PRWRKSTAT |
Work status; edited universe is PEMLR=1 thru 7 |
PEHRUSL1 |
Usual hours worked per week at first job; edited universe is PEMLR=1 or 2 |
PEHRUSL2 |
Usual hours worked per week at second job (if applicable) |
PEHRUSLT |
PEHRUSL1 + PEHRUSL2 |
PEHRWANT |
Seeking full-time work; edited universe is PEMLR=1 and PEHRUSLT=0 thru 34 |
HEFAMINC |
Household annual income (16 categories); note the high allocation rate |
GESTFIPS |
State |
GTCBSA |
Metropolitan statistical area |
GTCO |
County |
The following measures are not collected from the entire sample:
Variable |
Usage |
PEERNHRO |
Usual hours worked per week if paid an hourly wage |
PTERNHLY |
Hourly wage rate; 2 implied decimal places; topcoded based on the product of this and PEERNHRO |
PTERNWA |
PEERNHRO * PTERNHLY; 2 implied decimal places |
PRDTOCC1 |
Occupation (23 categories) for first job; recode from PTIO1OCD |
PRDTOCC2 |
Occupation for second job; recode from PTIO2OCD |
PRDTIND1 |
Industry (52 categories) for first job; recode from PEIO1ICD |
PRDTIND2 |
Industry for second job; recode from PEIO2ICD |
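Variables with implied decimal places, such as PTERNHLY, need to be rescaled before use. A minimal sketch, assuming negative raw values are the missing codes described in the Missing Values section; the helper name `implied_decimal` is hypothetical and the raw value is made up.

```python
def implied_decimal(raw, places=2):
    """Rescale a raw integer with implied decimal places; None for missing codes.

    Assumes any negative raw value is a missing-value code.
    """
    if raw < 0:
        return None
    return raw / 10 ** places


hourly_wage = implied_decimal(1550)  # made-up raw PTERNHLY value
print(hourly_wage)  # 15.5
```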
These categories are either standard definitions or the prevailing usage of the term:
Category |
Meaning |
Civilian noninstitutional population |
PRPERTYP=2 and PRTAGE>=16 |
Unemployed |
PEMLR=3 or 4 |
Employed |
PEMLR=1 or 2 |
Labor force |
PEMLR=1 thru 4 |
Working part time for economic reasons |
PRWRKSTAT=3 or 6 |
White, Non-Hispanic |
PTDTRACE=1 and PEHSPNON=2 |
Black, Non-Hispanic |
PTDTRACE=2 and PEHSPNON=2 |
AAPI, Non-Hispanic |
PTDTRACE=4 or 5 or 15; and PEHSPNON=2 |
Other, Non-Hispanic |
PTDTRACE=3 or 6 thru 14 or 16 thru 26; and PEHSPNON=2 |
Hispanic |
PEHSPNON=1 |
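The definitions in the table above translate directly into code. The sketch below classifies race/ethnicity and labor force status, then computes an unemployment rate as weighted unemployed divided by weighted labor force. The function names and record values are hypothetical; the composited weight is used as an assumption, following the recommendation in the Weights section.

```python
def race_ethnicity(ptdtrace, pehspnon):
    """Map PTDTRACE and PEHSPNON to the categories in the table above."""
    if pehspnon == 1:
        return "Hispanic"
    if pehspnon != 2:
        return None  # ethnicity missing or unrecognized
    if ptdtrace == 1:
        return "White, Non-Hispanic"
    if ptdtrace == 2:
        return "Black, Non-Hispanic"
    if ptdtrace in (4, 5, 15):
        return "AAPI, Non-Hispanic"
    if ptdtrace == 3 or 6 <= ptdtrace <= 14 or 16 <= ptdtrace <= 26:
        return "Other, Non-Hispanic"
    return None


def labor_force_status(pemlr):
    """Return 'Employed', 'Unemployed', or None (not in labor force or out of universe)."""
    if pemlr in (1, 2):
        return "Employed"
    if pemlr in (3, 4):
        return "Unemployed"
    return None


def unemployment_rate(records, weight_var="PWCMPWGT"):
    """Weighted unemployed divided by weighted labor force (PEMLR = 1 thru 4)."""
    totals = {"Employed": 0.0, "Unemployed": 0.0}
    for r in records:
        status = labor_force_status(r["PEMLR"])
        if status is not None:
            totals[status] += r[weight_var] / 10_000  # 4 implied decimal places
    labor_force = totals["Employed"] + totals["Unemployed"]
    return totals["Unemployed"] / labor_force if labor_force else None


# Made-up records: one employed, one unemployed, one not in the labor force.
records = [
    {"PEMLR": 1, "PWCMPWGT": 30_000},
    {"PEMLR": 3, "PWCMPWGT": 10_000},
    {"PEMLR": 5, "PWCMPWGT": 50_000},
]
print(unemployment_rate(records))  # 0.25
```

Because PEMLR is blank (-1) outside its edited universe, out-of-universe records drop out of the rate without an explicit filter.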
See also
Census Bureau's project homepage