Differences between revisions 11 and 12
Revision 11 as of 2025-09-05 19:47:11
Size: 4388
Comment: Notes
Revision 12 as of 2025-09-08 22:46:41
Size: 8076
Comment: Notes
American Community Survey

The American Community Survey (ACS) is a continual survey operated by the Census Bureau.

Technically, the Puerto Rican component of the survey is the Puerto Rico Community Survey (PRCS).


Usage

See here for notes on the public use microdata.


Design

Sampling

There are two samples used in this survey. The first is the housing unit (HU) sample. This is drawn from the Census Bureau's Master Address File (MAF). The annual target sample size is about 3.5 million.

No HU is sampled more than once in 5 years. To accommodate this, the list is partitioned into 5 subframes that are assigned to discrete sample years. New addresses are randomly selected into a subframe.
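The subframe partition can be sketched as follows. This is a minimal illustration, not the Census Bureau's procedure: the round-robin deal after a shuffle and the year-to-subframe rotation (year modulo 5) are assumptions, and all names are hypothetical.

```python
import random

def assign_subframes(addresses, n_subframes=5, seed=0):
    """Randomly partition addresses into n_subframes subframes of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    # Deal the shuffled addresses round-robin so sizes differ by at most 1.
    return {addr: i % n_subframes for i, addr in enumerate(shuffled)}

def subframe_for_year(year, n_subframes=5):
    """Assumed rotation: each subframe comes back into use every n_subframes years."""
    return year % n_subframes

frame = [f"addr-{i}" for i in range(23)]   # hypothetical address list
assignment = assign_subframes(frame)
```

Because an address belongs to exactly one subframe and a subframe is used once per 5-year cycle, no address can be sampled twice within 5 years under this scheme.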

Each annual survey administration is divided into two 6-month periods. Period 1 covers the months of January through June and is selected in September, while Period 2 covers July through December and is selected in March. For a Period 1 sample, the addresses in the appropriate sample year subframe are randomly assigned to one of the two periods. For a Period 2 sample, after incorporating the assignments from the Period 1 sample, any new addresses are randomly assigned to one of the two periods. This process constitutes the first-stage sampling.

For the second stage, the appropriate period subframe is used. The target sample size is roughly half the annual target sample size. The HU sample is selected from every county independently. Here 'county' is used to mean actual counties, county equivalents, municipalities in PR, and D.C. itself.

A follow-up sample is selected from the set of nonresponding addresses. In some blocks, this sampling rate is 100%.

The second sample is the group quarters (GQ) sample. This is drawn from a list of known multi-resident addresses that is labeled by type. The list is immediately partitioned into small GQ facilities (i.e., 15 or fewer residents) and large GQ facilities (i.e., more than 15). Addresses with an unknown count are treated as small.

The small GQ sample is very similar to the HU sample. The first stage of sampling partitions the list into 5 sample year subframes. The second stage selects addresses from every state (plus DC and PR) independently. The number of residents is not taken into account, given there is little variation.

At the time of the interview, the roster of actual residents is identified. If there are more than 15 actual residents, a random sample of 10 is selected.

The list of large GQ facilities is not partitioned into sample years. Residents in such facilities are treated as groups of 10. Addresses then have a calculated GQ measure of size (GQMOS), which is the number of residents divided by 10. Groups are selected from every state (plus DC and PR) independently. If a state's sampling rate is 2.5%, a facility with a GQMOS of 40 (i.e., roughly 400 residents) will have at least one group selected with certainty. This process constitutes the first-stage sampling.
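The certainty claim can be checked numerically. The sketch below assumes a systematic selection with a random start over the cumulative GQMOS; the notes only give the rate and the result, so the selection method and all names here are assumptions. A 2.5% rate corresponds to an interval of 40 groups.

```python
def gqmos(residents, group_size=10):
    """GQ measure of size: number of residents divided by the group size."""
    return residents / group_size

def systematic_hits(sizes, interval, start):
    """Count selected groups per facility.

    sizes: list of (facility_id, gqmos) in list order.
    interval: 1 / sampling rate (40 for a 2.5% rate).
    start: random start in [0, interval).
    """
    hits = {}
    cumulative = 0.0
    point = start
    for facility_id, size in sizes:
        upper = cumulative + size
        n = 0
        # Every selection point falling inside this facility's range is one group.
        while point < upper:
            n += 1
            point += interval
        hits[facility_id] = n
        cumulative = upper
    return hits

facilities = [("A", gqmos(125)), ("B", gqmos(400)), ("C", gqmos(30))]
```

Whatever the start, facility B spans a full interval of 40 and therefore always contains a selection point, while A and C are selected only for some starts.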

At the time of the interview, the roster of actual residents is identified and a random sample of 10 is selected. If there are fewer than 10 actual residents, all are selected with certainty. This process constitutes the second-stage sampling.

If a large GQ facility has multiple groups selected, they are spread across sample months. In some cases it is still necessary to administer the survey to multiple groups in the same sample month, in which case the second-stage sample is larger than 10.
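A minimal sketch of the second-stage draw for a large GQ facility, assuming a simple random sample without replacement from the roster. The function name and the n_groups parameter (the number of groups administered in the same sample month) are hypothetical.

```python
import random

def select_residents(roster, n_groups=1, group_size=10, rng=None):
    """Draw the second-stage sample from the roster of actual residents."""
    rng = rng or random.Random()
    target = n_groups * group_size
    # With no more residents than the target, everyone is taken with certainty.
    if len(roster) <= target:
        return list(roster)
    return rng.sample(roster, target)

roster = [f"resident-{i}" for i in range(250)]   # hypothetical roster
one_group = select_residents(roster, rng=random.Random(1))
two_groups = select_residents(roster, n_groups=2, rng=random.Random(1))
```

With one group the sample has 10 residents; with two groups in the same month it has 20, which is the case where the second-stage sample is larger than 10.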

Note also that this second-stage sample is drawn independently of any other sample months. Residents of a large GQ facility that is included in multiple sample months can be selected repeatedly.

Also note that small GQ facilities found at the time of interview to have more than 15 actual residents are handled like a large GQ facility with one group selected.

Lastly, Remote Alaska is handled separately. HU sample months are selected with respect to season and geography, to balance workload. The GQ sample is assigned to either January or July, and both are administered over 6 months.

Mode

Survey invitations are sent by mail to selected HU addresses, and primary residents are encouraged to respond either by mailing back a paper survey or by following a link to a web survey. The survey, like the long-form decennial census that it replaced, is mandatory.

The follow-up effort uses computer-assisted personal interviewing (CAPI).

Interviewers go to selected GQ addresses in person to identify the roster of actual residents and administer the second-stage sampling.

Lastly, Remote Alaska is handled separately. All addresses are surveyed by CAPI.

Frequency

Responses are collected into annual vintages; '1-year estimates' are usually published in the Fall.

Data is further aggregated into a rolling 5-year window for '5-year estimates'.
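As a small illustration of the rolling window, the helper below lists the annual vintages that feed a 5-year product ending in a given year. It only shows which years are pooled; how responses are reweighted within the window is not covered, and the function name is hypothetical.

```python
def five_year_window(end_year, width=5):
    """Annual vintages pooled into the estimates ending in end_year."""
    return list(range(end_year - width + 1, end_year + 1))

window_2023 = five_year_window(2023)
window_2024 = five_year_window(2024)
# Consecutive windows share all but one vintage.
shared = sorted(set(window_2023) & set(window_2024))
```

Consecutive 5-year products overlap in 4 of their 5 vintages, so year-over-year comparisons between them are not comparisons of independent samples.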

Weighting

HU sample members carry a base weight equal to the inverse selection probability. Those selected for the follow-up sample have their base weight adjusted to account for this selection probability as well. There is also a correction adjustment for those who were selected for the follow-up sample but did respond (late) to the original survey invite. This correction is applied to every sample member selected for the follow-up sample, and simply undoes the prior adjustment's transfer of weight.
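The first two steps can be written out directly. This is a sketch under the assumption that the follow-up adjustment divides the base weight by the follow-up sampling rate; the late-response correction is not modeled, and the function names are hypothetical.

```python
def base_weight(selection_prob):
    """Inverse of the selection probability."""
    if not 0 < selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_prob

def followup_weight(selection_prob, followup_rate):
    """Base weight further inflated by the inverse follow-up sampling rate."""
    if not 0 < followup_rate <= 1:
        raise ValueError("follow-up rate must be in (0, 1]")
    return base_weight(selection_prob) / followup_rate

w_base = base_weight(0.02)               # 1-in-50 selection
w_follow = followup_weight(0.02, 0.5)    # then 1-in-2 follow-up subsample
w_full = followup_weight(0.02, 1.0)      # block with a 100% follow-up rate
```

An address in a block with a 100% follow-up rate keeps its base weight unchanged, since every nonrespondent there is followed up.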

Nonresponse adjustments are calculated with respect to eligible households. That is, vacant or demolished structures that were sampled are not used for this adjustment.
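A minimal sketch of such an adjustment, assuming a single weighting cell and a simple ratio of weighted eligible units to weighted respondents. The cell structure actually used is not described in these notes, and the field names are hypothetical.

```python
def nonresponse_factor(units):
    """Ratio of weighted eligible units to weighted responding units."""
    # Vacant or demolished structures are ineligible and are left out entirely.
    eligible = [u for u in units if u["eligible"]]
    respondents = [u for u in eligible if u["responded"]]
    total_eligible = sum(u["weight"] for u in eligible)
    total_respondents = sum(u["weight"] for u in respondents)
    if total_respondents == 0:
        raise ValueError("no respondents in cell")
    return total_eligible / total_respondents

cell = [
    {"weight": 40.0, "eligible": True, "responded": True},
    {"weight": 40.0, "eligible": True, "responded": True},
    {"weight": 40.0, "eligible": True, "responded": False},
    {"weight": 40.0, "eligible": False, "responded": False},  # vacant unit
]
factor = nonresponse_factor(cell)
```

Here the factor is 120 / 80 = 1.5. Had the vacant unit been counted, it would have been 2.0, overstating the number of households represented.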

Finally, the HU sample is post-stratified to population controls by race/ethnicity, sex, and age group (13 levels). These controls are modeled at the sub-county level. These are the HU weights.
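Post-stratification in its simplest form scales the weights in each cell so that they sum to the cell's population control. The sketch below assumes every cell has at least one record and a known control; the cell labels and control values are made up for illustration.

```python
from collections import defaultdict

def poststratify(records, controls):
    """Scale weights so that each cell's weights sum to its population control."""
    totals = defaultdict(float)
    for r in records:
        totals[r["cell"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * controls[r["cell"]] / totals[r["cell"]]}
        for r in records
    ]

records = [
    {"cell": "F 25-34", "weight": 40.0},
    {"cell": "F 25-34", "weight": 60.0},
    {"cell": "M 25-34", "weight": 50.0},
]
controls = {"F 25-34": 120.0, "M 25-34": 45.0}   # hypothetical controls
adjusted = poststratify(records, controls)
```

Relative weights within a cell are preserved; only the cell totals change.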

Person-level HU weights are also produced that take into account the actual demographic characteristics of HU residents. The HU weights are used as the base.

The GQ data set is prepared specially to account for geographies with zero selected facilities. All large out-of-sample GQ facilities receive imputed whole person records. A random sample of small out-of-sample GQ facilities is selected to similarly receive imputed whole person records.

GQ sample members carry a base weight equal to the inverse selection probability. These are then adjusted with tract-, county-, and state-level constraints, such that the sum of weights matches population counts at those levels. The reason that the weights do not already sum to those counts is that small GQs are not selected with respect to the number of residents.

Finally, the GQ sample is post-stratified to population controls. These are the GQ person weights.

Geographies

1-year estimates are published for geographic areas with populations of 65,000 or more. This threshold is set so that estimates can be made available for all states, territories, congressional districts, PUMAs, CBSAs, cities, and Native American areas.

5-year estimates are published for geographic areas with much smaller population levels. This includes ZCTAs, census tracts, and census block groups.
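The publication rule can be expressed as a simple lookup. This sketch assumes 5-year estimates are published for every area regardless of size, which goes slightly beyond what is stated above; the constant and function names are hypothetical.

```python
ONE_YEAR_THRESHOLD = 65_000

def published_products(population):
    """Estimate products available for an area of the given population."""
    products = ["5-year"]
    if population >= ONE_YEAR_THRESHOLD:
        products.insert(0, "1-year")
    return products

large_area = published_products(250_000)
small_area = published_products(4_200)    # e.g. a census tract
```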


History

The ACS was launched in 2005 as a replacement for the long-form U.S. census. It provides more timely data because data are collected continuously and then published in periodic aggregations. It is used to allocate federal and state funding.


CategoryRicottone

UnitedStates/CensusBureau/AmericanCommunitySurvey (last edited 2025-09-08 22:46:41 by DominicRicottone)