Survey Sampling

Survey sampling is the process of selecting prospective respondents for a survey.


Frames

Survey sampling begins with identification of the frame, i.e. the list of units from which the sample is drawn. A properly specified frame covers the complete true population. An improperly specified frame introduces coverage error.

Non-probability Panel

A non-probability panel is a stream of potential respondents that differs from the true population and for which the probability of selection is unknown. This type of panel poses important challenges to the model of survey inference, but collecting data this way is also much cheaper.

Because the probability of selection is not known, the base survey weight is simply set to 1.


Census

A census is a survey of the complete true population. The probability of selection is 1, so the base survey weight is also 1.
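
The census case follows from the usual convention that a base weight is the inverse of the probability of selection. A minimal sketch of that convention, where the helper name `base_weight` and the frame sizes are hypothetical and chosen only for illustration:

```python
def base_weight(selection_probability: float) -> float:
    """Inverse-probability base weight (hypothetical helper, for illustration)."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


# Census: every unit is selected with probability 1.
census_weight = base_weight(1.0)

# Made-up comparison: a simple random sample of 100 units from a frame of 5,000.
srs_weight = base_weight(100 / 5000)
```

Under this convention, each sampled unit in the made-up SRS stands in for 50 units of the frame, while each census unit stands in only for itself.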


Address Listing


Random Digit Dialing

A random digit dialing (RDD) sample fundamentally has two steps.

First, all possible hundred-digit blocks form a frame. A hundred-digit block is an 8-digit prefix together with the 100 10-digit phone numbers that share it. The best practice, when it is feasible, is to remove the hundred-digit blocks with 0 or 1 residential numbers (according to either published phone books or collected data). A simple random sample (SRS) with replacement is then drawn from this frame. This forms the first important probability of selection.
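
The first step can be sketched as follows. This is only an illustration under stated assumptions: the prefixes and listed-number counts are made up, and the helper names `build_block_frame` and `draw_blocks` are hypothetical.

```python
import random
from collections import Counter


def build_block_frame(listed_counts, min_listed=2):
    """Keep the 8-digit prefixes with at least `min_listed` listed residential
    numbers, i.e. drop the blocks with 0 or 1 of them."""
    return sorted(p for p, n in listed_counts.items() if n >= min_listed)


def draw_blocks(frame, n, seed=None):
    """Draw an SRS of n blocks with replacement; return selection counts."""
    rng = random.Random(seed)
    return Counter(rng.choices(frame, k=n))


# Made-up counts of listed residential numbers per 8-digit prefix.
listed_counts = {"21255501": 0, "21255502": 1, "21255503": 14, "21255504": 37}

frame = build_block_frame(listed_counts)
hits = draw_blocks(frame, n=5, seed=1)
```

Because the draw is with replacement, `hits` records how many times each block was selected, which is the number of phone numbers to be contacted within that block.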

Stratified RDD sampling relies on the segmentation of the hundred-digit block frame. Generally, this is done according to geographic bounds (state, county, or ZIP code) to which 3-digit area codes can be roughly matched. Given this information and additional statistics, e.g. which ZIP codes have higher concentrations of Black residents, it can be possible to segment the frame by demographic propensities.

For each selection (because a hundred-digit block can be selected multiple times), a random 10-digit phone number from the hundred-digit block is called. In some cases, 10-digit phone numbers are tried sequentially until a response is collected. This reveals one strength of the hundred-digit block: a call center can be directed to contact some number of households within a prefix, and the interviewers are free to replace the specific households as needed.
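
A rough sketch of this step is below. It assumes that the numbers in a block are put into a random dialing order, so that the first entry is the randomly selected number and the rest serve as replacements; that ordering rule and the helper names are assumptions, not a documented protocol.

```python
import random


def numbers_in_block(prefix):
    """All 100 10-digit numbers sharing an 8-digit prefix."""
    return [f"{prefix}{suffix:02d}" for suffix in range(100)]


def dial_order(prefix, seed=None):
    """Random dialing order for one selection of a block (hypothetical rule)."""
    rng = random.Random(seed)
    numbers = numbers_in_block(prefix)
    rng.shuffle(numbers)
    return numbers


# Made-up prefix; interviewers work down the list until a response is collected.
order = dial_order("21255503", seed=7)
first_attempt = order[0]
```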

Within a household, there are often multiple possible respondents. A screening protocol is used to select which of them is interviewed. This forms the second important probability of selection.
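
To show how the two probabilities of selection might combine, here is a simplified sketch. It assumes that the screening protocol picks one eligible adult at random and it ignores other adjustments (such as households reachable on several phone numbers); the function names and the example values are hypothetical.

```python
def within_household_probability(eligible_adults: int) -> float:
    """Chance that a given adult is chosen, assuming one is picked at random."""
    if eligible_adults < 1:
        raise ValueError("a household needs at least one eligible adult")
    return 1.0 / eligible_adults


def combined_base_weight(p_number: float, eligible_adults: int) -> float:
    """Inverse of the product of the two selection probabilities."""
    return 1.0 / (p_number * within_household_probability(eligible_adults))


# Made-up values: the number was selected with probability 0.001 and the
# household has 3 eligible adults.
weight = combined_base_weight(0.001, 3)
```

Under these assumptions, a respondent from a larger household carries a larger base weight, because each adult in that household was less likely to be the one interviewed.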


Stratification

Stratification is the segmentation of a frame into discrete classes (strata), each of which is then sampled separately.
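
A minimal sketch of the idea, assuming a simple random sample without replacement inside each stratum and a base weight of stratum size over stratum sample size. The frame, the stratum labels, and the function name `stratified_sample` are made up for illustration.

```python
import random


def stratified_sample(frame, stratum_of, n_by_stratum, seed=None):
    """Split the frame into strata, then sample each stratum separately.

    Returns {stratum: [(unit, base_weight), ...]}.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = {}
    for h, units in strata.items():
        n_h = n_by_stratum[h]
        weight = len(units) / n_h  # inverse of the within-stratum sampling rate
        sample[h] = [(u, weight) for u in rng.sample(units, n_h)]
    return sample


# Made-up frame of 10 units: units 0-5 fall in stratum "A", units 6-9 in "B".
frame = list(range(10))
sample = stratified_sample(
    frame,
    stratum_of=lambda u: "A" if u < 6 else "B",
    n_by_stratum={"A": 3, "B": 2},
    seed=3,
)
```

In this sketch the base weights within each stratum sum to that stratum's size on the frame, so the weighted sample reproduces the frame totals stratum by stratum.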

Special care must be used when analyzing a subpopulation estimate. Survey weights, especially in the context of a complex stratified design, are specifically applicable for population estimates. As an example: variance is estimated separately for each stratum, and the overall estimated variance reflects every stratum. If a subpopulation happens to exclude a stratum, the variance is inappropriately deflated. (The one exception is when the subpopulation in question is a stratifying level, because then those strata are necessarily always going to be absent.) To appropriately calculate a subpopulation estimate in Stata, use a subpop() or over() option rather than an if expression.
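
The idea of keeping every sampled unit in the calculation can be sketched in Python. This is not Stata's implementation; it is a textbook-style estimator of a subpopulation total under stratified simple random sampling, with made-up data and a hypothetical function name. Units outside the subpopulation contribute a value of zero instead of being dropped, so every stratum and its full sample size still enter the variance.

```python
from statistics import variance


def domain_total(rows, N_by_stratum):
    """Estimate a subpopulation total and its variance.

    rows: iterable of (stratum, y, in_domain) for every sampled unit.
    Each stratum needs at least two sampled units.
    """
    by_stratum = {}
    for h, y, in_domain in rows:
        # Keep the unit; zero out its value if it is outside the subpopulation.
        by_stratum.setdefault(h, []).append(y if in_domain else 0.0)

    total, var = 0.0, 0.0
    for h, z in by_stratum.items():
        N_h, n_h = N_by_stratum[h], len(z)
        total += N_h * sum(z) / n_h
        fpc = 1 - n_h / N_h  # finite population correction
        var += N_h ** 2 * fpc * variance(z) / n_h
    return total, var


# Made-up sample: stratum "B" happens to contain no subpopulation members.
rows = [
    ("A", 4.0, True), ("A", 6.0, False), ("A", 5.0, True),
    ("B", 3.0, False), ("B", 7.0, False),
]
total, var = domain_total(rows, N_by_stratum={"A": 30, "B": 20})
```

Filtering `rows` down to the subpopulation before calling the function would shrink the apparent sample size in stratum "A" and remove stratum "B" entirely, which is the mistake that an if expression makes.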


CategoryRicottone

Statistics/SurveySampling (last edited 2025-04-18 21:07:48 by DominicRicottone)