Differences between revisions 2 and 5 (spanning 3 versions)

Survey Sampling

Survey sampling is a procedure of selecting prospective respondents for a survey experiment.

Probability Sampling

Probability sampling begins with identification of the frame. A properly-specified frame covers the complete true population. An improperly-specified frame introduces sampling error.

A census is a survey of the complete frame. The probability of selection is 1, so the base survey weight is also 1.
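The link between the probability of selection and the base weight can be sketched as follows. This assumes the conventional definition of a base weight as the inverse of the probability of selection; the function name `base_weight` and the frame and sample sizes are hypothetical.

```python
def base_weight(prob_selection: float) -> float:
    """Return the base weight, assumed here to be 1 / probability of selection."""
    if not 0 < prob_selection <= 1:
        raise ValueError("probability of selection must be in (0, 1]")
    return 1.0 / prob_selection

# A census selects every case with certainty, so each case represents only itself.
census_weight = base_weight(1.0)

# Hypothetical sample of 100 cases from a frame of 2,000 cases:
# each sampled case stands in for 20 cases in the frame.
sample_weight = base_weight(100 / 2000)
```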

Methods

The baseline for survey sampling is simple random sampling (SRS).
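A minimal sketch of a simple random sample drawn without replacement, using only the Python standard library. The frame of integer identifiers, the sample size, and the function name `simple_random_sample` are illustrative assumptions.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct cases; every case has selection probability n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame of 1,000 case identifiers.
frame = list(range(1000))
sample = simple_random_sample(frame, 50, seed=42)

# Under this design the probability of selection is the same for every case.
prob_selection = 50 / len(frame)
```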

Probability proportionate to size (PPS) sampling ensures that the chance of being contacted increases with the magnitude of some measure. For example, in a study of utility customers, the largest consumers of that utility should almost always be contacted.
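One way to make this concrete is to compute inclusion probabilities that are proportional to a size measure, capping very large units at certainty. The sketch below is an illustration under assumptions, not a description of any particular survey: the consumption figures are made up, the function name is hypothetical, and the final draw uses Poisson sampling (independent draws per unit), which is only one of several PPS selection schemes and yields a random sample size.

```python
import random

def pps_inclusion_probs(sizes, n):
    """Inclusion probabilities proportional to size for a target sample of n.

    Units whose proportional share would reach 1 are taken with certainty,
    and the rest of the sample is re-allocated among the remaining units.
    Sizes are assumed to be strictly positive.
    """
    probs = [None] * len(sizes)
    remaining = set(range(len(sizes)))
    n_left = n
    while True:
        total = sum(sizes[i] for i in remaining)
        certain = [i for i in remaining if n_left * sizes[i] / total >= 1]
        if not certain:
            break
        for i in certain:
            probs[i] = 1.0
            remaining.discard(i)
        n_left -= len(certain)
    for i in remaining:
        probs[i] = n_left * sizes[i] / total
    return probs

# Hypothetical utility customers: one very large consumer and three small ones.
consumption = [900, 50, 30, 20]
probs = pps_inclusion_probs(consumption, n=2)

# Poisson sampling: each unit is selected independently with its own probability.
rng = random.Random(7)
selected = [i for i, p in enumerate(probs) if rng.random() < p]
```

With these made-up figures the largest consumer is selected with certainty, and the remaining expected sample of one is shared among the small consumers in proportion to their consumption.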

Systematic sampling selects every Nth case from a list, typically after a random starting point.
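A short sketch of systematic selection. The random start within the first interval is an assumption based on common practice rather than something stated on this page, and the function name and frame are hypothetical.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Select every `interval`-th case from an ordered list."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # assumed random start in 0..interval-1
    return frame[start::interval]

# Hypothetical ordered frame of 100 cases, taking every 10th case.
frame = list(range(100))
sample = systematic_sample(frame, interval=10, seed=1)
```

Because the list order determines which cases can appear together, sorting the frame before selection changes the properties of the sample.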

Stratification

Stratification is the partition of a frame into discrete classes using information that is known for the entire frame. Each stratum is sampled separately, often with differing probabilities of selection.
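The following sketch partitions a frame into strata and samples each stratum separately at its own rate. The customer classes, the sampling rates, and the function name are hypothetical; a simple random sample within each stratum is assumed.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, rates, seed=None):
    """Sample each stratum separately; return (case, base weight) pairs."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in frame:
        strata[stratum_of(case)].append(case)
    selected = []
    for name, cases in strata.items():
        n = round(rates[name] * len(cases))
        for case in rng.sample(cases, n):
            # Probability of selection within the stratum is n / len(cases).
            selected.append((case, len(cases) / n))
    return selected

# Hypothetical frame: 90 residential and 10 commercial customers.
frame = [(i, "residential") for i in range(90)]
frame += [(i, "commercial") for i in range(90, 100)]
rates = {"residential": 0.1, "commercial": 0.5}  # commercial is oversampled

sample = stratified_sample(frame, lambda case: case[1], rates, seed=3)
weight_total = sum(weight for _, weight in sample)
```

The base weights differ by stratum (10 for residential, 2 for commercial in this made-up case) but still sum to the size of the frame.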

Stratification qualifies as a complex survey design because the standard errors must be estimated with attention to strata. As an example, if a stratum happens to be excluded from an estimate, its contribution towards true variance is excluded from a conventional estimator. This is especially common with sub-population estimates, which is why Stata supports `subpop` and `over` options for many estimation commands; these can otherwise seem redundant given `if` expressions. As another example, a stratum may have only one observation (i.e., a singleton stratum), and a conventional estimator fails in this case because within-stratum variance cannot be computed from a single observation.
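Both failure modes can be seen in a small sketch of the textbook variance estimator for a stratified mean, assuming simple random sampling within each stratum. This is not Stata's implementation; the function name, strata, and values are hypothetical.

```python
import statistics

def stratified_mean_variance(samples, frame_sizes):
    """Estimated variance of a stratified mean, assuming SRS within strata.

    samples: {stratum: [observed values]}
    frame_sizes: {stratum: number of cases in the frame}
    """
    frame_total = sum(frame_sizes.values())
    total = 0.0
    for stratum, values in samples.items():
        n_h, big_n_h = len(values), frame_sizes[stratum]
        s2_h = statistics.variance(values)  # raises StatisticsError if n_h < 2
        share = big_n_h / frame_total
        total += share ** 2 * (1 - n_h / big_n_h) * s2_h / n_h
    return total

frame_sizes = {"a": 30, "b": 20}
full = stratified_mean_variance({"a": [1.0, 2.0, 3.0], "b": [10.0, 14.0]}, frame_sizes)

# Dropping stratum "b" from the data silently drops its variance contribution.
partial = stratified_mean_variance({"a": [1.0, 2.0, 3.0]}, frame_sizes)

# A singleton stratum has no estimable within-stratum variance.
try:
    stratified_mean_variance({"a": [1.0, 2.0, 3.0], "b": [10.0]}, frame_sizes)
    singleton_failed = False
except statistics.StatisticsError:
    singleton_failed = True
```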

Ideally, stratification uses information that is known to be true, such that there is no reason for cases to be 're-classified'. Re-classifying cases into different strata after the fact distorts variance estimates.

Multi-stage

Similar to stratification, multi-stage sampling partitions a frame into discrete classes using information that is known for the entire frame. Often this is geographic. In the first stage, primary sampling units (PSUs) are selected from these classes. The second stage selects secondary sampling units (SSUs) from only the selected PSUs.
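A minimal two-stage sketch, assuming a simple random sample at each stage so that the overall probability of selection is the product of the two stage probabilities. The counties, households, sample sizes, and function name are all hypothetical.

```python
import random

def two_stage_sample(frame, n_psu, n_ssu, seed=None):
    """Select PSUs, then SSUs within the selected PSUs only.

    frame: {psu_id: [ssu, ...]}
    Returns (psu, ssu, overall probability of selection) tuples.
    """
    rng = random.Random(seed)
    selected_psus = rng.sample(sorted(frame), n_psu)
    sample = []
    for psu in selected_psus:
        ssus = frame[psu]
        k = min(n_ssu, len(ssus))
        # First-stage probability times second-stage probability.
        prob = (n_psu / len(frame)) * (k / len(ssus))
        for ssu in rng.sample(ssus, k):
            sample.append((psu, ssu, prob))
    return sample

# Hypothetical frame: 5 counties (PSUs), each with 10 households (SSUs).
frame = {f"county_{c}": [f"hh_{c}_{h}" for h in range(10)] for c in range(5)}
sample = two_stage_sample(frame, n_psu=2, n_ssu=3, seed=11)
```

Households in the three counties that were not selected never enter the second stage, which is what keeps fieldwork geographically constrained.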

This method is useful for in-person surveying, as it is logistically necessary to constrain the geography of survey administration.

Multi-stage sampling also qualifies as a complex survey design because a case's chance of selection depends on every stage: cases in a PSU that had some probability of selection in the first stage, but was not selected, have zero probability of selection in the second.

Non-probability Sampling

Non-probability sampling involves soliciting responses from a stream of people that differs from the true population. There is no known probability of selection. There are some people with zero probability of responding, and generally there are also some people who respond with certainty (i.e., 'professional' survey takers).

CategoryRicottone

- Revision 2 as of 2025-04-18 21:06:23
  Size: 3531
  Editor: DominicRicottone
  Comment: Content
+ Revision 5 as of 2025-11-03 01:15:58
  Size: 3213
  Editor: DominicRicottone
  Comment: Reorg
-Deletions are marked like this.
+Additions are marked like this.
 Line 11:
-== Frames ==
+== Probability Sampling ==
 Line 13:
-Survey sampling begins with identification of the '''frame'''. A properly-specified frame covers the complete true population. An improperly-specified frame introduces [[Statistics/SurveyInference#Sampling_Error|sampling error]].
+Probability sampling begins with identification of the [[Statistics/SurveyFrame|frame]]. A properly-specified frame covers the complete true population. An improperly-specified frame introduces [[Statistics/SurveyInference#Sampling_Error|sampling error]].

A '''census''' is a survey of the complete frame. The probability of selection is 1, so the [[Statistics/SurveyWeights|base survey weight]] is also 1.
-Line 17:
+Line 19:
-=== Non-probability Panel ===
+=== Methods ===
-Line 19:
+Line 21:
-A '''non-probability panel''' is a stream of potential respondents that is different from the true population, and for which there is no known probability of selection. This type of panel poses important challenges to the model of [[Statistics/SurveyInference|survey inference]], but it is also much cheaper to collect data in this manner.
+The baseline for survey sampling is [[Statistics/SimpleRandomSample|SRS]].
-Line 21:
+Line 23:
-The probability of selection is not known so the [[Statistics/SurveyWeights|base survey weight]] is simply 1.
+'''Probability proportionate to size''' ('''PPS''') sampling ensures that the chance of being contacted increases with the magnitude of some measure. For example, in a study of utility customers, the largest consumers of that utility should almost always be contacted.

'''Systematic sampling''' selects every Nth case from a list.



=== Stratification ===

Stratification is the partition of a frame into discrete classes using information that is known for the entire frame. Each stratum is sampled separately, often with differing probabilities of selection. 

Stratification qualifies as a '''complex survey design''' because the standard errors must be estimated with attention to strata. As an example, if a stratum happens to be excluded from an estimate, its contribution towards true variance is excluded from a conventional estimator. This is especially common with sub-population estimates, and this is why [[Stata]] supports `subpop` and `over` options for many estimation commands that can otherwise seem redundant given `if` expressions. As another example, a stratum may only have one observation (i.e., a singleton stratum), and a conventional estimator will of course fail in this case.

Ideally, stratification uses information that is known to be true, such that there is no reason for cases to be 're-classified'. Manipulating strata in such a way distorts variance estimates.



=== Multi-stage ===

Similar to stratification, multi-stage sampling partitions a frame into discrete classes using information that is known for the entire frame. Often this is geographic. From this first stage, '''primary sampling units''' ('''PSU''') are selected. The second stage selects '''secondary sampling units''' ('''SSU''') from only the selected PSUs.

This method is useful for in-person surveying, as it is logistically necessary to constrain the geography of survey administration.

Multi-stage sampling also qualifies as a complex survey design because a case's chance of selection depends on every stage: cases in a PSU that had some probability of selection in the first stage, but was not selected, have zero probability of selection in the second.
-Line 27:
+Line 51:
-=== Census ===
+=== Non-probability Sampling ===
-Line 29:
+Line 53:
-A '''census''' is a survey of the complete true population. The probability of selection is 1, so the [[Statistics/SurveyWeights|base survey weight]] is also 1.

----



== Address Listing ==


----



=== Random Digit Dialing ===

A '''random digit dialing''' ('''RDD''') sample fundamentally has two steps.

First, all possible '''hundred-digit blocks''' form a frame. These are the full set of 8-digit prefixes for which there are 100 10-digit phone numbers. The best practice, when it is feasible, is to remove the hundred-digit blocks with 0 or 1 residential numbers (according to either published phone books or collected data). A [[Statistics/SimpleRandomSample|SRS]] with replacement is then drawn from this frame. This forms the first important probability of selection.

Stratified RDD sampling relies on the segmentation of the hundred-digit block frame. Generally, this is done according to geographic bounds (state, county, or ZIP code) for which 3-digit area codes can be roughly matched. Given this information and additional statistics, e.g. which ZIP codes have higher concentrations of black residents, it can be possible to segment the frame by demographic propensities.

For each selection (because a hundred-digit block can be selected multiple times), a random 10-digit phone number from the hundred-digit block is called. In some cases, 10-digit phone numbers are tried sequentially until a response is collected. This reveals one strength of the hundred-digit block--a call center can be directed to contact some number of households with a prefix, and the interviewers are free to replace the specific households as needed.

From a household, there are often multiple possible respondents. A screening protocol is used to decide on the best respondent. This forms the second important probability of selection.

----



== Stratification ==

Stratification is the segmentation of a frame into discrete classes, and then sampling from them separately.

Special care must be used when analyzing a subpopulation estimate. Survey weights, especially in the context of a complex stratified design, are ''specifically applicable'' for population estimates. As an example: variance is estimated separately for each stratum. Overall estimated variance reflects every stratum. If a subpopulation happens to exclude a stratum, the variance is inappropriately deflated. The one exception is when the subpopulation in question ''is'' a stratifying level, because then the other strata are necessarily always going to be absent.
+Non-probability sampling involves soliciting responses from a stream of people that differs from the true population. There is no known probability of selection. There are some people with zero probability of responding, and ''generally'' there are also some people who respond with certainty (i.e., 'professional' survey takers).

Diff for "Statistics/SurveySampling"