= Survey Sampling =

'''Survey sampling''' is the procedure for selecting prospective respondents for a survey.

<<TableOfContents>>

----



== Probability Sampling ==

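Probability sampling begins with identification of the [[Statistics/SurveyFrame|frame]]. A properly specified frame covers the complete true population. An improperly specified frame omits part of the population or includes cases outside it; this introduces coverage error, which is distinct from the [[Statistics/SurveyInference#Sampling_Error|sampling error]] that arises from observing only a sample of the frame.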

A '''census''' is a survey of the complete frame. The probability of selection is 1, so the [[Statistics/SurveyWeights|base survey weight]] is also 1.
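
The relationship between the probability of selection and the base weight can be sketched as below. This is a minimal illustration assuming the base weight is simply the inverse of the selection probability; the function name and the frame size of 5,000 are hypothetical.

```python
def base_weight(prob_selection: float) -> float:
    """Return the base weight as the inverse of the probability of selection."""
    if not 0 < prob_selection <= 1:
        raise ValueError("probability of selection must be in (0, 1]")
    return 1.0 / prob_selection


# A census selects every case with certainty, so each case represents itself.
census_weight = base_weight(1.0)

# Hypothetical sample of 100 cases from a frame of 5,000:
# each sampled case represents about 50 cases in the frame.
sample_weight = base_weight(100 / 5000)
```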



=== Methods ===

The baseline for survey sampling is [[Statistics/SimpleRandomSample|simple random sampling]] ('''SRS''').

'''Probability proportionate to size''' ('''PPS''') ensures that the chance of being selected increases with the magnitude of some measure. For example, in a study of utility customers, the largest consumers of that utility should almost always be contacted.
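
As a rough sketch of how PPS selection probabilities might be derived, the function below scales each case's size measure to the target sample size and caps any probability at 1. The function name, the capping procedure, and the example sizes are assumptions for illustration; production designs vary in how they handle such 'certainty' cases.

```python
def pps_inclusion_probabilities(sizes, n):
    """Return selection probabilities proportional to size, capped at 1."""
    if n > len(sizes):
        raise ValueError("sample size cannot exceed the number of cases")
    probs = [0.0] * len(sizes)
    certain = set()  # indices of cases selected with certainty
    while True:
        remaining_n = n - len(certain)
        remaining_total = sum(s for i, s in enumerate(sizes) if i not in certain)
        newly_certain = set()
        for i, s in enumerate(sizes):
            if i in certain:
                continue
            p = remaining_n * s / remaining_total
            if p >= 1:
                newly_certain.add(i)
            else:
                probs[i] = p
        if not newly_certain:
            break
        # Re-scale the remaining cases after setting aside certainty cases.
        certain |= newly_certain
    for i in certain:
        probs[i] = 1.0
    return probs


# Hypothetical annual consumption for four utility customers; sample of 2.
pps_probs = pps_inclusion_probabilities([900, 50, 30, 20], n=2)
```

In this example the largest customer is selected with certainty, and the remaining sample of 1 is spread across the other three customers in proportion to their consumption.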

'''Systematic sampling''' selects every ''k''th case from an ordered list, starting from a case chosen at random among the first ''k''.
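
A minimal sketch of systematic selection follows, assuming a fixed integer interval and a random starting position within the first interval. The function and variable names are hypothetical. When the list length is not a multiple of the sample size, this simple version truncates the result, which leaves cases at the end of the list with a slightly different chance of selection.

```python
import random


def systematic_sample(cases, n, rng=random):
    """Select n cases at a fixed interval from a randomly chosen start."""
    interval = len(cases) // n
    if interval < 1:
        raise ValueError("sample size cannot exceed the length of the list")
    start = rng.randrange(interval)  # random start within the first interval
    return cases[start::interval][:n]


# Hypothetical frame of 100 case IDs; select 10 with a seeded generator.
frame_ids = list(range(100))
systematic = systematic_sample(frame_ids, 10, random.Random(1))
```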



=== Stratification ===

Stratification is the partition of a frame into discrete classes using information that is known for the entire frame. Each stratum is sampled separately, often with differing probabilities of selection according to some allocation method.

Allocation methods include:
 * '''Proportional allocation''' takes sample from each stratum in proportion to its size. Note that this maintains a constant probability of selection across strata.
 * '''Equal allocation''' takes the same number of cases from each stratum regardless of stratum size.
 * '''Neyman allocation''' minimizes the overall [[Statistics/Variance|variance]] of a key metric for a fixed total sample size, using estimates of each stratum's variance.
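
The three allocation methods can be compared with a short sketch. The stratum sizes and standard deviations below are hypothetical, and the functions return unrounded allocations; in practice these would be rounded and checked against the stratum sizes. The Neyman formula used here allocates in proportion to the product of stratum size and estimated standard deviation.

```python
def proportional_allocation(sizes, n):
    """Allocate sample in proportion to stratum size."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def equal_allocation(sizes, n):
    """Allocate the same sample to every stratum."""
    return [n / len(sizes) for _ in sizes]


def neyman_allocation(sizes, sds, n):
    """Allocate sample in proportion to stratum size times standard deviation."""
    products = [size * sd for size, sd in zip(sizes, sds)]
    total = sum(products)
    return [n * p / total for p in products]


# Hypothetical frame with three strata and estimated standard deviations.
stratum_sizes = [6000, 3000, 1000]
stratum_sds = [10.0, 20.0, 50.0]

prop = proportional_allocation(stratum_sizes, 500)
equal = equal_allocation(stratum_sizes, 500)
neyman = neyman_allocation(stratum_sizes, stratum_sds, 500)

# Under proportional allocation the sampling fraction is the same everywhere.
prop_fractions = [a / size for a, size in zip(prop, stratum_sizes)]
```

Here the smallest stratum has the largest estimated variance, so Neyman allocation gives it more sample than proportional allocation would.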

When a stratum is purposely allocated more sample than would be prescribed by proportional allocation, it is said to be '''oversampled'''. One reason to do this (apart from variance optimization) is to ensure that enough responses are collected from a minority group to support [[Statistics/StudentsTTest|t tests]].

Stratification qualifies as a '''complex survey design''' because the standard errors must be estimated with attention to strata. As an example, if a stratum happens to be excluded from an estimate, its contribution towards true variance is excluded from a conventional estimator. This is especially common with sub-population estimates, and this is why [[Stata]] supports `subpop` and `over` options for many estimation commands that can otherwise seem redundant given `if` expressions. As another example, a stratum may have only one observation (i.e., a singleton stratum), and a conventional estimator fails in this case because a within-stratum variance cannot be computed from a single observation.
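
To illustrate why strata matter for variance estimation, the sketch below computes the textbook variance estimate for a stratified sample mean, summing each stratum's contribution weighted by its squared share of the frame and including a finite population correction. The function name and data are hypothetical, and this is not a description of how any particular package implements it. A singleton stratum is rejected explicitly because its sample variance is undefined.

```python
import statistics


def stratified_mean_variance(strata):
    """Estimate the variance of a stratified sample mean.

    strata: list of (stratum size in the frame, list of sampled observations)
    """
    frame_size = sum(size for size, _ in strata)
    total = 0.0
    for size, observations in strata:
        n_h = len(observations)
        if n_h < 2:
            raise ValueError("singleton stratum: within-stratum variance is undefined")
        s2 = statistics.variance(observations)  # sample variance within the stratum
        weight = size / frame_size              # stratum share of the frame
        fpc = 1 - n_h / size                    # finite population correction
        total += weight ** 2 * fpc * s2 / n_h
    return total


# Hypothetical design: two strata of 100 cases each, 3 observations per stratum.
var_estimate = stratified_mean_variance([(100, [1, 2, 3]), (100, [2, 4, 6])])
```

Passing a stratum with a single observation, such as `(100, [5])`, raises the error rather than silently dropping that stratum's contribution.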

Ideally, stratification uses information that is known to be true, such that there is no reason for cases to be 're-classified'. Manipulating strata in such a way distorts variance estimates.



=== Multi-stage ===

Similar to stratification, multi-stage sampling partitions a frame into discrete classes using information that is known for the entire frame. Often this is geographic. From this first stage, '''primary sampling units''' ('''PSU''') are selected. The second stage selects '''secondary sampling units''' ('''SSU''') from only the selected PSUs.
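
A two-stage selection can be sketched as follows, assuming simple random sampling at both stages (real designs often select PSUs with probability proportionate to size instead). The frame layout, function name, and counts are hypothetical. The overall probability of selection is the product of the probabilities at each stage.

```python
import random


def two_stage_sample(frame, n_psu, n_ssu, rng=random):
    """Select PSUs at random, then select SSUs within each selected PSU.

    frame: dict mapping a PSU identifier to its list of SSUs.
    Returns a list of (PSU, SSU, overall probability of selection).
    """
    selected_psus = rng.sample(sorted(frame), n_psu)
    sample = []
    for psu in selected_psus:
        units = frame[psu]
        take = min(n_ssu, len(units))
        # Stage 1 probability times stage 2 probability.
        prob = (n_psu / len(frame)) * (take / len(units))
        for unit in rng.sample(units, take):
            sample.append((psu, unit, prob))
    return sample


# Hypothetical frame: 4 geographic PSUs with 10 households each.
households = {psu: [f"{psu}-{i}" for i in range(10)] for psu in "ABCD"}
two_stage = two_stage_sample(households, n_psu=2, n_ssu=5, rng=random.Random(7))
```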

This method is useful for in-person surveying, as it is logistically necessary to constrain the geography of survey administration.

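Multi-stage sampling also qualifies as a complex survey design because selection in the second stage is conditional on the first: every PSU has some probability of being selected in the first stage, but SSUs within a PSU that was not selected have zero probability of being selected in the second.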

----



== Non-probability Sampling ==

Non-probability sampling involves soliciting responses from a stream of people that differs from the true population. There is no known probability of selection. There are some people with zero probability of responding, and ''generally'' there are also some people who respond with certainty (i.e., 'professional' survey takers).



----
CategoryRicottone