Survey Statistics


Statistics with Survey Data

There are key differences between survey statistics and model statistics.

Model-based inference

Build a mathematical model that describes a population. Draw a random sample from that population to produce estimates. Estimate how the error of those estimates would vary if repeated samples were drawn.

Design-based inference

Identify a population with fixed descriptives. Randomly draw a sample from that population and collect measures from it. Estimate how the measures would vary if repeated samples were drawn.
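
The repeated-sampling idea can be sketched with a small simulation. This is only an illustration: the population values, the sample size of 200, and the 1,000 repetitions are arbitrary assumptions, and `sample_mean` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical fixed population: these values are treated as fixed facts,
# not as draws from a model.
population = [rng.randint(18, 90) for _ in range(10_000)]
true_mean = statistics.mean(population)

def sample_mean(n):
    """Draw one simple random sample without replacement and return its mean."""
    return statistics.mean(rng.sample(population, n))

# Repeat the sampling procedure; the population never changes, only the sample.
estimates = [sample_mean(200) for _ in range(1_000)]

print(f"population mean:       {true_mean:.2f}")
print(f"mean of the estimates: {statistics.mean(estimates):.2f}")
print(f"spread of estimates:   {statistics.stdev(estimates):.2f}")
```

The spread of `estimates` is the quantity design-based inference tries to estimate from a single sample, since in practice the repeated draws are never actually made.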

Inferential statistics from complex survey data

Use model-based inference while accounting for the survey design.


Survey Populations

The population of a survey has fixed but unknown descriptives. A random sample is contacted, rather than the full population, for reasons of cost and administration. The population descriptives are estimated based on the design of the sample and the descriptives collected from that sample.
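
As a rough sketch of how the sample design enters the estimate, the example below weights each sampled unit by the inverse of its selection probability. The two strata, their sizes, and the equal allocation of 100 units per stratum are assumptions made up for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical population with two strata of unequal size and different means.
strata = {
    "urban": [rng.gauss(50, 10) for _ in range(8_000)],
    "rural": [rng.gauss(30, 10) for _ in range(2_000)],
}
N = sum(len(values) for values in strata.values())
true_mean = sum(sum(values) for values in strata.values()) / N

# Assumed design: 100 units from each stratum, so rural units are oversampled.
n_per_stratum = 100
sample = []
for name, values in strata.items():
    weight = len(values) / n_per_stratum  # inverse of the selection probability
    for y in rng.sample(values, n_per_stratum):
        sample.append((y, weight))

# Ignoring the design treats every sampled unit as equally representative.
unweighted = sum(y for y, _ in sample) / len(sample)
# The weighted mean lets each unit stand in for `weight` population units.
weighted = sum(y * w for y, w in sample) / sum(w for _, w in sample)

print(f"population mean: {true_mean:.2f}")
print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

Under this design the unweighted mean is pulled toward the oversampled stratum, while the weighted mean should land near the population mean.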

Limitations of Survey Sampling

If a sample is a poor fit for the population, then it will be difficult or impossible to estimate population descriptives. This is why samples are *randomly* drawn.

But for random sampling to succeed, the target population needs to be completely identified. Incomplete data about the population can skew samples.
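
A minimal sketch of that skew, assuming a hypothetical population in which only units with a listed phone number can be sampled and unlisted units tend to have lower values. The 70% listing rate and the group means are invented for illustration.

```python
import random

rng = random.Random(3)

# Hypothetical population of (value, listed) pairs.
population = []
for _ in range(10_000):
    listed = rng.random() < 0.7                 # assumed listing rate
    value = rng.gauss(55 if listed else 40, 10)  # assumed group means
    population.append((value, listed))

true_mean = sum(v for v, _ in population) / len(population)

# The list used for sampling only contains the listed units.
frame = [v for v, listed in population if listed]
sample = rng.sample(frame, 500)
frame_estimate = sum(sample) / len(sample)

print(f"population mean:     {true_mean:.2f}")
print(f"estimate from frame: {frame_estimate:.2f}")
```

The draw from `frame` is perfectly random, yet the estimate misses the population mean because part of the population could never be selected.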

Limitations of Surveying

Non-random non-response is an additional roadblock to estimating population descriptives.
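
To illustrate, the sketch below assumes a hypothetical response rule in which units with higher values are more likely to respond. The rule in `responds` and all of the numbers are assumptions, not estimates from any real survey.

```python
import random

rng = random.Random(2)

# Hypothetical population of outcome values.
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = sum(population) / len(population)

def responds(y):
    """Hypothetical response rule: the chance of responding rises with y."""
    propensity = min(max((y - 20) / 60, 0.05), 0.95)
    return rng.random() < propensity

# A simple random sample is contacted, but only some units respond.
sample = rng.sample(population, 2_000)
respondents = [y for y in sample if responds(y)]

full_mean = sum(sample) / len(sample)
resp_mean = sum(respondents) / len(respondents)

print(f"population mean:  {true_mean:.2f}")
print(f"full sample mean: {full_mean:.2f}")
print(f"respondent mean:  {resp_mean:.2f}")
```

The full sample mean stays close to the population mean, but the respondent mean drifts upward because response depends on the value being measured.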

Some populations are inherently difficult to contact, due to:

The availability of contact data often dictates the mode of survey instrument. Some populations are not easily contacted by specific modes, due to:

Common Sampling Frames


CategoryRicottone