Survey Inference

Survey inference is an experiment design for estimating the parameters of a population using measurements from a sample drawn from that population.


Approaches to Survey Data

Model-based inference

  1. Build a mathematical model that describes a population.
  2. Draw a random sample from that population and use it to generate estimates.
  3. Estimate how the error terms of those estimates would vary if repeated samples were drawn.

In other words, the key is how well the model describes the population.
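As a rough illustration of the model-based steps above, the following minimal sketch assumes a very simple model in which each measurement is a common mean plus independent normal noise. The constants `MU` and `SIGMA`, the sample size, and the seed are all hypothetical choices made for this example, and the standard error shown is only valid to the extent that the assumed model describes the population.

```python
import math
import random
import statistics

rng = random.Random(0)

# Assumed model (illustrative only): y_i = MU + e_i, with e_i ~ Normal(0, SIGMA^2).
MU, SIGMA = 50.0, 10.0

# Step 2: a random sample generated under the assumed model.
sample = [rng.gauss(MU, SIGMA) for _ in range(400)]

# Estimate of the model parameter MU.
mu_hat = statistics.fmean(sample)

# Step 3: under this model, the estimate's standard error is s / sqrt(n).
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"estimate: {mu_hat:.2f}, standard error: {std_error:.2f}")
```

If the real population departed from the assumed model (for example, if measurements were correlated within clusters), this standard error could be misleading.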

Design-based inference

  1. Identify a population with fixed descriptives.
  2. Draw a sample from that population to collect measures from.
  3. Estimate how the measures would vary if repeated samples were drawn.

In other words, the key is how well the sample fits the population. If the full population were measured, the estimates would have no sampling error.
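The design-based steps above can be sketched as a small simulation. This is only an illustration under assumed values: the population is a hypothetical list of fixed numbers, and simple random sampling without replacement stands in for whatever sample design is actually used.

```python
import random
import statistics

rng = random.Random(42)

# Step 1: a hypothetical finite population whose values are treated as fixed.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)


def sample_mean(pop, n, rng):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.fmean(rng.sample(pop, n))


# Steps 2 and 3: draw repeated samples and see how the estimate varies.
estimates = [sample_mean(population, 400, rng) for _ in range(1_000)]
spread = statistics.stdev(estimates)

# Sampling every record (a census) leaves no sampling variation.
census_mean = sample_mean(population, len(population), rng)

print(f"population mean:           {true_mean:.2f}")
print(f"spread of sample means:    {spread:.2f}")
print(f"census mean minus true:    {census_mean - true_mean:.2e}")
```

Here the only source of randomness in the estimates is which records are selected, which is the sense in which the inference is design-based.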

Inferential statistics from complex survey data

This approach uses model-based inference while accounting for the survey design.


Population of Interest

Survey statistics are estimates of a population parameter calculated from measurements. The population of interest is sometimes referred to as a universe for such calculations.

Records of the population of interest form a frame.

Sampling Error

If a sample is a poor fit for the population, then it will be difficult or impossible to estimate population parameters.

Random sampling attempts to address this, but for random sampling to succeed, the population of interest needs to be completely specified.

If a frame contains records that are not in the population of interest, it features over-coverage. If it misses records in the population of interest, it features under-coverage.
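A minimal sketch of these two coverage problems, using set operations on made-up record identifiers (the identifiers and the `coverage_rate` summary are hypothetical, chosen only for illustration):

```python
# Hypothetical record identifiers.
population_of_interest = {"a", "b", "c", "d", "e"}
frame = {"c", "d", "e", "f", "g"}

# Records in the frame that are not in the population of interest.
over_coverage = frame - population_of_interest

# Records in the population of interest that the frame misses.
under_coverage = population_of_interest - frame

# Share of the population of interest that the frame actually reaches.
covered = frame & population_of_interest
coverage_rate = len(covered) / len(population_of_interest)

print(sorted(over_coverage))   # ['f', 'g']
print(sorted(under_coverage))  # ['a', 'b']
print(coverage_rate)           # 0.6
```

In practice the population of interest is rarely available as a complete list, so these quantities usually have to be estimated rather than computed directly.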

Inaccurate or out-of-date information in a frame also affects non-response.

Auxiliary information can inform and guide sample design, so the richness of a frame can also be a contributing factor in sampling error.

See here for details on survey sampling.


Measurement

Survey interviews measure characteristics of sampled records and, whenever possible, the responses of those who are successfully contacted and agree to cooperate.

The mode of interview is often predetermined by the choice of frame, but each mode has advantages and disadvantages, especially in cost.

Non-sampling Error

Specification error refers to a difference between the measures of interest and what is actually measured.

Measurement error refers to a variety of factors that interfere with the interview. For example, for a survey of self-reported political opinions, there may be an interviewer effect from the apparent race of an interviewer.

Non-response has the potential to introduce bias, especially if non-response is not random. For example, some populations are inherently difficult to contact.

This should be considered during sample design.

As another example, some populations are inherently difficult to contact by certain survey modes.

This should be considered when selecting a frame in the first place.
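To illustrate why non-random non-response matters, the following sketch compares two hypothetical response mechanisms on the same sample. The population values, the response propensities (0.8, 0.4, and 0.6), and the function `responds` are all assumptions invented for this example, not estimates from any real survey.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of a single measured characteristic.
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# One simple random sample, shared by both response scenarios.
sample = rng.sample(population, 2_000)


def responds(y, rng):
    """Assumed propensity: records with y below 50 respond more often."""
    propensity = 0.8 if y < 50 else 0.4
    return rng.random() < propensity


# Scenario 1: response depends on the characteristic being measured.
nonrandom_respondents = [y for y in sample if responds(y, rng)]

# Scenario 2: response is unrelated to the characteristic.
random_respondents = [y for y in sample if rng.random() < 0.6]

nonrandom_mean = statistics.fmean(nonrandom_respondents)
random_mean = statistics.fmean(random_respondents)

print(f"population mean:              {true_mean:.2f}")
print(f"mean, non-random non-response: {nonrandom_mean:.2f}")
print(f"mean, random non-response:     {random_mean:.2f}")
```

Under these assumptions, the first scenario understates the population mean because low-valued records are over-represented among respondents, while the second scenario loses precision but is not systematically off.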



Statistics/SurveyInference (last edited 2025-01-10 15:49:44 by DominicRicottone)