Differences between revisions 27 and 28

Survey Weights

Survey weights account for the survey design, sampling error, and non-sampling error.

Description

Survey data are collected through a mechanism that can be specified statistically. If that mechanism is not accounted for in estimation, bias can be introduced and estimates can be over-confident.

Inverse variance weights are related but not the same: they reflect the precision of each observation, not the design of the survey.

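Survey weights begin with a design weight (or base weight) reflecting each case's probability of selection. Generally this is simply the inverse of the selection probability. Under stratified simple random sampling, a case in stratum k is selected with probability n_k/N_k, so its design weight is N_k/n_k.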
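The following is a minimal sketch of design weights under stratified simple random sampling. It assumes the population count N_k of every stratum is known. The function name `design_weights` and its input layout are hypothetical. Each sampled case stands in for N_k/n_k population units, so the weights sum to the population size.

```python
from collections import Counter


def design_weights(strata, population_sizes):
    """Return the design weight N_k / n_k for each sampled case.

    strata: stratum label of each sampled case, in sample order.
    population_sizes: mapping from stratum label to population count N_k.
    """
    sample_sizes = Counter(strata)  # n_k: number of sampled cases per stratum
    return [population_sizes[k] / sample_sizes[k] for k in strata]


# 2 of 10 units sampled in stratum "a", 3 of 30 in stratum "b".
weights = design_weights(["a", "a", "b", "b", "b"], {"a": 10, "b": 30})
print(weights)  # [5.0, 5.0, 10.0, 10.0, 10.0]
```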

All real surveys feature non-sampling error, especially nonresponse. If nonresponse is uncorrelated with key metrics, the resulting bias is negligible. Otherwise there is potential for nonresponse bias, which can be corrected through survey weights in a few ways:

inverse propensity adjustments
weighting class adjustments

Modeling responsivity on attributes that are insignificant or uncorrelated with the key metrics does not introduce bias, but it does inflate variance.
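The following is a minimal sketch of both adjustments. It assumes each case carries a response indicator. For the propensity version, it also assumes a response propensity already predicted by some model, such as a logistic regression, which is not fit here. The function names are hypothetical.

The weighting class version applies a flat multiplier within each class: the class's total weight divided by the weight of its respondents. The propensity version divides each respondent's weight by its propensity and sets nonrespondents to 0. It then re-normalizes to the original total, which is common practice rather than a requirement.

```python
def weighting_class_adjustment(weights, responded, classes):
    """Within each class, move nonrespondent weight onto respondents."""
    class_total = {}
    respondent_total = {}
    for w, r, c in zip(weights, responded, classes):
        class_total[c] = class_total.get(c, 0.0) + w
        if r:
            respondent_total[c] = respondent_total.get(c, 0.0) + w
    # Factor = weight of all cases in the class / weight of its respondents.
    # A class with no respondents loses its weight entirely, so such classes
    # should be collapsed into an adjacent class beforehand.
    return [w * class_total[c] / respondent_total[c] if r else 0.0
            for w, r, c in zip(weights, responded, classes)]


def propensity_adjustment(weights, responded, propensities):
    """Divide respondent weights by predicted response propensity, then renormalize."""
    adjusted = [w / p if r else 0.0
                for w, r, p in zip(weights, responded, propensities)]
    scale = sum(weights) / sum(adjusted)  # restore the pre-adjustment total
    return [a * scale for a in adjusted]


weights = [10.0, 10.0, 10.0, 10.0]
responded = [True, False, True, True]
print(weighting_class_adjustment(weights, responded, ["x", "x", "y", "y"]))
# [20.0, 0.0, 10.0, 10.0]
print(propensity_adjustment(weights, responded, [0.5, 0.5, 0.8, 0.8]))
```

Both versions preserve the total weight. They differ in how finely weight is redistributed among respondents.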

Calibration can be used to:

make estimates consistent with known population proportions
correct sampling error like undercoverage or overcoverage
further correct for non-sampling error like nonresponse bias
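Post-stratification is the simplest case of calibration. Each case's weight is scaled so that weighted counts within cells match known population totals. The following is a minimal sketch that assumes the cell targets are known population counts and that every cell contains at least one case. The function name `post_stratify` is hypothetical.

```python
def post_stratify(weights, cells, targets):
    """Scale weights within each cell so they sum to the cell's known population total."""
    cell_total = {}
    for w, c in zip(weights, cells):
        cell_total[c] = cell_total.get(c, 0.0) + w
    return [w * targets[c] / cell_total[c] for w, c in zip(weights, cells)]


# The sample over-represents women; the population is known to be 50/50.
weights = [10.0, 10.0, 10.0, 10.0]
cells = ["f", "f", "f", "m"]
calibrated = post_stratify(weights, cells, {"f": 20.0, "m": 20.0})
female_share = sum(w for w, c in zip(calibrated, cells) if c == "f") / sum(calibrated)
print(female_share)  # approximately 0.5, versus 0.75 before calibration
```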

Calibration methods include:

raking, also known as iterative proportional fitting or RIM weighting
GREG estimators
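Raking adjusts weights to match one variable's known marginal totals at a time, cycling through the variables until all margins match at once. The following is a minimal sketch that assumes each margin's targets are known population totals summing to the same overall total. The function name `rake` and the `max_iter` and `tol` parameters are hypothetical choices, not a particular package's API.

```python
def rake(weights, margins, targets, max_iter=1000, tol=1e-8):
    """Rake weights to known marginal totals by iterative proportional fitting.

    margins: one list per variable, giving each case's category on that variable.
    targets: one dict per variable, mapping category -> known population total.
    All target dicts should sum to the same population total.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_change = 0.0
        for categories, target in zip(margins, targets):
            # Current weighted total of each category on this variable.
            totals = {}
            for wi, c in zip(w, categories):
                totals[c] = totals.get(c, 0.0) + wi
            # Scale every case so this variable's margins hit their targets.
            for i, c in enumerate(categories):
                factor = target[c] / totals[c]
                largest_change = max(largest_change, abs(factor - 1.0))
                w[i] *= factor
        if largest_change < tol:  # every margin already matched its target
            break
    return w  # returned as-is if max_iter is reached without converging


sex = ["f", "f", "f", "m", "m"]
age = ["young", "old", "old", "young", "old"]
raked = rake([1.0] * 5, [sex, age],
             [{"f": 50.0, "m": 50.0}, {"young": 40.0, "old": 60.0}])
```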

Weighted Estimators

Survey weights w are designed such that a population mean μ (a proportion, when x is binary) can be estimated with the weighted estimator μ̂ = Σ(w_i x_i) / Σw_i.
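The weighted estimator is a one-line computation. In the following sketch, the function name `weighted_mean` is hypothetical, and x is binary, so the estimate is a proportion.

```python
def weighted_mean(x, w):
    """Weighted estimator sum(w * x) / sum(w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


# x is binary (1 = has the attribute), so the estimate is a proportion.
x = [1, 0, 0, 1]
w = [1.0, 3.0, 2.0, 2.0]
print(weighted_mean(x, w))  # 0.375, versus an unweighted 0.5
```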

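If all cases have equal weight w, the weights cancel and the estimator reduces to the ordinary sample mean. Assuming n independent observations with common variance σ², it is straightforward to show that the variance of that estimator is σ²/n.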

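In general, treating the weights as fixed constants and the observations as independent with common variance σ², the variance is Σ(w_i²σ²) / (Σw_i)². In practice, the denominator Σw_i is itself random under the sampling design, so the estimator is a ratio of two random sums. Its variance must therefore be linearized or simulated to arrive at an approximation. Taylor series expansion is a common strategy for linearization.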
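The following is a minimal sketch of the Taylor-linearized variance of the weighted mean. It assumes a single-stage design approximated as sampling with replacement, with no strata, clusters, or finite population correction. Each case contributes a linearized value z_i = w_i(x_i - μ̂), and the variance estimate is n/(n-1) · Σz_i² / (Σw_i)². The function name is hypothetical. Survey software typically extends this to account for strata and clusters.

```python
def linearized_variance(x, w):
    """Taylor-linearized variance of the ratio estimator sum(w*x) / sum(w).

    Assumes a single-stage, with-replacement approximation (no strata or clusters).
    """
    n = len(x)
    total_w = sum(w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    # Linearized values: each case's weighted deviation from the estimate.
    # They sum to zero, so no centering term is needed below.
    z = [wi * (xi - mu) for wi, xi in zip(w, x)]
    return n / (n - 1) * sum(zi * zi for zi in z) / total_w ** 2


print(linearized_variance([1, 0, 0, 1], [1.0, 3.0, 2.0, 2.0]))
```

With equal weights this reduces to s²/n, the familiar estimated variance of a sample mean.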

Reading Notes

The Effect of Weight Trimming on Nonlinear Survey Estimates, Frank J. Potter, 1993
Sampling Weights and Regression Analysis, Christopher Winship and Larry Radbill, 1994
Improving on Probability Weighting for Household Size, Andrew Gelman and Thomas C. Little, 1998
Using Calibration Weighting to Adjust for Nonresponse and Coverage Errors, Phillip S. Kott, 2006
Struggles with Survey Weighting and Regression Modeling, Andrew Gelman, 2007
The calibration approach in survey theory and practice, Carl-Erik Särndal, 2007
A single frame multiplicity estimator for multiple frame surveys, Fulvia Mecatti, 2007
Practical Considerations in Raking Survey Data, Michael P. Battaglia, David C. Hoaglin, and Martin R. Frankel (and sometimes David Izrael), 2009
Statistical Paradises and Paradoxes in Big Data, Xiao-Li Meng, 2018
A New Paradigm for Polling, Michael A. Bailey, 2023
The “Law of Large Populations” Does Not Herald a Paradigm Shift in Survey Sampling, Roderick J. Little, 2023
Surveys of Consumers Technical Report: Technical Documentation for the 2024 Methodological Transition to Web Surveys, 2024
The effect of online interviews on the University of Michigan Survey of Consumer Sentiment, Ryan Cummings and Ernie Tedeschi, 2024

CategoryRicottone

-  Revision 27 as of 2026-02-09 21:18:43
  Size: 5537
  Editor: DominicRicottone
  Comment: Added reading notes
+  Revision 28 as of 2026-02-10 20:36:52
  Size: 4617
  Editor: DominicRicottone
  Comment: Rewrite
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
-'''Survey weights''' account for the [[Statistics/SurveySampling|design of a survey sample]] and [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]].
+'''Survey weights''' account for the [[Statistics/SurveySampling|survey design]], [[Statistics/SurveyInference#Sampling_Error|sampling error]], and [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]].
 Line 13:
-The design weight, or base weight, reflects unequal [[Statistics/SurveySampling|probabilities of selection]]. Generally this is simply the inverse of the sampling probability: ''n,,k,,/N'' for all strata ''k''.
+Survey data is collected through a mechanism which can be specified statistically. If it is not specified, bias can be introduced and [[Analysis/Estimation|estimates]] can be over-confident.
 Line 15:
+[[Statistics/InverseVarianceWeights|Inverse variance weights]] are related, but not the same.
-Line 16:
+Line 17:
+Survey weights begin with a [[Statistics/DesignWeight|design weight]] reflecting [[Statistics/SurveySampling|probability of selection]]. Generally this is simply the inverse of the sampling probability: ''n,,k,,/N'' for all strata ''k''.
-Line 17:
+Line 19:
-=== Nonresponse Adjustments ===

All real surveys feature [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]], especially nonresponse. If nonresponse is uncorrelated with key metrics, it is negligible. There almost always is some observable [[Statistics/NonresponseBias|nonresponse bias]], i.e. an attribute that is known for the entire population and is correlated with both a key metric and responsivity. This bias can be corrected with a '''nonresponse adjustment''' to the survey weights.

It is also reasonable to expect that there is ''unobserved'' bias, i.e. an attribute that is not known.

A nonresponse adjustment factor generally moves weight from nonrespondents to comparable respondents. If there are no significant attributes that can be used to establish comparability, then the adjustment is a flat multiplier: the total of cases over the count of respondents. (Nonrespondents have their weight set to 0.)

If there are significant attributes, responsivity can be modeled. There are generally two approaches:
 * '''weighting class adjustment''': The population (or stratum subpopulation) is partitioned into N-tiles according to the predicted responsivity. Each N-tile then receives a separate flat multiplier as described above.
 * '''propensity score adjustment''': Every respondent's weight is multiplied by the inverse of the predicted responsivity, while nonrespondents have their weight set to 0. General practice is then to re-normalize the weights such that they sum to the same total as before applying the adjustment.
+All real surveys feature [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]], especially [[Statistics/SurveyNonresponse|nonresponse]]. If nonresponse is uncorrelated with key metrics, it is negligible. Otherwise there is potential for [[Statistics/NonresponseBias|nonresponse bias]]. This bias can be corrected through survey weights in a few ways:
 * [[Statistics/InverseProbabilityWeights|inverse propensity adjustments]]
 * [[Statistics/WeightingClassAdjustment|weighting class adjustments]]
-Line 31:
+Line 25:
+[[Statistics/Calibration|Calibration]] can be used to:
 * make estimates be consistent with known true population proportions
 * correct [[Statistics/SurveyInference#Sampling_Error|sampling error]] like undercoverage or overcoverage
 * further correct for non-sampling error like nonresponse bias
-Line 32:
+Line 30:
-=== Post-Stratification ===

Post-stratification is employed in survey weighting for several reasons:
 * There may be measurable [[Statistics/SurveyInference#Sampling_Error|sampling errors]], such as undercoverage, which can be corrected.
 * Incorporating auxiliary information, i.e. the known distribution of the population, into survey estimates should increase accuracy.
 * Post-stratified estimates are consistent. Estimates across surveys will match on e.g. the proportion of women in the population if they are all post-stratified according to the same targets.

There are two approaches to this post-stratification: [[TheCalibrationApproachInSurveyTheoryAndPractice|GREG estimation and calibration estimation]]. Calibration is known under a variety of other names: '''raking''', '''iterative proportional fitting''', and '''RIM weighting'''.
+The methods here include:
 * raking
 * iterative proportional fitting
 * RIM weighting
 * [[Statistics/GeneralizedRegressionEstimator|GREG estimators]]
-Line 46:
+Line 40:
-== Usage ==



=== Weighted Estimators ===
+== Weighted Estimators ==
-Line 54:
+Line 44:
-In the case that all cases have equal weight, [[Statistics/Moments#Description|it is straightforward to show]] that the variance of that estimator is ''w^2^σ^2^''.
+In the case that all cases have equal weight, [[Statistics/Moments#Description|it is straightforward to show]] that the [[Statistics/Variance|variance]] of that estimator is ''w^2^σ^2^''.
-Line 67:
+Line 57:
+ * [[UsingCalibrationWeightingToAdjustForNonresponseAndCoverageErrors|Using Calibration Weighting to Adjust for Nonresponse and Coverage Errors]], Phillip S. Kott, 2006

Diff for "Statistics/SurveyWeights"
