Differences between revisions 19 and 22 (spanning 3 versions)

Survey Weights

Survey weights account for the design of a survey sample and non-sampling error.

Contents

Survey Weights
1. Description
  1. Non-Response Adjustments
  2. Post-Stratification
2. Usage
  1. Weighted Estimators

Description

The design weight, or base weight, reflects unequal probabilities of selection. Generally this is simply the inverse of the sampling probability: N_k/n_k for each stratum k, where n_k cases are sampled from a stratum population of N_k.
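As a minimal sketch of the base weight, assume stratified simple random sampling, so that the selection probability in stratum k is n_k/N_k. The stratum names, the counts, and the helper `design_weights` are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical strata: population counts N_k and sample counts n_k.
population = {"urban": 8000, "rural": 2000}
sample = {"urban": 400, "rural": 200}


def design_weights(population, sample):
    """Return the base weight N_k / n_k for each stratum k."""
    return {k: population[k] / sample[k] for k in sample}


weights = design_weights(population, sample)

# Sanity check: within each stratum the weights sum back to N_k.
totals = {k: weights[k] * sample[k] for k in sample}
```

Each sampled urban case stands in for 20 people and each rural case for 10, so the weighted sample reproduces the population counts.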

Non-Response Adjustments

All real surveys feature non-sampling error, especially non-response. If non-response is uncorrelated with key metrics, the bias it introduces is negligible. There is almost always some observable non-response bias, i.e. bias tied to an attribute that is known for the entire population and is correlated with both a key metric and responsivity. This bias can be corrected with a non-response adjustment to the survey weights.

It is also reasonable to expect that there is unobserved bias, i.e. bias tied to an attribute that is not known.

A non-response adjustment factor generally moves weight from non-respondents to comparable respondents. If there are no significant attributes that can be used to establish comparability, then the adjustment is a flat multiplier: the total of cases over the count of respondents. (Non-respondents have their weight set to 0.)
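The flat multiplier can be sketched as follows. This version uses weighted totals rather than raw counts, which is an assumption on my part; with equal base weights it reduces to the count of cases over the count of respondents. The function name and the data are hypothetical.

```python
def flat_nonresponse_adjustment(weights, responded):
    """Zero out non-respondents and scale respondents by one common factor.

    The factor is the weighted total of all cases divided by the weighted
    total of respondents, so the overall sum of weights is preserved.
    """
    total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    factor = total / respondent_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]


# Hypothetical sample: five cases with equal base weights, three respondents.
base = [10.0, 10.0, 10.0, 10.0, 10.0]
responded = [True, False, True, True, False]
adjusted = flat_nonresponse_adjustment(base, responded)
```

Here the factor is 5/3, and the adjusted weights still sum to the original total of 50.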

If there are significant attributes, responsivity can be modeled. There are generally two approaches:

weighting class adjustment: The population (or stratum subpopulation) is partitioned into N-tiles according to the predicted responsivity. Each N-tile then receives a separate flat multiplier as described above.
propensity score adjustment: Every respondent's weight is multiplied by the inverse of the predicted responsivity, while non-respondents have their weight set to 0. General practice is then to re-normalize the weights such that they sum to the same total as before applying the adjustment.
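Both approaches can be sketched side by side. The predicted responsivities are taken as given (no model is fitted here), and the function names, the equal-sized classes, and the data are hypothetical choices for illustration rather than a prescribed method.

```python
def weighting_class_adjustment(weights, responded, propensity, n_classes=2):
    """Split cases into equal-sized classes by predicted propensity, then
    apply a separate flat multiplier inside each class."""
    n = len(weights)
    order = sorted(range(n), key=lambda i: propensity[i])
    adjusted = [0.0] * n  # non-respondents keep a weight of 0
    for c in range(n_classes):
        members = order[c * n // n_classes:(c + 1) * n // n_classes]
        total = sum(weights[i] for i in members)
        respondent_total = sum(weights[i] for i in members if responded[i])
        if respondent_total == 0:
            raise ValueError("a weighting class has no respondents")
        for i in members:
            if responded[i]:
                adjusted[i] = weights[i] * total / respondent_total
    return adjusted


def propensity_score_adjustment(weights, responded, propensity):
    """Multiply respondent weights by 1 / propensity, zero out
    non-respondents, then re-normalize to the original total."""
    total = sum(weights)
    raw = [w / p if r else 0.0
           for w, r, p in zip(weights, responded, propensity)]
    scale = total / sum(raw)
    return [x * scale for x in raw]


# Hypothetical data: eight cases with equal base weights.
base = [1.0] * 8
propensity = [0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]
responded = [False, True, False, True, True, True, False, True]

by_class = weighting_class_adjustment(base, responded, propensity)
by_score = propensity_score_adjustment(base, responded, propensity)
```

In the class version, the low-propensity half has two respondents out of four cases (multiplier 2) and the high-propensity half has three out of four (multiplier 4/3). Both versions leave the total weight at 8.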

Modeling on insignificant or uncorrelated attributes does not introduce bias, but it does inflate variance.

Post-Stratification

Post-stratification is employed in survey weighting for several reasons:

There may be measurable sampling errors, such as undercoverage, which can be corrected.
Incorporating auxiliary information, i.e. the known distribution of the population, into survey estimates should increase accuracy.
Post-stratified estimates are consistent. Estimates across surveys will match on e.g. the proportion of women in the population if they are all post-stratified according to the same targets.

There are two approaches to this post-stratification: GREG estimation and calibration estimation. Calibration is known under a variety of other names: raking, iterative proportional fitting, and RIM weighting.
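As a rough illustration of raking (iterative proportional fitting), the sketch below adjusts weights toward marginal targets one dimension at a time and repeats until the adjustment factors stop changing. The function name, the data layout, and the stopping rule are assumptions for this example; it also assumes every category has at least one case with positive weight.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Rake weights toward marginal targets.

    categories: dict mapping dimension -> list of labels, one per case.
    targets:    dict mapping dimension -> {label: target weighted total}.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for dim, labels in categories.items():
            # Current weighted total of each category in this dimension.
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Scale every case so its category hits the target total.
            for i, label in enumerate(labels):
                factor = targets[dim][label] / totals[label]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w


# Hypothetical sample of four cases and two post-stratification dimensions.
categories = {
    "sex": ["m", "m", "f", "f"],
    "age": ["young", "old", "young", "old"],
}
targets = {
    "sex": {"m": 50.0, "f": 50.0},
    "age": {"young": 40.0, "old": 60.0},
}
raked = rake([1.0, 1.0, 1.0, 1.0], categories, targets)
```

After raking, the weighted totals match both sets of margins even though no joint (sex by age) targets were supplied.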

Usage

Weighted Estimators

Survey weights w are designed such that a population proportion μ can be calculated using the weighted estimator Σ(wx) / Σw.
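The weighted estimator is short enough to write directly. The function name and the data are hypothetical; x is a 0/1 indicator so that the result is a proportion.

```python
def weighted_mean(x, w):
    """Return the weighted estimator sum(w * x) / sum(w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


# Hypothetical responses (1 = has the attribute) and survey weights.
x = [1, 0, 1, 1]
w = [20.0, 30.0, 20.0, 30.0]
estimate = weighted_mean(x, w)
```

The cases with the attribute carry 70 of the 100 total weight units, so the estimated proportion is 0.7.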

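In the case that all cases have equal weight, it is straightforward to show that the variance of that estimator is σ²/n for n independent cases: the weights cancel and the estimator reduces to the unweighted sample mean.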

In any other case, the variance is given by Σ(w²σ²) / (Σw)². This ratio must then be linearized or simulated to arrive at an approximate variance. Taylor expansion is a common strategy for linearization.
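One common form of the Taylor linearization can be sketched as below. It assumes a single-stage design with cases drawn independently (with replacement) and no strata or clusters; real designs need the stratum and cluster structure carried through. The function name is hypothetical.

```python
def linearized_variance(x, w):
    """Approximate the variance of sum(w * x) / sum(w) by linearization.

    Each case contributes a linearized value z_i = w_i * (x_i - mean) / sum(w).
    The z_i sum to zero, so the variance estimate is n / (n - 1) * sum(z_i^2).
    """
    n = len(x)
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    z = [wi * (xi - mean) / sum_w for wi, xi in zip(w, x)]
    return n / (n - 1) * sum(zi * zi for zi in z)


# Hypothetical data: the same responses under equal and unequal weights.
x = [1, 0, 1, 1]
equal = linearized_variance(x, [2.0, 2.0, 2.0, 2.0])
unequal = linearized_variance(x, [20.0, 30.0, 20.0, 30.0])
```

With equal weights the result matches the sample variance divided by n, and unequal weights change the contribution of each case through the w_i factor in z_i.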

CategoryRicottone

- Revision 19 as of 2025-04-18 20:30:26
  Size: 4200
  Editor: DominicRicottone
  Comment: Cleanup
+ Revision 22 as of 2025-08-10 00:55:28
  Size: 3671
  Editor: DominicRicottone
  Comment: Content
-Deletions are marked like this.
+Additions are marked like this.
 Line 13:
-Survey weights begin with the inverse of the [[Statistics/SurveySampling|sampling probability]]. This is known as the '''base weight'''.
+The design weight, or base weight, reflects unequal [[Statistics/SurveySampling|probabilities of selection]]. Generally this is simply the inverse of the sampling probability: ''N,,k,,/n,,k,,'' for each stratum ''k'', where ''n,,k,,'' cases are sampled from a stratum population of ''N,,k,,''.
 Line 15:
-The weight of non-respondents, or more generally anyone who cannot be used for analysis, is reallocated to respondents. This is usually done in a manner that accounts for [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]], especially [[Statistics/NonResponseBias|measurable non-response bias]]. In the simplest case though, if there are no meaningful predictors of response propensity, the weights of non-respondents can be set to 0 and the weights of respondents can be scaled up by a corresponding flat adjustment factor.
-Line 17:
+Line 16:
-The final step is [[Statistics/PostStratification|post-stratification]]. This can address [[Statistics/SurveyInference#Sampling_Error|sampling errors]] such as undercoverage. Typically, post-stratification is done by a large set of discrete dimensions such that the true population counts are not known. An algorithm called '''raking''' or '''calibration''' is used to approximate the adjustment.
+=== Non-Response Adjustments ===

All real surveys feature [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]], especially non-response. If non-response is uncorrelated with key metrics, the bias it introduces is negligible. There is almost always some observable [[Statistics/NonResponseBias|non-response bias]], i.e. bias tied to an attribute that is known for the entire population and is correlated with both a key metric and responsivity. This bias can be corrected with a '''non-response adjustment''' to the survey weights.

It is also reasonable to expect that there is ''unobserved'' bias, i.e. bias tied to an attribute that is not known.

A non-response adjustment factor generally moves weight from non-respondents to comparable respondents. If there are no significant attributes that can be used to establish comparability, then the adjustment is a flat multiplier: the total of cases over the count of respondents. (Non-respondents have their weight set to 0.)

If there are significant attributes, responsivity can be modeled. There are generally two approaches:
 * '''weighting class adjustment''': The population (or stratum subpopulation) is partitioned into N-tiles according to the predicted responsivity. Each N-tile then receives a separate flat multiplier as described above.
 * '''propensity score adjustment''': Every respondent's weight is multiplied by the inverse of the predicted responsivity, while non-respondents have their weight set to 0. General practice is then to re-normalize the weights such that they sum to the same total as before applying the adjustment.

Modeling on insignificant or uncorrelated attributes does not introduce bias, but it does inflate variance.



=== Post-Stratification ===

Post-stratification is employed in survey weighting for several reasons:
 * There may be measurable [[Statistics/SurveyInference#Sampling_Error|sampling errors]], such as undercoverage, which can be corrected.
 * Incorporating auxiliary information, i.e. the known distribution of the population, into survey estimates should increase accuracy.
 * Post-stratified estimates are consistent. Estimates across surveys will match on e.g. the proportion of women in the population if they are all post-stratified according to the same targets.

There are two approaches to this post-stratification: [[TheCalibrationApproachInSurveyTheoryAndPractice|GREG estimation and calibration estimation]]. Calibration is known under a variety of other names: '''raking''', '''iterative proportional fitting''', and '''RIM weighting'''.
-Line 23:
+Line 46:
-== Non-Response Adjustments ==

Non-response bias exists when non-response is correlated with a metric of interest, introducing bias into the population estimate.

If non-response is measurable, i.e. response propensity can be predicted using auxiliary information known about the entire sample, then it can also be corrected for.

A '''weighting class adjustment''' is calculated by using predicted propensity to segment the sample, leading to a response rate per class. Within each class, the inverse of the response rate is the non-response adjustment. Non-respondents have their weight set to 0, as it has been reallocated to respondents that are predicted to be similar in terms of response patterns.

A '''propensity score adjustment''' is calculated as the inverse of predicted propensity.

----
+== Usage ==
-Line 37:
+Line 50:
-== Post-Stratification ==
+=== Weighted Estimators ===
-Line 39:
+Line 52:
-Post-stratification is applied because some characteristics of the true population are known, and furthermore are expected to correlate with the metric of interest. By forcing the survey weights to match the known distribution, they are more likely to correct for biases introduced by [[Statistics/SurveyInference#Sampling_Error|sampling errors]]. The population estimates are also more applicable to the true population.
+Survey weights ''w'' are designed such that a population proportion ''μ'' can be calculated using the weighted estimator ''Σ(wx) / Σw''.
-Line 41:
+Line 54:
-As a result, there are circumstances where post-stratified weights are not applicable. For example, when modeling non-response, the population of interest is in fact the sample, ''not'' the true population.
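+In the case that all cases have equal weight, [[Statistics/Moments#Description|it is straightforward to show]] that the variance of that estimator is ''σ^2^/n'' for ''n'' independent cases: the weights cancel and the estimator reduces to the unweighted sample mean.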
-Line 43:
+Line 56:
-Post-stratification is often done according to many complex dimensions. For example, the interactions of sex by age [[Statistics/Binning|bins]] (male and 18-24; male and 25-34; and so on). True population counts for the margins of these dimensions are usually available, but not necessarily for the cells/intersections. Furthermore, some intersections are likely to have so few respondents that the weights would be inappropriately large.

'''Iterative proportional fitting''', more generally known as '''raking''', is an algorithm for post-stratification in such a circumstance. It involves looping over the dimensions, post-stratifying the weights toward those marginal counts one at a time. This small loop is then repeated in a larger loop until a convergence criterion is achieved, or for a pre-determined number of iterations. '''RIM (random iterative method) weighting''' is essentially the same thing.

'''Calibration''', or '''GREG (generalized regression) estimation''', is a more generalized algorithm. It utilizes a linear regression model to re-weight towards marginal counts.

In terms of automated convergence criteria, a common choice is to stop when the root mean square (RMS) falls below a threshold like 0.000005. Another is to stop when the calculated change between adjustments falls below a threshold like 0.0001.
+In any other case, the variance is given by ''Σ(w^2^σ^2^) / (Σw)^2^''. This ratio must then be linearized or simulated to arrive at an approximate variance. [[Calculus/TaylorSeries|Taylor expansion]] is a common strategy for linearization.

Diff for "Statistics/SurveyWeights"