Survey Weights
Survey weights account for the design of a survey sample and non-sampling error.
Description
The design weight, or base weight, reflects unequal probabilities of selection. Generally it is simply the inverse of the sampling probability: N_k/n_k for each stratum k, where n_k of the N_k population units in the stratum are sampled.
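As a minimal sketch (the stratum sizes here are hypothetical illustration values), the base weight for a stratified simple random sample can be computed like this:

```python
# Minimal sketch: stratified base (design) weights as N_k / n_k.
population_sizes = {"urban": 12_000, "rural": 8_000}   # N_k (hypothetical)
sample_sizes     = {"urban": 400,    "rural": 267}     # n_k (hypothetical)

base_weights = {
    stratum: population_sizes[stratum] / sample_sizes[stratum]
    for stratum in population_sizes
}

# Each sampled case carries its stratum's base weight, so the weights
# sum back to the population size.
total = sum(base_weights[s] * sample_sizes[s] for s in base_weights)
print(base_weights)  # {'urban': 30.0, 'rural': 29.96...}
print(total)         # 20000.0
```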
Nonresponse Adjustments
All real surveys feature non-sampling error, especially nonresponse. If nonresponse is uncorrelated with key metrics, it is negligible. There is almost always some observable nonresponse bias, i.e. bias along an attribute that is known for the entire population and is correlated with both a key metric and responsivity. Such bias can be corrected with a nonresponse adjustment to the survey weights.
It is also reasonable to expect some unobserved bias, i.e. bias along attributes that are not known; weighting cannot correct for these.
A nonresponse adjustment factor generally moves weight from nonrespondents to comparable respondents. If there are no significant attributes that can be used to establish comparability, then the adjustment is a flat multiplier: the total weight of all cases over the total weight of respondents (simple counts when base weights are equal). (Nonrespondents have their weight set to 0.)
If there are significant attributes, responsivity can be modeled. There are generally two approaches, both sketched in code below:
- weighting class adjustment: The population (or stratum subpopulation) is partitioned into N-tiles according to the predicted responsivity. Each N-tile then receives a separate flat multiplier as described above.
- propensity score adjustment: Every respondent's weight is multiplied by the inverse of the predicted responsivity, while nonrespondents have their weight set to 0. General practice is then to re-normalize the weights so that they sum to the same total as before the adjustment.
Modeling on insignificant or uncorrelated attributes does not introduce bias, but it does inflate variance.
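A minimal numpy sketch of both adjustments, using simulated base weights and predicted responsivities (in practice the propensities would come from a model fit on frame attributes; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame: equal base weights and a modeled response
# propensity for every case (simulated here for illustration).
n = 1_000
base_w = np.full(n, 20.0)                    # base weights
propensity = rng.uniform(0.2, 0.9, size=n)   # predicted P(respond)
responded = rng.random(n) < propensity       # observed response status

# --- Weighting class adjustment -------------------------------------
# Partition cases into quintiles of predicted propensity; within each
# class, move nonrespondent weight onto respondents via a flat
# multiplier (class weight total / respondent weight total).
classes = np.digitize(propensity, np.quantile(propensity, [0.2, 0.4, 0.6, 0.8]))
wc_w = np.zeros(n)
for c in np.unique(classes):
    in_c = classes == c
    factor = base_w[in_c].sum() / base_w[in_c & responded].sum()
    wc_w[in_c & responded] = base_w[in_c & responded] * factor
# With a single class this reduces to the flat multiplier above.

# --- Propensity score adjustment ------------------------------------
# Respondents' weights are divided by their predicted propensity,
# nonrespondents are zeroed, then the total is re-normalized.
ps_w = np.where(responded, base_w / propensity, 0.0)
ps_w *= base_w.sum() / ps_w.sum()

# Both adjusted weight vectors preserve the original weight total.
assert np.isclose(wc_w.sum(), base_w.sum())
assert np.isclose(ps_w.sum(), base_w.sum())
```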
Post-Stratification
Post-stratification is employed in survey weighting for several reasons:
- There may be measurable sampling errors, such as undercoverage, which can be corrected.
- Incorporating auxiliary information, i.e. the known distribution of the population, into survey estimates should increase accuracy.
- Post-stratified estimates are consistent. Estimates across surveys will match on e.g. the proportion of women in the population if they are all post-stratified according to the same targets.
There are two broad approaches to post-stratification: GREG estimation and calibration estimation. Calibration is known under a variety of other names: raking, iterative proportional fitting, and RIM weighting.
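A minimal sketch of raking (iterative proportional fitting), assuming hypothetical marginal targets: each pass rescales the weights so the weighted shares match one dimension's known population shares, repeating until the adjustment factors converge.

```python
import numpy as np

def rake(weights, attrs, targets, max_iter=100, tol=1e-6):
    """Iteratively rescale weights so that weighted marginal shares
    match the target shares on each dimension (raking / IPF)."""
    w = np.asarray(weights, float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for dim, shares in targets.items():
            total = w.sum()  # held fixed during this dimension's pass
            for level, share in shares.items():
                mask = attrs[dim] == level
                factor = share * total / w[mask].sum()
                w[mask] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # converged: all factors near 1
            break
    return w

# Hypothetical sample skewed 60/40 on sex, raked to known population
# targets of 50/50 on sex and 45/55 on age.
rng = np.random.default_rng(1)
n = 500
attrs = {
    "sex": rng.choice(["m", "f"], size=n, p=[0.6, 0.4]),
    "age": rng.choice(["<40", "40+"], size=n, p=[0.5, 0.5]),
}
targets = {"sex": {"m": 0.5, "f": 0.5}, "age": {"<40": 0.45, "40+": 0.55}}
w = rake(np.ones(n), attrs, targets)
# Weighted shares now match the targets on both margins.
```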
Usage
Weighted Estimators
Survey weights w are designed such that a population proportion μ can be calculated using the weighted estimator Σ(wx) / Σw.
In the case that all cases have equal weight, the estimator reduces to the sample mean, and it is straightforward to show that its variance is σ²/n.
In any other case, treating the weights as fixed gives the variance Σ(w²σ²) / (Σw)². More generally the estimator is a ratio of random totals, which must be linearized or simulated to arrive at an approximate variance; Taylor expansion is a common strategy for linearization.
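A minimal sketch of the weighted estimator and its Taylor-linearized standard error, ignoring finite population corrections and treating cases as independent (the data here are simulated for illustration):

```python
import numpy as np

def weighted_mean_and_se(x, w):
    """Weighted estimator mu = sum(w*x) / sum(w), with the first-order
    Taylor linearization of the ratio of random totals:
        Var(mu) ~= sum(w_i^2 * (x_i - mu)^2) / (sum(w))^2
    """
    x, w = np.asarray(x, float), np.asarray(w, float)
    mu_hat = np.sum(w * x) / np.sum(w)
    var_hat = np.sum(w**2 * (x - mu_hat) ** 2) / np.sum(w) ** 2
    return mu_hat, np.sqrt(var_hat)

# Hypothetical data: a binary outcome with unequal survey weights.
rng = np.random.default_rng(2)
x = rng.random(200) < 0.3          # e.g. agreement with a survey item
w = rng.uniform(10, 40, size=200)  # survey weights
print(weighted_mean_and_se(x, w))  # (estimate, linearized SE)
```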
Reading Notes
- The Effect of Weight Trimming on Nonlinear Survey Estimates, Frank J. Potter, 1993
- Sampling Weights and Regression Analysis, Christopher Winship and Larry Radbill, 1994
- Improving on Probability Weighting for Household Size, Andrew Gelman and Thomas C. Little, 1998
- Struggles with Survey Weighting and Regression Modeling, Andrew Gelman, 2007
- The calibration approach in survey theory and practice, Carl-Erik Särndal, 2007
- A single frame multiplicity estimator for multiple frame surveys, Fulvia Mecatti, 2007
- Practical Considerations in Raking Survey Data; Michael P Battaglia, David C Hoaglin, and Martin R Frankel (and sometimes David Izrael); 2009
- Statistical Paradises and Paradoxes in Big Data, Xiao-Li Meng, 2018
- A New Paradigm for Polling, Michael A. Bailey, 2023
- The “Law of Large Populations” Does Not Herald a Paradigm Shift in Survey Sampling, Roderick J. Little, 2023
- Surveys of Consumers Technical Report: Technical Documentation for the 2024 Methodological Transition to Web Surveys, 2024
- The effect of online interviews on the University of Michigan Survey of Consumer Sentiment, Ryan Cummings and Ernie Tedeschi, 2024
