Differences between revisions 12 and 20 (spanning 8 versions)
Revision 12 as of 2021-04-30 16:24:06
Size: 5027
Comment:
Revision 20 as of 2025-04-18 20:36:59
Size: 4508
Comment: Clarifications
Deletions are listed first within each hunk; additions follow.
Line 3: Line 3:
Survey weights account for the design of a survey sample and other biases/errors introduced by a survey instrument.
'''Survey weights''' account for the [[Statistics/SurveySampling|design of a survey sample]] and [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]].
Line 11: Line 11:
== The Basic Process ==
== Description ==
Line 13: Line 13:
 1. Set survey dispositions
 2. Calculate '''base weights'''
 3. Apply non-response adjustments to base weights
 4. Calibrate the weights
Survey weights begin with the inverse of the [[Statistics/SurveySampling|sampling probability]]. This is known as the '''base weight'''.
Line 18: Line 15:
See [[SurveyDisposition|here]] for details about survey dispositions.
The weight of non-respondents, or more generally of anyone who cannot be used for analysis, is reallocated to respondents. This is usually done in a manner that accounts for [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]], especially [[Statistics/NonResponseBias|measurable non-response bias]]. In the simplest case, though, if there are no meaningful predictors of response propensity, the weights of non-respondents can be set to 0 and the weights of respondents can be scaled up by a corresponding flat adjustment factor.
Line 20: Line 17:
----



== Base Weights ==

'''Base weights''' incorporate effects from the [[SurveySampling#Sample_Type|sampling design]]. They are the inverse of the probability of being sampled. ''Think '''desired over actual'''.''

With a ''probability sample'', base survey weights account for...

 1. probability of selection
 2. probability of responding and providing enough information to confirm eligibility, given selection
 3. probability of being eligible, given selection and response

With a ''non-probability sample'', these probabilities are all unknown. Often this step is skipped altogether.



=== Examples ===

Some practical examples for different probability samples:

 * For a ''census'', all respondents have a weight of '''''1'''''.
 * For an ''SRS design'', this is calculated as a simple rate. Given a population of ''20,000'' and a sample size of ''667'', the probability of being sampled is 667/20,000, so the base weight is 20,000/667 = '''''29.99'''''.
 * For a ''STSRS design'', the same process is applied per stratum.

Note that, in each, the sum of base weights should equal the population size.
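
A minimal Python sketch of ''desired over actual'' for base weights under SRS and STSRS designs (the stratum labels and counts are hypothetical):

{{{
# Base weights are desired over actual: population count divided by sample
# count. For SRS there is a single stratum; for STSRS, repeat per stratum.

def base_weights(population_counts, sample_counts):
    """Return the base weight per stratum."""
    return {
        stratum: population_counts[stratum] / sample_counts[stratum]
        for stratum in population_counts
    }

# SRS: one stratum covering the whole population.
print(base_weights({"all": 20000}, {"all": 667}))   # {'all': 29.98...}

# STSRS: the same calculation applied per stratum (hypothetical strata).
print(base_weights({"urban": 12000, "rural": 8000},
                   {"urban": 400, "rural": 267}))
}}}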
The final step is [[Statistics/PostStratification|post-stratification]]. This can address [[Statistics/SurveyInference#Sampling_Error|sampling errors]] such as undercoverage. Typically, post-stratification is done across a large set of discrete dimensions, such that the true population counts are not known for every cell. An algorithm called '''raking''' or '''calibration''' is used to approximate the adjustment.
Line 54: Line 25:
Collected measures should reflect the sample (and therefore the population), and incomplete data creates gaps. Therefore it is necessary to take non-response into account while weighting data.
Non-response bias exists when non-response is correlated with a metric of interest, introducing bias into the population estimate.
Line 56: Line 27:
There are two main methods for adjusting weights based on non-response:
If non-response is measurable, i.e. response propensity can be predicted using auxiliary information known about the entire sample, then it can also be corrected for.
Line 58: Line 29:
 1. '''Weighting class adjustments''' involve dividing the sample into discrete classes and applying an adjustment factor by class.
 2. '''Propensity score adjustments''' involve calculating the inverse of the estimated probability to respond and applying that as a secondary weight.
A '''weighting class adjustment''' is calculated by using predicted propensity to segment the sample, leading to a response rate per class. Within each class, the inverse of the response rate is the non-response adjustment. Non-respondents have their weight set to 0, as it has been reallocated to respondents that are predicted to be similar in terms of response patterns.
Line 61: Line 31:
A '''propensity score adjustment''' is calculated as the inverse of predicted propensity.
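
A minimal sketch of a propensity score adjustment in Python. The data here are simulated placeholders, and scikit-learn's logistic regression stands in for whatever propensity model is actually used; the point is only the inverse-propensity mechanic:

{{{
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder frame: two auxiliary covariates known for the entire sample.
X = rng.normal(size=(1000, 2))
responded = rng.random(1000) < 0.7        # placeholder response indicator
base_weight = np.full(1000, 30.0)         # e.g. an SRS base weight

# Predict response propensity from the auxiliary information.
model = LogisticRegression().fit(X, responded)
propensity = model.predict_proba(X)[:, 1]

# The adjustment is the inverse of predicted propensity;
# non-respondents have their weight set to 0.
weight = np.where(responded, base_weight / propensity, 0.0)
}}}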
Line 62: Line 33:

=== Examples ===

A simple demonstration breaks the sample into weighting classes based on response status, and then reapportions the weight of non-respondents to respondents.

Consider a simple design without eligibility.

||'''Class''' ||'''Count'''||
||Respondent ||800 ||
||Non-respondent||200 ||

To re-apportion the weight of non-respondents, the respondents' weight factors would be adjusted by a factor of (800+200)/800 or 1.25. The non-respondents would then be dropped, or assigned weight factors of 0. ''This is, again, a calculation of '''desired over actual'''.''
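
The same reapportionment as a minimal Python sketch, mirroring the hypothetical counts above:

{{{
# Weighting class adjustment: desired over actual within the class.
counts = {"respondent": 800, "non-respondent": 200}

# Respondents absorb the weight of non-respondents in their class.
adjustment = sum(counts.values()) / counts["respondent"]
print(adjustment)   # (800 + 200) / 800 = 1.25

# Non-respondents are then dropped, i.e. given a weight factor of 0.
}}}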



=== Non-response Bias ===

Response propensity is commonly related to the key measures of a survey, and therefore introduces '''non-response bias'''. Weighting can account for this error. The core concept is to use auxiliary frame data (i.e. descriptives known for both respondents ''and'' non-respondents).

Adjustments are applied in phases. Cases with unknown eligibility often cannot be adjusted through these methods, and need to be removed. Ineligible cases are often undesirable in analysis datasets, so weights are further adjusted to account for their removal.
Inclusion of insignificant or uncorrelated predictors does not introduce bias in such an adjustment, but it does decrease precision because the variance is increased. As such, when utilizing a linear model for predictions, it is common to use stepwise removal of covariates.
Line 87: Line 39:
== Calibration ==
== Post-Stratification ==
Line 89: Line 41:
'''Calibration''' forces the measurements to reflect ''known'' descriptives of the population. If the population is ''known'' to be 50% female, then the final estimates of the population should not contradict that fact.
Post-stratification is applied because some characteristics of the true population are known, and furthermore are expected to correlate with the metric of interest. By forcing the survey weights to match the known distribution, they are more likely to correct for biases introduced by [[Statistics/SurveyInference#Sampling_Error|sampling errors]]. The population estimates are also more applicable to the true population.
Line 91: Line 43:
Calibration follows from the same basic ideas as above, but involves distinct methods. Weights are often calibrated by many dimensions, requiring a programmed calculation. Methods include:
As a result, there are circumstances where post-stratified weights are not applicable. For example, when modeling non-response, the population of interest is in fact the sample, ''not'' the true population.
Line 93: Line 45:
 * post-stratification (i.e. ''desired over actual'')
 * raking
 * linear calibration (GREG)
Post-stratification is often done according to many complex dimensions. For example, the interactions of sex by age [[Statistics/Binning|bins]] (male and 18-24; male and 25-34; and so on). True population counts are usually available for the margins of these dimensions, but not necessarily for the cells/intersections. Furthermore, some intersections are likely to have so few respondents that the weights would be inappropriately large.
Line 97: Line 47:
'''Iterative proportional fitting''', more generally known as '''raking''', is an algorithm for post-stratification in such a circumstance. It involves looping over the dimensions, post-stratifying the weights toward those marginal counts one at a time. This small loop is then repeated in a larger loop until a convergence criterion is achieved, or for a pre-determined number of iterations. '''RIM (random iterative method) weighting''' is essentially the same thing.
Line 98: Line 49:
'''Calibration''', or '''GREG (generalized regression) estimation''', is a more generalized algorithm. It utilizes a linear regression model to re-weight towards marginal counts.
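
A minimal numpy sketch of linear calibration (GREG), assuming design weights ''d'', a per-case auxiliary matrix ''X'', and known population totals ''t'' for those auxiliaries; it illustrates the re-weighting identity, not any particular package's implementation:

{{{
import numpy as np

def linear_calibration(d, X, t):
    """Adjust design weights d so the weighted totals of X match t."""
    # Solve for the calibration coefficients (lambda) of the GREG estimator.
    lam = np.linalg.solve(X.T @ (d[:, None] * X), t - X.T @ d)
    # Calibrated weights: w_i = d_i * (1 + x_i' lambda).
    return d * (1 + X @ lam)

# Hypothetical example: 5 cases weighted to a population of 50,
# with one auxiliary column (an indicator for female).
d = np.full(5, 10.0)
X = np.array([[1.0], [1.0], [0.0], [0.0], [0.0]])
t = np.array([25.0])   # the population is known to be 50% female
w = linear_calibration(d, X, t)
print(X.T @ w)         # [25.], matching t
}}}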
Line 99: Line 51:
=== Selection of Calibration Dimensions ===

Quota variables should be selected for calibration, ''especially'' when the quotas involved oversampling of some groups.

If key descriptives of the sample appear imbalanced when compared to a 'gold standard' data source (e.g. the census), then those should also be selected.

Lastly, any descriptives that predict key measures should be selected.



=== Raking ===

'''Raking''', or '''RIM weighting''', involves applying post-stratification by each dimension iteratively, until the weights converge. Convergence is defined as the root mean square (RMS) of the changes to the weights falling below a threshold, typically 0.000005.

Raked weights generally should not be applied if their efficiency falls below 70%.
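
That efficiency is commonly computed as the Kish effective sample size divided by the actual sample size; a minimal sketch with placeholder weights:

{{{
import numpy as np

def weighting_efficiency(w):
    """Kish effective sample size over actual sample size."""
    return (w.sum() ** 2) / (len(w) * (w ** 2).sum())

w = np.array([1.0, 1.2, 0.8, 3.5, 0.5])   # placeholder raked weights
print(weighting_efficiency(w))            # ~0.63, below the 70% rule of thumb
}}}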

----



== Bootstrap Estimation ==

'''Bootstrapping''' is a resampling method that can be used to impute values for missing data. A random sample of equal size is drawn (i.e. cases are resampled ''with replacement'' from the original sample). The descriptives in question are modeled using logistic regression on the second sample, and missing values of the first sample are predicted from that model.
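
A sketch of that procedure with simulated data; scikit-learn's logistic regression stands in as the model, and the binary measure, covariates, and missingness pattern are all placeholders:

{{{
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: covariates X for all cases, a binary measure y with gaps.
n = 500
X = rng.normal(size=(n, 3))
y = (X[:, 0] + rng.normal(size=n)) > 0
missing = rng.random(n) < 0.2             # 20% of y is missing

# Draw a second sample of equal size, with replacement.
boot = rng.choice(n, size=n, replace=True)

# Model the measure on the resample (where it was observed)...
fit = boot[~missing[boot]]
model = LogisticRegression().fit(X[fit], y[fit])

# ...and predict the original sample's missing values from that model.
y_imputed = y.copy()
y_imputed[missing] = model.predict(X[missing])
}}}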
In terms of automated convergence criteria, a common choice is to stop when the root mean square (RMS) of the changes to the weights falls below a threshold like 0.000005. Another is to stop when the largest absolute change to any weight falls below a threshold like 0.0001.
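
A minimal Python sketch of raking over two dimensions with an RMS-of-change stopping rule (the dimensions, marginal counts, and thresholds are illustrative):

{{{
import numpy as np

def rake(w, dims, targets, tol=0.000005, max_iter=100):
    """Post-stratify w toward each dimension's marginal counts in turn."""
    for _ in range(max_iter):
        previous = w.copy()
        for dim, target in zip(dims, targets):
            for level, desired in target.items():
                mask = dim == level
                w[mask] *= desired / w[mask].sum()   # desired over actual
        # Stop once the RMS of the changes to the weights is small enough.
        if np.sqrt(np.mean((w - previous) ** 2)) < tol:
            break
    return w

# Hypothetical sample of 6 cases with two dimensions: sex and age bin.
sex = np.array(["m", "m", "m", "f", "f", "f"])
age = np.array(["18-24", "25-34", "25-34", "18-24", "18-24", "25-34"])
w = np.ones(6)

# Known marginal counts for a population of 12.
w = rake(w, [sex, age],
         [{"m": 6.0, "f": 6.0}, {"18-24": 7.0, "25-34": 5.0}])
print(w, w.sum())   # weights converge; total matches the population of 12
}}}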
