Sampling Weights and Regression Analysis
Sampling Weights and Regression Analysis (DOI: https://doi.org/10.1177/0049124194023002004) was written by Christopher Winship and Larry Radbill in 1994. It was published in Sociological Methods & Research (vol. 23, no. 2).
If sampling weights are a function only of predictors, and those predictors also enter the model as independent variables, then the model can be estimated without bias with or without weights. Importantly, though, the unweighted estimates have smaller standard errors, so unweighted OLS is preferred in this case. This assumes that the model is correctly specified. If the true relation between Y and X is nonlinear, a linear model of Y on X is only an approximation, and the weighted and unweighted fits approximate it over different distributions of X, so their coefficients can differ.
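If weights are a function of the model's dependent variable (and therefore of the error term), unweighted OLS is biased. The authors recommend first trying to respecify the model so that the weights depend only on the independent variables. If that is not possible, weighted estimation may be appropriate, but the usual standard error formula is then incorrect and White heteroskedasticity-consistent standard errors should be used instead.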
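As an illustration of the White standard errors mentioned above, here is a minimal sketch for a one-predictor weighted regression. It is not the authors' code: the function names, the selection probabilities (0.9 and 0.2), and the data-generating model (intercept 1, slope 2, standard normal noise) are all assumptions made up for this example, and only the simplest HC0 form of the White estimator is shown.

```python
import math
import random

def wls_slope_white_se(x, y, w):
    """Weighted least-squares slope of y on x (with intercept) and its White (HC0) SE."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    xc = [xi - xbar for xi in x]  # predictor centered at its weighted mean
    sxx = sum(wi * ci * ci for wi, ci in zip(w, xc))
    slope = sum(wi * ci * (yi - ybar) for wi, ci, yi in zip(w, xc, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich form: slope = sum(a_i * y_i) with a_i = w_i * xc_i / sxx,
    # so the HC0 variance estimate is sum(a_i^2 * e_i^2).
    var = sum((wi * ci * ei) ** 2 for wi, ci, ei in zip(w, xc, resid)) / sxx ** 2
    return slope, math.sqrt(var)

def sample_selected_on_y(rng, n, beta=2.0):
    """Hypothetical design: units with y > 1 are sampled far more often."""
    xs, ys, ws = [], [], []
    while len(xs) < n:
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + beta * x + rng.gauss(0.0, 1.0)
        p = 0.9 if y > 1.0 else 0.2  # assumed selection probabilities
        if rng.random() < p:
            xs.append(x)
            ys.append(y)
            ws.append(1.0 / p)  # sampling weight = inverse selection probability
    return xs, ys, ws

x, y, w = sample_selected_on_y(random.Random(1), 2000)
slope, se = wls_slope_white_se(x, y, w)
print(f"weighted slope = {slope:.3f}, White SE = {se:.3f}")
```

The sandwich form never assumes that the error variance is proportional to 1/w, which is the assumption built into the textbook weighted least squares formula and the reason that formula is unsuitable for sampling weights.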
The authors demonstrate these results with Monte Carlo methods. Random samples are simulated using the GSS 1974-1984 cumulative data file. OLS and weighted OLS are used to estimate a model for each sample, and the estimated coefficients and variances are compared to the true parameters used in generation.
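The general shape of such a Monte Carlo comparison can be sketched as follows. This is a toy version under stated assumptions, not a reproduction of the paper: it uses synthetic data rather than the GSS file, a single predictor, a true slope of 2, and a made-up design in which units with x > 0 are sampled with probability 0.9 and all others with probability 0.1. All function names are hypothetical.

```python
import random
import statistics

def fit_slope(x, y, w=None):
    """Slope of a (weighted) least-squares line of y on x with an intercept."""
    if w is None:
        w = [1.0] * len(x)  # unweighted OLS is the equal-weights case
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def draw_sample(rng, n, beta=2.0):
    """Draw n units whose selection probability depends on x only."""
    xs, ys, ws = [], [], []
    while len(xs) < n:
        x = rng.gauss(0.0, 1.0)
        p = 0.9 if x > 0 else 0.1  # assumed design: oversample x > 0
        if rng.random() < p:
            xs.append(x)
            ys.append(1.0 + beta * x + rng.gauss(0.0, 1.0))
            ws.append(1.0 / p)  # sampling weight = inverse selection probability
    return xs, ys, ws

def monte_carlo(reps=500, n=200, seed=0):
    """Fit OLS and weighted OLS on each simulated sample and summarize the slopes."""
    rng = random.Random(seed)
    ols, wls = [], []
    for _ in range(reps):
        x, y, w = draw_sample(rng, n)
        ols.append(fit_slope(x, y))
        wls.append(fit_slope(x, y, w))
    return {
        "ols_mean": statistics.fmean(ols), "ols_sd": statistics.stdev(ols),
        "wls_mean": statistics.fmean(wls), "wls_sd": statistics.stdev(wls),
    }

results = monte_carlo()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Because the model is linear and correctly specified, and selection depends only on x, both sets of slope estimates should center on the true value, with the weighted estimates spread more widely across replications.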
The authors also document persistent errors in how statistical software is used. They note that, for example, SPSS uses the sum of the weights as the sample size in its regression routines, which is incorrect for sampling weights, whereas Stata and SAS correctly use the actual sample size. (The SPSS method would be appropriate only for frequency weights, i.e. when the dataset has been compressed by deleting duplicate records and recording their count as the weight of a single representative record.)
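The effect of the sample-size choice can be seen in a small sketch. It does not reproduce the internals of any package: `classical_slope_se` is a hypothetical helper implementing the textbook weighted least squares formula with the sample size passed in explicitly, and the five data points and both sets of weights are invented for illustration.

```python
import math

def classical_slope_se(x, y, w, n):
    """Textbook WLS slope and its standard error, with sample size n given explicitly."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = sse / (n - 2)  # residual variance depends on which n is used
    return slope, math.sqrt(sigma2 / sxx)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]

# Frequency weights: each record stands for `freq` identical observed records,
# so using sum(freq) as n reproduces the fit on the expanded dataset.
freq = [2, 1, 3, 1, 2]
x_full = [xi for xi, f in zip(x, freq) for _ in range(f)]
y_full = [yi for yi, f in zip(y, freq) for _ in range(f)]
_, se_expanded = classical_slope_se(x_full, y_full, [1.0] * len(x_full), n=len(x_full))
_, se_freq = classical_slope_se(x, y, freq, n=sum(freq))

# Sampling weights: each record stands for many unobserved population units,
# so using sum(weights) as n pretends there are 900 observations instead of 5.
samp_w = [200.0, 100.0, 300.0, 100.0, 200.0]
_, se_actual_n = classical_slope_se(x, y, samp_w, n=len(x))
_, se_sum_w = classical_slope_se(x, y, samp_w, n=sum(samp_w))

print(f"frequency weights: expanded={se_expanded:.4f}, compressed={se_freq:.4f}")
print(f"sampling weights:  actual n={se_actual_n:.4f}, sum of weights={se_sum_w:.4f}")
```

With frequency weights the two standard errors agree, while with sampling weights the sum-of-weights version is much smaller than the one based on the actual number of records. The classical formula is used here only to isolate the sample-size issue.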