Using Calibration Weighting to Adjust for Nonresponse and Coverage Errors
Using Calibration Weighting to Adjust for Nonresponse and Coverage Errors was written by Phillip S. Kott in 2006. It was published in Survey Methodology (vol. 32, no. 2).
The author builds on Deville and Särndal (1992) and Folsom and Singh (2000), introducing a novel calibration procedure and examining its properties under nonresponse and coverage error.
Establishing some notation:
individuals are indexed by k
the universe of individuals is U with size N
the sample of individuals is S
πk is the probability of selection
true population parameter is Ty, and the corresponding survey response item is yk
the indicator function Ik takes the value 1 if an individual is sampled, i.e. k ∈ S, and 0 otherwise
The true population parameter would be calculated as Ty = ΣU yk. An unbiased 'expansion' (design weighted) estimator is ty_E = ΣU ykIk/πk = ΣS yk/πk. (It is unbiased under the sampling design because E(Ik) = πk, so each term ykIk/πk has expected value yk.) We can also define ak = Ik/πk so that ty_E = ΣS akyk.
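To convince myself of the unbiasedness claim, here is a minimal sketch of the expansion estimator. The toy universe, the selection probabilities, and the assumption that individuals are selected independently (Poisson sampling) are all mine, not the paper's. Enumerating every possible sample shows that the estimator averages out to Ty.

```python
import itertools
import numpy as np

# Hypothetical toy universe of N = 6 individuals (all values are made up).
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
pi = np.array([0.5, 0.5, 0.5, 0.25, 0.25, 0.25])  # selection probabilities

def expansion_estimate(y, pi, sampled):
    """Return t_y_E = sum over S of y_k / pi_k, given a 0/1 sample indicator."""
    a = sampled / pi          # a_k = I_k / pi_k, which is zero outside S
    return float(np.sum(a * y))

# One realized sample: individuals 0, 2 and 5 were selected.
sampled = np.array([True, False, True, False, False, True])
t_y_E = expansion_estimate(y, pi, sampled)

# Assumption: independent selection, so a sample's probability is a product.
# Average the estimator over all 2^6 possible samples.
expected = 0.0
for mask in itertools.product([0, 1], repeat=len(y)):
    I = np.array(mask, dtype=float)
    prob = np.prod(np.where(I == 1, pi, 1 - pi))
    expected += prob * expansion_estimate(y, pi, I)

T_y = y.sum()   # the true total, which `expected` should reproduce
```

Any single sample can miss badly (here t_y_E is well above T_y), but the average over the design recovers the true total.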
Some further notation:
P auxiliary variables are known for the universe
individual items are expressed as a row vector: xk = <x1,k ... xP,k>
Emphasis here! This is implicitly a row vector!
true population parameters are also a vector: Tx
Weights wk are calculated such that Tx = ΣU xk = ΣS wkxk (the 'calibration equation'). Many different sets of weights satisfy this equation, so a particular set is selected by minimizing some loss function that measures distance from the design weights. It follows that the calibration estimator is ty_CAL = ΣS wkyk. This is also unbiased under a linear model: if yk = xkβ + εk, where the random errors εk have expected value 0 regardless of which individuals are sampled, then ty_CAL − Ty = ΣS wkεk − ΣU εk, which has expected value 0.
Some further notation:
ck is a constant which may or may not be based on xk
q is a column vector calculated as q = (ΣS ckakx'kxk)^-1 (Tx − ΣS akxk)'
Again, recall that xk is implicitly a row vector!
The GREG estimator is ty_GREG = ty_E + (Tx − ΣS akxk) (ΣS ckakx'kxk)^-1 ΣS ckakx'kyk = ty_E + q' ΣS ckakx'kyk. We require that Λ = lim N→∞ ΣU ckx'kxk/N is a positive definite matrix, and therefore invertible. We can also define wk = ak + q'ckakx'k = ak(1 + ckxkq) so that ty_GREG = ΣS wkyk; q is exactly the vector that makes these weights satisfy the calibration equation. This is essentially a 'calibration form' of the GREG estimator, demonstrating that they are equivalent. (Or rather, that the GREG estimator is a calibration estimator whose weights are chosen to minimize distance from the design weights.) It can be proven that this estimator is randomization consistent.
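A small numerical check of the equivalence, as a sketch only. I am assuming the column-vector convention q = (ΣS ckakx'kxk)^-1 (Tx − ΣS akxk)' with weights wk = ak(1 + ckxkq); the sample values and the "known" totals T_x are invented for illustration.

```python
import numpy as np

# Hypothetical sample of n = 6 individuals; all numbers are made up.
x = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0],
              [1.0, 7.0], [1.0, 8.0], [1.0, 11.0]])   # each row is x_k
y = np.array([4.0, 7.0, 9.0, 15.0, 18.0, 21.0])
a = np.full(6, 5.0)              # design weights a_k = 1/pi_k
c = np.ones(6)                   # the constants c_k
T_x = np.array([32.0, 190.0])    # assumed known population totals

M = (x * (c * a)[:, None]).T @ x     # sum_S c_k a_k x_k' x_k (symmetric)
gap = T_x - a @ x                    # T_x - sum_S a_k x_k
q = np.linalg.solve(M, gap)          # the vector q

# Calibration form: w_k = a_k (1 + c_k x_k q)
w = a * (1.0 + c * (x @ q))
t_y_from_weights = w @ y

# Regression form: t_y_E + (T_x - sum_S a_k x_k) b
b = np.linalg.solve(M, (x * (c * a)[:, None]).T @ y)
t_y_regression = a @ y + gap @ b
```

Both forms give the same estimate, and `w @ x` reproduces `T_x`, which is the calibration equation.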
By releasing the requirement that calibrated weights should minimize distance from design weights, weights become defined by wk = ak(1 + hkq) where hk is another (implicitly row) vector with the same dimension as xk. (We still have a parallel requirement that the matrix calculated as ΣS akh'kxk be invertible.) Some further notes:
basically hk replaces ckxk
any components of hk that are not linear combinations of xk are instrumental variables
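it follows that q is now calculated as q = (ΣS akx'khk)^-1 (Tx − ΣS akxk)', where ΣS akx'khk is the transpose of the matrix required to be invertible above
for all the same reasons as the GREG estimator, this calibration estimator is randomization consistent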
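A sketch of the instrument version, under the assumption that q is the column vector solving (ΣS akx'khk) q = (Tx − ΣS akxk)', which is what the calibration equation implies for wk = ak(1 + hkq). The function name `instrument_weights`, the instrument z, and every number below are hypothetical.

```python
import numpy as np

def instrument_weights(x, h, a, T_x):
    """Return w_k = a_k (1 + h_k q), with q chosen so that sum_S w_k x_k = T_x."""
    A = (x * a[:, None]).T @ h            # sum_S a_k x_k' h_k
    q = np.linalg.solve(A, T_x - a @ x)   # solves the calibration equation
    return a * (1.0 + h @ q)

# Hypothetical data: x_k = (1, x1_k), h_k = (1, z_k).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])   # made-up instrumental variable
x = np.column_stack([np.ones(6), x1])
h = np.column_stack([np.ones(6), z])
a = np.full(6, 4.0)                    # design weights
T_x = np.array([30.0, 100.0])          # assumed known population totals

w_iv = instrument_weights(x, h, a, T_x)    # calibrated using the instrument
w_lin = instrument_weights(x, x, a, T_x)   # h_k = x_k: the linear case, c_k = 1
```

The two weight vectors differ, but both satisfy the calibration equation on x, which is the point: the calibration targets stay fixed while hk changes how the adjustment is spread across individuals.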
The author further generalizes the calibration estimator using wk_GEN = ak f(hkq*) where:
f is a function that is monotonic, is twice-differentiable, and satisfies f(0) = 1 and f'(0) = 1
q* is chosen to satisfy the calibration equation; it can usually be found iteratively, i.e. start with a zero vector, compute the error between the weighted sample totals and the known population totals, add some portion of that error to the working vector, and loop until the error converges to zero
This is generalized because f can be nonlinear. For example, taking hk = xk and f(hkq) = exp(xkq) when all components of xk are binary (0 or 1) gives raking.
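Here is a sketch of the iterative search for q* with the exponential f. I use Newton's method as the update rule, which is my choice of how to "add some portion of the error" and not necessarily the paper's; `solve_calibration`, the two binary classifications, and the totals are all hypothetical. One category of the second classification is left out of xk so that the columns are not linearly dependent.

```python
import numpy as np

def solve_calibration(x, h, a, T_x, f, f_prime, tol=1e-10, max_iter=50):
    """Find q* so that sum_S a_k f(h_k q*) x_k = T_x, using Newton steps."""
    q = np.zeros(x.shape[1])                  # start from the zero vector
    for _ in range(max_iter):
        eta = h @ q
        error = T_x - (a * f(eta)) @ x        # shortfall against known totals
        if np.max(np.abs(error)) < tol:
            break
        # Jacobian of the weighted totals: sum_S a_k f'(h_k q) x_k' h_k
        J = (x * (a * f_prime(eta))[:, None]).T @ h
        q = q + np.linalg.solve(J, error)     # Newton update
    return q, a * f(h @ q)

# Hypothetical sample of 8 with two binary classifications, A and B.
in_A1 = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
in_B1 = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=float)
x = np.column_stack([in_A1, 1 - in_A1, in_B1])   # indicators: A1, A2, B1
a = np.full(8, 10.0)                             # design weights
T_x = np.array([50.0, 50.0, 60.0])               # assumed known counts

q_star, w = solve_calibration(x, x, a, T_x, np.exp, np.exp)
```

The resulting weights are 15 for individuals in B1 and 10 for the rest, which matches what raking the A and B margins by hand would give for this table.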
Now the author explores the use of this framework for nonresponse weighting. If the possibility of non-sampling error is considered, then f(0) and f'(0) are not constrained to be 1. The logistic function is an option. Each sampled individual is assumed to respond independently with probability pk = 1/f(hkφ), so f(hkφ) acts as a nonresponse adjustment to the design weight ak.
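A sketch of how I read the nonresponse case. I am assuming f(t) = 1 + exp(−t), so that pk = 1/f(hkφ) is the logistic function; that specific form, the Newton update, and all the numbers are my assumptions. The sums run over respondents only, and φ is chosen so the adjusted weights hit the known totals.

```python
import numpy as np

def f(t):
    """Assumed adjustment function: 1/f(t) is the logistic response probability."""
    return 1.0 + np.exp(-t)

def f_prime(t):
    return -np.exp(-t)

# Hypothetical respondents in two groups; x_k holds the group indicators.
a = np.array([20.0, 20.0, 20.0, 25.0, 15.0])     # design weights
x = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
T_x = np.array([100.0, 50.0])                    # assumed known group sizes

phi = np.zeros(2)
for _ in range(50):
    eta = x @ phi
    error = T_x - (a * f(eta)) @ x               # shortfall against T_x
    if np.max(np.abs(error)) < 1e-10:
        break
    J = (x * (a * f_prime(eta))[:, None]).T @ x  # Jacobian of weighted totals
    phi = phi + np.linalg.solve(J, error)        # Newton update

w = a * f(x @ phi)        # nonresponse-adjusted weights
p = 1.0 / f(x @ phi)      # implied response probabilities
```

With group indicators as the only auxiliary variables, the implied response probabilities are just each group's weighted response rate: 60/100 = 0.6 in the first group and 40/50 = 0.8 in the second.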
Calibration can also correct under-coverage errors. This is an extension of simple post-stratification.
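For reference, simple post-stratification is the special case where xk is a set of group indicators: each design weight is scaled so that the weighted group count matches the known population count. A minimal sketch, with a hypothetical `poststratify` helper and made-up groups and counts:

```python
import numpy as np

def poststratify(a, group, N_by_group):
    """Scale design weights so each group's weighted count equals its known size."""
    w = a.astype(float).copy()
    for g, N_g in N_by_group.items():
        in_g = group == g
        w[in_g] = a[in_g] * N_g / a[in_g].sum()
    return w

# Hypothetical sample in which both groups are under-covered.
a = np.array([10.0, 10.0, 10.0, 20.0, 20.0])          # design weights
group = np.array(["young", "young", "young", "old", "old"])
N_by_group = {"young": 45.0, "old": 50.0}             # assumed known counts

w = poststratify(a, group, N_by_group)
# young: weighted count 30 -> 45, so weights scale by 1.5
# old:   weighted count 40 -> 50, so weights scale by 1.25
```

Here the design weights only add up to 30 and 40 in groups known to contain 45 and 50 individuals, which is what under-coverage looks like from the sample side.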
Reading Notes
I get lost in section 5, which discusses the asymptotic properties used to estimate variance. I also don't think I'm 100% following the parts of section 6 that discuss quasi-randomization.
