= Using Calibration Weighting to Adjust for Nonresponse and Coverage Errors =

'''Using Calibration Weighting to Adjust for Nonresponse and Coverage Errors''' was written by Phillip S. Kott in 2006. It was published in ''Survey Methodology'' (vol. 32, no. 2).

The author builds on [[CalibrationEstimatorsInSurveySampling|Deville and Särndal (1992)]] and [[TheGeneralizedExponentialModelForSamplingWeightCalibrationForExtremeValuesNonresponseAndPostStratification|Folsom and Singh (2000)]]. He introduces a novel calibration procedure and examines its properties under [[Statistics/SurveyNonresponse|nonresponse]] and [[Statistics/SurveyInference#Sampling_Error|coverage error]].

Establishing some notation:
 * individuals are indexed by ''k''
 * the universe of individuals is ''U'' with size ''N''
 * the sample of individuals is ''S''
 * ''π,,k,,'' is the probability of selection
 * true population parameter is ''T,,y,,'', and the corresponding survey response item is ''y,,k,,''
 * the indicator function ''I,,k,,'' takes the value 1 if an individual is sampled, i.e. ''k ∈ S'', and 0 otherwise

The true population parameter would be calculated as ''T,,y,, = Σ,,U,, y,,k,,''. An unbiased 'expansion' ([[Statistics/DesignWeights|design weighted]]) estimator is ''t,,y_E,, = Σ,,U,, y,,k,,I,,k,,/π,,k,, = Σ,,S,, y,,k,,/π,,k,,''. (It is unbiased over repeated sampling because ''E(I,,k,,) = π,,k,,'', so each term ''y,,k,,I,,k,,/π,,k,,'' has expectation ''y,,k,,''.) We can also define ''a,,k,, = I,,k,,/π,,k,,'' so that ''t,,y_E,, = Σ,,S,, a,,k,,y,,k,,''.
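
To convince myself of the unbiasedness claim, here is a minimal sketch that enumerates every possible sample for a tiny universe. The values of `y` and `pi` are made up, and I assume Poisson sampling (each individual selected independently), which is only one of many designs the paper's notation covers.

```python
from itertools import product

# Hypothetical universe of N = 4 individuals (illustrative values only).
y = [10.0, 20.0, 30.0, 40.0]   # survey item y_k
pi = [0.5, 0.25, 0.8, 0.4]     # selection probabilities pi_k


def expansion_estimate(y, pi, indicators):
    """t_y_E = sum over U of y_k * I_k / pi_k."""
    return sum(yk * ik / pk for yk, ik, pk in zip(y, indicators, pi))


def sample_probability(pi, indicators):
    """Probability of one sample under Poisson sampling (assumed design)."""
    prob = 1.0
    for pk, ik in zip(pi, indicators):
        prob *= pk if ik else (1.0 - pk)
    return prob


true_total = sum(y)  # T_y

# Average the estimator over all 2^N possible samples, weighted by probability.
expected_estimate = sum(
    sample_probability(pi, ind) * expansion_estimate(y, pi, ind)
    for ind in product([0, 1], repeat=len(y))
)

# One particular sample: individuals 0 and 2 are selected.
estimate = expansion_estimate(y, pi, (1, 0, 1, 0))  # 10/0.5 + 30/0.8
```

Any single `estimate` can be far from `true_total`, but `expected_estimate` matches it up to floating point error.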

Some further notation:
 * ''P'' auxiliary variables are known for the universe
 * individual items are expressed as a row vector: ''x,,k,, = <x,,1,k,, ... x,,P,k,,>''
   * Emphasis here! This is implicitly a ''row'' vector!
 * true population parameters are also a vector: ''T,,x,,''

Weights ''w,,k,,'' are calculated such that ''T,,x,, = Σ,,U,, x,,k,, = Σ,,S,, w,,k,,x,,k,,'' (the 'calibration equation'). Many sets of weights generally satisfy this equation, so they should be selected to minimize some loss function measuring distance from the design weights ''a,,k,,''. The resulting [[Statistics/Calibration|calibration]] estimator is ''t,,y_CAL,, = Σ,,S,, w,,k,,y,,k,,''. This is also unbiased under a linear prediction model ''y,,k,, = x,,k,,β + ε,,k,,'', in the sense that the expected value of the random errors ''ε,,k,,'' is 0.

Some further notation:
 * ''c,,k,,'' is a constant which may or may not be based on ''x,,k,,''
 * ''q'' is a vector calculated as {{attachment:q.svg}}
 * Again, recall that ''x,,k,,'' is implicitly a ''row'' vector!

The [[Statistics/GeneralizedRegressionEstimator|GREG estimator]] is ''t,,y_GREG,, = t,,y_E,, + Σ,,S,, q c,,k,,a,,k,,x',,k,,y,,k,,''. We require that '''''Λ''' = lim,,N → ∞,, Σ,,U,, c,,k,,x',,k,,x,,k,,/N'' is a [[LinearAlgebra/PositiveDefiniteness|positive definite]] matrix, and therefore [[LinearAlgebra/Invertibility|invertible]]. We can also define ''w,,k,, = a,,k,, + q c,,k,,a,,k,,x',,k,, = a,,k,,(1 + c,,k,,q x',,k,,)'', where ''q x',,k,,'' is a scalar, so that ''t,,y_GREG,, = Σ,,S,, w,,k,,y,,k,,''. This is the 'calibration form' of the GREG estimator, demonstrating that the two are equivalent. (Or rather, that the GREG estimator is one particular, optimized calibration routine.) It can be proven that this estimator is randomization consistent.
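
A small numerical check of the equivalence between the regression form and the calibration form. All data are invented. I take ''q'' to be the row vector ''(T,,x,, - Σ,,S,, a,,k,,x,,k,,)(Σ,,S,, c,,k,,a,,k,,x',,k,,x,,k,,)^-1^'', which is the standard GREG expression and my assumption about what the attached image shows.

```python
import numpy as np

# Hypothetical sample of n = 5 with P = 2 auxiliaries (intercept, size measure).
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0], [1.0, 8.0]])  # rows x_k
y = np.array([3.0, 4.0, 8.0, 9.0, 12.0])   # survey item y_k
a = np.array([4.0, 4.0, 5.0, 5.0, 6.0])    # a_k = 1/pi_k for sampled units
c = np.ones(5)                             # constants c_k (all 1 here)
T_x = np.array([25.0, 130.0])              # known universe totals (assumed)

t_x_E = a @ X   # expansion estimates of the auxiliary totals
t_y_E = a @ y   # expansion estimate of T_y

# A = sum over S of c_k a_k x_k' x_k (a symmetric P x P matrix)
A = X.T @ ((c * a)[:, None] * X)

# Regression form: t_y_E + (T_x - t_x_E) b, with b the weighted LS coefficient.
b = np.linalg.solve(A, X.T @ (c * a * y))
t_y_greg = t_y_E + (T_x - t_x_E) @ b

# Calibration form: q = (T_x - t_x_E) A^{-1}; A is symmetric, so solving
# A q' = (T_x - t_x_E)' gives the same numbers as a 1-D array.
q = np.linalg.solve(A, T_x - t_x_E)
w = a * (1.0 + c * (X @ q))   # w_k = a_k (1 + c_k q x_k')

t_y_cal = w @ y
```

Here `w @ X` reproduces `T_x`, and `t_y_cal` agrees with `t_y_greg`, which is the equivalence the paragraph describes.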

By relaxing the requirement that calibrated weights should minimize distance from design weights, the weights are defined by ''w,,k,, = a,,k,,(1 + h,,k,,q)'' where ''h,,k,,'' is another (implicitly row) vector with the same dimension as ''x,,k,,''. (We still have a parallel requirement that the matrix calculated as ''Σ,,S,, a,,k,,h',,k,,x,,k,,'' be invertible.) Some further notes:
 * basically ''h,,k,,'' replaces ''c,,k,,x,,k,,''
 * any components of ''h,,k,,'' that are not linear combinations of ''x,,k,,'' are [[Statistics/InstrumentalVariablesMethod|instrumental variables]]
 * it follows that ''q'' is now calculated as {{attachment:q2.svg}}
 * for all the same reasons as the GREG estimator, this calibration estimator is randomization consistent
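
A sketch of calibration with an instrument vector, using invented data. The second column of `H` is a hypothetical variable that is not a linear combination of the columns of `X`, so it plays the role of an instrument. Since I cannot reproduce the attached formula here, `q` is derived directly from the calibration equation: it is the column vector satisfying ''q'(Σ,,S,, a,,k,,h',,k,,x,,k,,) = T,,x,, - Σ,,S,, a,,k,,x,,k,,''.

```python
import numpy as np


def instrument_weights(X, H, a, T_x):
    """Return w_k = a_k (1 + h_k q) with q chosen so that sum_S w_k x_k = T_x."""
    M = H.T @ (a[:, None] * X)      # sum over S of a_k h_k' x_k
    d = T_x - a @ X                 # shortfall of the expansion estimates
    q = np.linalg.solve(M.T, d)     # column q with q' M = d
    return a * (1.0 + H @ q)


# Hypothetical sample: x_k = (1, size), h_k = (1, other variable).
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0], [1.0, 8.0]])
H = np.array([[1.0, 1.0], [1.0, 4.0], [1.0, 2.0], [1.0, 6.0], [1.0, 3.0]])
a = np.array([4.0, 4.0, 5.0, 5.0, 6.0])
T_x = np.array([25.0, 130.0])       # known universe totals (assumed)

w_iv = instrument_weights(X, H, a, T_x)

# Setting h_k = x_k (that is, c_k = 1) gives linear GREG-style weights.
w_linear = instrument_weights(X, X, a, T_x)
```

Both sets of weights satisfy the calibration equation, but they differ from each other, which illustrates that the choice of ''h,,k,,'' matters even though the calibration targets are identical.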

The author further generalizes the calibration estimator using ''w,,k_GEN,, = a,,k,, f(h,,k,,q*)'' where:
 * ''f'' is a function with the following properties:
   * monotonic
   * twice-[[Calculus/Derivative|differentiable]]
   * ''f(0) = 1''
   * ''f'(0) = 1''
 * ''q*'' is chosen to satisfy the calibration equation
   * can usually be found iteratively: start with a zero vector, measure the error against the known population totals, add some portion of that error to the working vector, and repeat until it converges

This is generalized because ''f'' can be nonlinear. For example, consider ''f(h,,k,,q) = exp(x,,k,,q)'' (that is, ''h,,k,, = x,,k,,'') when all components of ''x,,k,,'' are binary (0 or 1); this corresponds to raking.
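
A sketch of solving for ''q*'' in the exponential case with binary auxiliaries. The data are invented, and the solver is plain Newton's method on the calibration equation, which is my choice of iteration and not necessarily the one the paper has in mind. The function name `calibrate` and its parameters are hypothetical.

```python
import numpy as np


def calibrate(X, H, a, T_x, f, f_prime, tol=1e-10, max_iter=50):
    """Find q so that sum_S a_k f(h_k q) x_k = T_x, using Newton's method."""
    q = np.zeros(X.shape[1])            # start from the zero vector (w = a)
    for _ in range(max_iter):
        eta = H @ q
        resid = T_x - (a * f(eta)) @ X  # error against the known totals
        if np.max(np.abs(resid)) < tol:
            break
        # Jacobian of the weighted totals with respect to q:
        # sum over S of a_k f'(h_k q) x_k' h_k
        J = X.T @ ((a * f_prime(eta))[:, None] * H)
        q = q + np.linalg.solve(J, resid)
    return q, a * f(H @ q)


# Hypothetical binary auxiliaries: male, female, north. The south indicator
# is dropped because it is implied by the other three.
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])
a = np.array([10.0, 12.0, 8.0, 9.0, 11.0, 10.0])
T_x = np.array([33.0, 36.0, 35.0])      # known margins (assumed)

# f = exp satisfies f(0) = 1 and f'(0) = 1; here h_k = x_k.
q_star, w_gen = calibrate(X, X, a, T_x, np.exp, np.exp)
```

Because ''f'' is the exponential function, every weight in `w_gen` stays positive, which the linear form ''a,,k,,(1 + h,,k,,q)'' does not guarantee.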

Now the author explores the use of this framework for nonresponse weighting. If the possibility of [[Statistics/SurveyInference#Non-sampling_Error|non-sampling error]] is considered, then ''f(0)'' and ''f'(0)'' are not constrained to be 1. The [[Statistics/LogisticModel|logistic]] function is an option. Each individual is assumed to respond independently with probability ''p,,k,, = 1/f(h,,k,,φ)''.

Calibration can also correct under-coverage errors. This is an extension of simple [[Statistics/PostStratification|post-stratification]].
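
For reference, simple post-stratification is the special case where ''x,,k,,'' is a set of mutually exclusive cell indicators. A minimal sketch with invented numbers, assuming the universe counts per cell (`N_cell`) are known from an outside source:

```python
import numpy as np

# Hypothetical sample of 5 respondents in 2 post-strata (cells 0 and 1).
cell = np.array([0, 0, 1, 1, 1])              # cell membership of each unit
a = np.array([4.0, 6.0, 5.0, 5.0, 10.0])      # design weights a_k
N_cell = np.array([12.0, 30.0])               # known universe counts (assumed)

# Design-weighted count per cell; under-coverage makes these fall short.
est_cell = np.bincount(cell, weights=a)       # array([10., 20.])

# Scale each weight by its cell's ratio of known count to estimated count.
w_ps = a * (N_cell / est_cell)[cell]
```

The adjusted weights sum to `N_cell` within each cell, so the calibration equation holds for the cell indicators. Calibration generalizes this by allowing auxiliaries that are not mutually exclusive categories.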



=== Reading Notes ===

I get lost in section 5, which discusses asymptotic properties to estimate variance. I also don't think I'm 100% following the parts of section 6 that discuss quasi-randomization.



----
CategoryRicottone CategoryReadingNotes CategoryTodoRead