A simulation study of cell collapsing in poststratification

A simulation study of cell collapsing in poststratification was written by Jay J. Kim, Linda Tompkins, Jianzhu Li, and Richard Valliant in 2005. It was published in the proceedings of the American Statistical Association Section on Survey Research Methods (ASA-SRMS).

The authors discuss the tradeoffs between bias and variance when collapsing cells in raking procedures. Disparate adjustments inflate the variance of estimates, but forcing adjustments to be equivalent prevents coverage correction.

Under some simplifying assumptions (chiefly that pre-calibration weights are constant, as opposed to varying across samples), they demonstrate that collapsing raking cells does not introduce design bias if two criteria are met:

  1. The population and sample means must be equal within each collapsed cell.
  2. Either the sample means must be equal across the cells that are collapsed together, or the coverage ratios of those cells must be equal.

They denote an un-collapsed raking procedure as PS1 and a simple collapsed raking procedure as PS2. They then propose two compromise procedures. PS.WR1 retains more cell-specific coverage correction than a simple collapsed raking procedure. It proceeds as follows:

  1. calculate the initial adjustment factor (IAF) for each cell (i.e. N/n)
  2. using a pre-determined threshold on the IAF, identify sparse cells (those whose IAF exceeds the threshold)
  3. for each sparse cell, collapse into the neighbor cell with the lowest IAF
  4. adjust the weights of units (wk) from sparse cells by the threshold value (i.e., given a threshold of 2, each wk becomes 2wk)

PS.WR2 is a slight modification:

  1. for each sparse cell, if the neighbor with the lowest IAF is in a collapsed group, collapse into that group
  2. if a group contains only sparse cells, treat as a simple collapsed raking procedure
  3. if a group contains at least one non-sparse cell...
    1. adjust the weights of sparse cell units by the threshold value
    2. adjust the weights of non-sparse cell units by (group population control - sum of sparse cells' weights)/sum of non-sparse cells' weights

Consider two cells i:

||Cell||Population control (Ni)||Sum of initial weights (ni)||Initial adjustment factor (fi)||
||1||200||50||4||
||2||150||150||1||

If the adjustments are used as-is (i.e. 4 and 1 respectively), the variance of weighted estimates will be substantially inflated.
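One rough way to gauge this inflation is Kish's approximate design effect from unequal weighting, n·Σw²/(Σw)². This measure is not taken from the paper; it is used here only as an illustration. The sketch below also assumes that every unit starts with an initial weight of 1, so that the two cells hold 50 and 150 units, and the name `kish_deff` is hypothetical.

```python
def kish_deff(weights):
    """Kish's approximate design effect: n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / total ** 2


# Assumption: unit initial weights, so the cells contain 50 and 150 units.
as_is = [4.0] * 50 + [1.0] * 150   # cell-specific factors of 4 and 1
equal = [1.75] * 200               # same weighted total (350), spread evenly

print(round(kish_deff(as_is), 3))  # 1.551
print(round(kish_deff(equal), 3))  # 1.0
```

Under these assumptions the cell-specific factors inflate variance by roughly 55% relative to equal weights.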

Collapsing the two cells leads to an adjustment factor of 1.75 for both (350/200). However, this leads to estimated population totals of 88 and 263 respectively, which are far from the known totals of 200 and 150.
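The arithmetic behind the collapsed factor can be checked with a minimal Python sketch. The helper name `collapse_cells` is hypothetical, and the inputs are the two cells' population controls and sums of initial weights.

```python
def collapse_cells(controls, weight_sums):
    """Shared adjustment factor and per-cell estimated totals for one collapsed group."""
    factor = sum(controls) / sum(weight_sums)
    totals = [factor * n for n in weight_sums]
    return factor, totals


factor, totals = collapse_cells([200, 150], [50, 150])
print(factor)  # 1.75
print(totals)  # [87.5, 262.5]
```

The unrounded totals of 87.5 and 262.5 correspond to the 88 and 263 quoted above.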

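Implementing PS.WR1 with a threshold of 2 leads to adjustment factors of 2.8 and 1.4 respectively, leading to estimated population totals of 140 and 210. These figures are consistent with first doubling the sparse cell's weights (50 to 100) and then adjusting the collapsed group by 350/250 = 1.4.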
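A minimal Python sketch of this calculation for a single collapsed group follows. It is one reading of the notes rather than the authors' code: it assumes that sparse cells are those whose IAF exceeds the threshold, and that after the threshold adjustment the group is poststratified to its combined control. The function name `ps_wr1_group` is hypothetical.

```python
def ps_wr1_group(controls, weight_sums, threshold):
    """Per-cell adjustment factors for one collapsed group under PS.WR1 (as read here)."""
    iaf = [N / n for N, n in zip(controls, weight_sums)]
    # Assumption: a cell is sparse when its IAF exceeds the threshold.
    pre = [float(threshold) if f > threshold else 1.0 for f in iaf]
    pre_adjusted = [p * n for p, n in zip(pre, weight_sums)]
    # Assumed final step: scale the whole group to its combined control.
    group_factor = sum(controls) / sum(pre_adjusted)
    return [p * group_factor for p in pre]


weight_sums = [50, 150]
factors = ps_wr1_group([200, 150], weight_sums, threshold=2)
totals = [f * n for f, n in zip(factors, weight_sums)]
print([round(f, 3) for f in factors])  # [2.8, 1.4]
print([round(t, 1) for t in totals])   # [140.0, 210.0]
```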

Implementing PS.WR2 with a threshold of 2 leads to adjustment factors of 2 and 1.667 respectively, leading to estimated population totals of 100 and 250 (150 × 1.667 ≈ 250). These adjustments are closer to each other than those for PS.WR1, and the totals still sum to the overall population control of 350, although neither cell matches its own control.
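The PS.WR2 factors for a single collapsed group can be sketched in Python as below. This is an illustrative reading of the steps listed earlier, not the authors' implementation; the function name `ps_wr2_group` is hypothetical, and sparse cells are assumed to be those whose IAF exceeds the threshold.

```python
def ps_wr2_group(controls, weight_sums, threshold):
    """Per-cell adjustment factors for one collapsed group under PS.WR2 (as read here)."""
    iaf = [N / n for N, n in zip(controls, weight_sums)]
    # Assumption: a cell is sparse when its IAF exceeds the threshold.
    sparse = [f > threshold for f in iaf]
    group_control = sum(controls)
    if all(sparse):
        # Only sparse cells: fall back to simple collapsing.
        common = group_control / sum(weight_sums)
        return [common] * len(controls)
    # Sparse cells receive the threshold; non-sparse cells absorb the remainder.
    sparse_total = sum(threshold * n for n, s in zip(weight_sums, sparse) if s)
    nonsparse_sum = sum(n for n, s in zip(weight_sums, sparse) if not s)
    nonsparse_factor = (group_control - sparse_total) / nonsparse_sum
    return [float(threshold) if s else nonsparse_factor for s in sparse]


factors = ps_wr2_group([200, 150], [50, 150], threshold=2)
print([round(f, 3) for f in factors])  # [2.0, 1.667]
```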

The authors validate these compromise procedures using the 2003 National Health Interview Survey (NHIS) public use files. They draw a stratified random sample of 21,664 cases and calibrate the simulated subsample to the original sample by age (8 levels) and gender (2 levels; 16 raking cells total). They implement raking procedures PS1, PS2, PS.WR1, and PS.WR2; they also calculate unweighted estimates to demonstrate the bias reduction of coverage corrections. They find...


