Why Propensity Scores Should Not Be Used for Matching
Why Propensity Scores Should Not Be Used for Matching, by Gary King and Richard Nielsen, was published in Political Analysis, vol. 27, no. 4 (2019). DOI: https://doi.org/10.1017/pan.2019.11.
The authors critique propensity score matching (PSM), which builds on the propensity score methodology introduced by Rosenbaum and Rubin.
The fundamental problem is that the true data-generating process in an observational study is unknown, yet the propensity score must be modeled anyway. The chosen propensity model is just one of many plausible specifications rather than the theoretically correct one, so the final analysis depends on which specification happened to be selected.
PSM differs from other matching methods in the kind of experimental design it approximates.
- MDM and CEM approximate a fully blocked randomized experiment: pairs (blocks) of similar units are identified first, and treatment is then assigned randomly within each pair
- PSM approximates a completely randomized experiment: treatment is assigned without regard to the covariates, so matched units need only share a similar probability of treatment, not similar covariate values
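As a sketch of the kind of matching PSM performs, the following simulates a small observational dataset and matches each treated unit to the control with the nearest propensity score. This is an illustration only: the true score is used in place of an estimated one, and all variable names and parameter values are invented, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated observational data: two covariates drive both treatment and outcome.
n = 2000
X = rng.normal(size=(n, 2))
score = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))  # true P(treated | X)
treated = rng.random(n) < score
y = X[:, 0] + X[:, 1] + 2.0 * treated + rng.normal(size=n)  # true effect on y is 2

# 1-NN matching (with replacement) on the scalar propensity score.  The match
# is on the score alone, so a matched pair can still differ widely on X itself.
t_idx = np.flatnonzero(treated)
c_idx = np.flatnonzero(~treated)
gaps = np.abs(score[t_idx, None] - score[None, c_idx])
nearest = c_idx[np.argmin(gaps, axis=1)]

att = float(np.mean(y[t_idx] - y[nearest]))  # estimated effect on the treated
print(round(att, 2))
```

Because matching is on a single scalar, pairs are balanced only on treatment probability; this is exactly the sense in which PSM mimics complete rather than blocked randomization.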
Once PSM has approximated complete randomization, further pruning of cases with poor matches (i.e., large distances to the nearest match) undoes that design: the deletions are essentially random, which worsens rather than improves balance. The expected distance between a treated case and its nearest untreated neighbor scales as n^(-1/k), where n is the number of untreated cases and k is the number of covariates. Pruning lowers n and therefore increases the expected distance, so imbalance grows as more cases are discarded.
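The n^(-1/k) scaling can be checked with a small simulation (a sketch under assumptions of my own, not from the paper: points uniform on the unit cube, Euclidean distance). Halving the pool of candidate matches should multiply the expected nearest-neighbor distance by roughly 2^(1/k).

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_nn_distance(n, k, trials=100):
    """Mean distance from a random query point to its nearest neighbor
    among n points drawn uniformly from the unit cube in k dimensions."""
    total = 0.0
    for _ in range(trials):
        pts = rng.random((n, k))
        q = rng.random(k)
        total += np.linalg.norm(pts - q, axis=1).min()
    return total / trials

# Pruning shrinks the pool of candidate matches: halving n scales the
# expected distance by roughly 2 ** (1 / k).
for k in (1, 5):
    ratio = mean_nn_distance(1000, k) / mean_nn_distance(2000, k)
    print(k, round(ratio, 2))
```

The ratio is close to 2 for k = 1 and shrinks toward 1 as k grows, but the distances themselves blow up with k, which is the curse of dimensionality that pruning aggravates.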
Furthermore, the propensity model is sometimes chosen on the basis of the outcomes it produces in the final analysis, e.g., the specification yielding the largest treatment effect. An estimator selected this way is neither unbiased nor consistent.
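The bias from outcome-based model selection can be illustrated with a toy simulation (a hedged sketch with invented parameters, not from the paper): even when every candidate specification is individually unbiased and the true effect is zero, reporting the largest of several estimates is biased upward.

```python
import numpy as np

rng = np.random.default_rng(0)

# True treatment effect is zero; each candidate "propensity model" is stood
# in for by an independent, unbiased-but-noisy estimate of that effect.
n_sims, n_models = 2000, 10
estimates = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n_models))

honest = float(estimates[:, 0].mean())        # pre-committed specification
cherry = float(estimates.max(axis=1).mean())  # "largest treatment effect" rule

print(round(honest, 2), round(cherry, 2))
```

The pre-committed estimate averages near the true value of zero, while the max-of-ten rule averages near 1.54 (the expected maximum of ten standard normals), despite no specification having any real effect to find.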
