Propensity-score matching is increasingly being used to estimate the effects of treatments using observational data. In many-to-one (M:1) matching on the propensity score, M untreated subjects are matched to each treated subject using the propensity score. The authors used Monte Carlo simulations to examine the effect of the choice of M on the statistical performance of matched estimators. They considered matching 1–5 untreated subjects to each treated subject using both nearest-neighbor matching and caliper matching in 96 different scenarios. Increasing the number of untreated subjects matched to each treated subject tended to increase the bias in the estimated treatment effect; conversely, increasing the number of untreated subjects matched to each treated subject decreased the sampling variability of the estimated treatment effect. Using nearest-neighbor matching, the mean squared error of the estimated treatment effect was minimized in 67.7% of the scenarios when 1:1 matching was used. Using nearest-neighbor matching or caliper matching, the mean squared error was minimized in approximately 84% of the scenarios when, at most, 2 untreated subjects were matched to each treated subject. The authors recommend that, in most settings, researchers match either 1 or 2 untreated subjects to each treated subject when using propensity-score matching.
This result is quite interesting. It's intuitive if you think about it for a bit (the closest matches will be the best possible controls), but it departs quite a lot from the conventional wisdom of case-control studies (always use between 4 and 20 controls per case, if possible, so that the width of the confidence intervals depends on the cases).
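To make the intuition concrete, here is a minimal Python sketch of greedy M:1 nearest-neighbor matching without replacement, with an optional caliper. It is only an illustration and not necessarily the algorithm used in the paper: the function names (`match_m_to_one`, `mean_match_distance`), the toy propensity scores, and the caliper width are all made up for this example. The point is just that each extra control per treated subject has to come from further away on the propensity score.

```python
def match_m_to_one(treated_ps, control_ps, m, caliper=None):
    """Greedy nearest-neighbor matching without replacement (illustrative only).

    treated_ps, control_ps: lists of propensity scores.
    m: maximum number of controls matched to each treated subject.
    caliper: optional maximum allowed absolute difference in propensity score.
    Returns a dict mapping treated index -> list of matched control indices.
    """
    available = set(range(len(control_ps)))
    matches = {}
    for t, p_t in enumerate(treated_ps):
        # Rank the still-unused controls by distance to this treated subject;
        # the index is a tie-breaker so the result is deterministic.
        ranked = sorted(available, key=lambda c: (abs(control_ps[c] - p_t), c))
        if caliper is not None:
            # Drop candidates that fall outside the caliper.
            ranked = [c for c in ranked if abs(control_ps[c] - p_t) <= caliper]
        chosen = ranked[:m]
        matches[t] = chosen
        available -= set(chosen)
    return matches


def mean_match_distance(treated_ps, control_ps, matches):
    """Average absolute propensity-score difference over all matched pairs."""
    dists = [abs(control_ps[c] - treated_ps[t])
             for t, controls in matches.items()
             for c in controls]
    return sum(dists) / len(dists)


# Toy data (hypothetical propensity scores, not from the paper).
treated = [0.30, 0.60]
controls = [0.28, 0.35, 0.10, 0.58, 0.65, 0.90]

for m in (1, 2, 3):
    matched = match_m_to_one(treated, controls, m)
    print(m, matched, round(mean_match_distance(treated, controls, matched), 3))
```

In this toy example the average distance between matched pairs grows as M goes from 1 to 3, which is the bias side of the trade-off. Passing a caliper (say `caliper=0.03`) caps that distance, at the cost of some treated subjects getting fewer than M controls.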
I think that there are two things that need to be considered. First, Peter Austin works with ICES, which uses prescription claims from the province of Ontario. So the types of study that he works with are typically large (even his small samples had 500 cases). Variance is therefore low anyway, and a focus on bias makes perfect sense.
Second, complex propensity scores (based on many variables) are rarely identical for any two participants, whereas the matching in case-control studies is often on factors (age, sex) that can be matched perfectly.
So it is a useful and interesting result. What I really want to know, having never managed to get AJE to accept a paper from me at all, is how he managed this feat:
Received April 21, 2010
Accepted June 18, 2010