Propensity-score matching is increasingly being used to estimate the effects of treatments using observational data. In many-to-one (M:1) matching on the propensity score, M untreated subjects are matched to each treated subject using the propensity score. The authors used Monte Carlo simulations to examine the effect of the choice of M on the statistical performance of matched estimators. They considered matching 1–5 untreated subjects to each treated subject using both nearest-neighbor matching and caliper matching in 96 different scenarios. Increasing the number of untreated subjects matched to each treated subject tended to increase the bias in the estimated treatment effect; conversely, increasing the number of untreated subjects matched to each treated subject decreased the sampling variability of the estimated treatment effect. Using nearest-neighbor matching, the mean squared error of the estimated treatment effect was minimized in 67.7% of the scenarios when 1:1 matching was used. Using nearest-neighbor matching or caliper matching, the mean squared error was minimized in approximately 84% of the scenarios when, at most, 2 untreated subjects were matched to each treated subject. The authors recommend that, in most settings, researchers match either 1 or 2 untreated subjects to each treated subject when using propensity-score matching.
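The abstract names two schemes, nearest-neighbor and caliper matching. Here is a minimal Python sketch of both, applied to a propensity score that has already been estimated. It rests on several assumptions of mine: matching is greedy and without replacement, treated subjects are processed in input order, matches are assigned in rounds (every treated subject gets its first match, then its second, and so on), and the caliper is on the raw score scale. The function names are also mine. None of this is necessarily how Austin implemented it.

```python
from typing import Dict, List, Optional, Sequence


def match_m_to_one(
    treated_ps: Sequence[float],
    control_ps: Sequence[float],
    m: int,
    caliper: Optional[float] = None,
) -> Dict[int, List[int]]:
    """Greedy M:1 nearest-neighbor matching on the propensity score, without replacement.

    Returns {treated index: [matched control indices]}. Once a treated subject's
    nearest available control lies outside the caliper, it gets no further matches
    (the pool only shrinks, so later candidates can only be farther away).
    """
    available = set(range(len(control_ps)))
    matches: Dict[int, List[int]] = {i: [] for i in range(len(treated_ps))}
    for _ in range(m):  # round k gives each treated subject its k-th match
        for i, ps in enumerate(treated_ps):
            if not available:
                return matches
            # sorted() makes ties go to the lowest control index
            j = min(sorted(available), key=lambda c: abs(control_ps[c] - ps))
            if caliper is not None and abs(control_ps[j] - ps) > caliper:
                continue  # nearest remaining control is too far away
            matches[i].append(j)
            available.remove(j)
    return matches


def att_estimate(
    treated_y: Sequence[float],
    control_y: Sequence[float],
    matches: Dict[int, List[int]],
) -> float:
    """Mean over matched treated subjects of (own outcome - mean outcome of its matched controls)."""
    diffs = [
        treated_y[i] - sum(control_y[j] for j in js) / len(js)
        for i, js in matches.items()
        if js  # treated subjects left unmatched by the caliper are dropped
    ]
    if not diffs:
        raise ValueError("no treated subject was matched")
    return sum(diffs) / len(diffs)


# Hypothetical toy data, for illustration only.
treated_ps = [0.30, 0.60]
control_ps = [0.29, 0.32, 0.58, 0.61, 0.90]
print(match_m_to_one(treated_ps, control_ps, m=2))                 # {0: [0, 1], 1: [3, 2]}
print(match_m_to_one(treated_ps, control_ps, m=2, caliper=0.015))  # {0: [0], 1: [3]}
pairs = match_m_to_one(treated_ps, control_ps, m=2)
print(att_estimate([1.0, 0.0], [0.0, 1.0, 0.0, 0.0, 1.0], pairs))  # 0.25
```

In the caliper run, each treated subject's second-nearest control is 0.02 away, outside the 0.015 caliper, so both subjects keep only one match. That is the mechanism behind the bias result: each additional match has to come from farther away.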

This result is quite interesting. It's intuitive if you think about it for a bit (the closest matches will be the best possible controls), but it departs a lot from the conventional wisdom of case-control studies (use between 4 and 20 controls per case if possible, so that the width of the confidence intervals is driven by the number of cases rather than the number of controls).
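The case-control rule of thumb rests on a classical approximation for matched designs. With M controls per case, efficiency relative to having unlimited controls is roughly M/(M+1). The sketch below simply evaluates that ratio; the function name is mine, and the approximation assumes simplified conditions and ignores any bias from poorer matches, which is exactly what Austin's simulations add.

```python
from fractions import Fraction


def relative_efficiency(m: int) -> Fraction:
    """Approximate efficiency of 1:m matching relative to unlimited controls, m / (m + 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return Fraction(m, m + 1)


for m in (1, 2, 4, 20):
    print(m, round(float(relative_efficiency(m)), 3))
```

This prints efficiencies of 0.5, 0.667, 0.8 and 0.952. The second control buys a lot and later controls buy progressively less, which is why, past a few controls, precision is limited by the number of cases.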

I think there are two things to consider. First, Peter Austin works with ICES, which uses prescription claims from the province of Ontario. So the studies he works with are typically large (even his small samples had 500 cases). Variance is therefore low anyway, and a focus on bias makes perfect sense.
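The trade-off in the abstract is the standard decomposition of mean squared error for an estimator $\hat\theta$ of the treatment effect $\theta$ (the notation is mine, not Austin's):

$$
\operatorname{MSE}(\hat\theta) = \operatorname{E}\!\left[(\hat\theta - \theta)^2\right] = \underbrace{\left(\operatorname{E}[\hat\theta] - \theta\right)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}(\hat\theta)}_{\text{sampling variability}}
$$

Adding matches shrinks the variance term while, per the abstract, tending to grow the bias term. When the variance term is already small, as in large claims-based studies, a given increase in bias carries relatively more weight in the MSE, and that favors fewer matches.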

Second, propensity scores built from many variables are rarely identical for any two participants, whereas matching in case-control studies is often on factors (age, sex) that can be matched exactly.
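A toy Python illustration of that difference follows. All data are hypothetical and the function names are mine. With categorical factors, several controls can tie exactly with a case, so the third match is as good as the first. With a continuous score, the distance to the m-th nearest control can only stay the same or grow as m increases.

```python
from typing import List, Sequence, Tuple


def exact_matches(case: Tuple[str, str], controls: Sequence[Tuple[str, str]]) -> List[int]:
    """Indices of controls identical to the case on every categorical matching factor."""
    return [j for j, c in enumerate(controls) if c == case]


def nearest_distances(score: float, control_scores: Sequence[float], m: int) -> List[float]:
    """Propensity-score distances from one treated subject to its m nearest controls, closest first."""
    return sorted(abs(s - score) for s in control_scores)[:m]


# Hypothetical (age band, sex) pairs for categorical matching ...
case = ("40-49", "F")
controls = [("40-49", "F"), ("50-59", "F"), ("40-49", "F"), ("40-49", "M"), ("40-49", "F")]
print(exact_matches(case, controls))  # [0, 2, 4]

# ... and hypothetical continuous propensity scores for the many-variable case.
print(nearest_distances(0.50, [0.48, 0.55, 0.41, 0.62, 0.51], m=3))
```

Here three controls match the case exactly (distance zero), while the three nearest propensity-score matches sit at roughly 0.01, 0.02 and then 0.05.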

So it is a useful and interesting result. What I really want to know, having never managed to get AJE to accept a paper from me at all, is how he managed this feat:

Received April 21, 2010

Accepted June 18, 2010

Impressive!

So do you think the result would not hold when matching on more complex sets of criteria? I am looking at using propensity scores in my dissertation with a decent-sized data set (16,000 total), and I will be matching both on obvious factors (sex, race/ethnicity) and on scores built from a combination of other elements. I had assumed that more matches were better; does your reading of the paper suggest otherwise?

(I'm still waiting for a paper that was accepted in 2008 to be published, so those dates strike me as miraculous....)

I think that with large data sets (like the ones Peter Austin considers in this paper), these results make a solid case for one or two matches rather than many in propensity-score matching. You do lose some precision, but large data sets are already very precise. So it might depend on other details of the analysis, but I would be inclined to use fewer matches.

The publication speed was uncanny. Somebody in the editorial office was very impressed with this paper!