Thursday, July 15, 2010

Bias versus precision

In epidemiology, we are typically trying to obtain an unbiased estimate of the association between an exposure and an outcome. Generally, we punt on the causal question of "does exposure X cause outcome Y?", but it is inevitably in the background. After all, if we say that poor exercise habits are associated with early mortality, it is generally taken as advice to improve one's exercise habits rather than as an interesting coincidence.

But not all models are confounding models, and the instincts that serve us so well for confounding models can be misleading for predictive models. Nate Silver has a well-explained example of how inaccurate (or, to be more formal, imprecise) predictive models can be worse than biased models. It's a very interesting case of confusing bias with precision, and it makes me wonder whether we focus too much on unbiasedness and too little on efficiency for some of our models.
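To make the trade-off concrete, here is a minimal sketch in Python. It rests on the standard decomposition of mean squared error into squared bias plus variance. The bias and standard deviation values, the "true" value of 50, and the function names are all invented for illustration; none of them come from Nate Silver's example.

```python
import random


def analytic_mse(bias: float, std_dev: float) -> float:
    """Mean squared error of a predictor: squared bias plus variance."""
    return bias ** 2 + std_dev ** 2


def simulated_mse(bias: float, std_dev: float, truth: float = 50.0,
                  n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the MSE by simulating predictions with normally distributed noise."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_trials):
        prediction = truth + bias + rng.gauss(0.0, std_dev)
        total += (prediction - truth) ** 2
    return total / n_trials


# Hypothetical predictors: the numbers are assumptions, chosen only to illustrate.
unbiased_but_noisy = analytic_mse(bias=0.0, std_dev=5.0)   # 25.0
biased_but_precise = analytic_mse(bias=2.0, std_dev=1.0)   # 5.0

if __name__ == "__main__":
    print("unbiased but noisy :", unbiased_but_noisy, simulated_mse(0.0, 5.0))
    print("biased but precise :", biased_but_precise, simulated_mse(2.0, 1.0))
```

Under these made-up numbers, the biased predictor has one fifth of the mean squared error of the unbiased one, which is the sense in which a precise but biased predictive model can beat an unbiased but imprecise one.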



  1. Mark Twain (or maybe Lewis Carroll) once observed that a clock that ran a few minutes late was never right, while a stopped clock was right twice a day. The stopped clock also has the advantage of being completely unbiased.

  2. Too true.

    But it was a rather odd comment for a predictive modeler to make, and Nate Silver did a wonderful job of clearly explaining the error. I may steal his work this fall for my data analysis class.

  3. "Rather odd" is a mild way of putting it. With all due respect to Nate Silver and company (and they are very good), I think their relative performance says as much about the state of statistics in the polling industry as it does about 538.