Don’t model the probability of a win, model the expected score differential. Yeah, I know, I know, what you really want to know is who wins. But the most efficient way to get there is to model the score differential and then map that back to win probabilities. The exact same issue comes up in election modeling: it makes sense to predict vote differential and then map that to Pr(win), rather than predicting Pr(win) directly. This is most obvious in very close games (or elections) or blowouts; in either of these settings the win/loss outcome provides essentially zero information. But it’s true more generally that there’s a lot of information in the score (or vote) differential that’s thrown away if you just look at win/loss.

The same principle applies to a lot of medical problems. There is often a tendency to define diseases that are based on continuous measurements as binary outcomes. Consider:
- High blood pressure = hypertension
- High cholesterol (especially LDL) and/or low HDL cholesterol = dyslipidemia
- High blood glucose = diabetes
But I think that you will see much better prediction if you first model change in the parameter (e.g., blood pressure) and then convert that to the binary disease state (e.g., hypertension) than if you just develop a logistic model for prob(hypertension).
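To make the two-step idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data are simulated, age is a made-up predictor, the 140 cutoff is an illustrative threshold, and the residuals are assumed to be roughly normal. The helper names (`fit_linear`, `prob_exceeds`) are hypothetical. It fits a linear model to the continuous measurement and then maps the prediction to a probability of crossing the threshold.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_linear(x, y):
    """Least-squares fit of y = a + b*x. Returns (a, b, residual_sd)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(float(resid @ resid) / (len(y) - 2))
    return float(coef[0]), float(coef[1]), sigma


def prob_exceeds(x_new, a, b, sigma, threshold):
    """Pr(outcome > threshold | x_new), assuming normal errors around the fit."""
    mean = a + b * x_new
    return 1.0 - normal_cdf((threshold - mean) / sigma)


# Simulated data (assumption): systolic blood pressure rising with age.
rng = np.random.default_rng(0)
age = rng.uniform(30, 80, size=500)
sbp = 100 + 0.6 * age + rng.normal(0, 12, size=500)

# Step 1: model the continuous outcome.
a, b, sigma = fit_linear(age, sbp)

# Step 2: map the continuous prediction to the binary disease state.
THRESHOLD = 140.0  # illustrative cutoff, not a clinical recommendation
p_40 = prob_exceeds(40.0, a, b, sigma, THRESHOLD)
p_70 = prob_exceeds(70.0, a, b, sigma, THRESHOLD)
print(f"Pr(SBP > {THRESHOLD:.0f}) at age 40: {p_40:.2f}, at age 70: {p_70:.2f}")
```

The same mapping works for the sports case: replace blood pressure with the predicted score differential and set the threshold to zero, so that Pr(win) is the probability that the differential is positive. A fuller version would also carry uncertainty in the fitted coefficients into the probability, which this sketch ignores.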