West Coast Stat Views (on Observational Epidemiology and more): Advice from Andrew Gelman

Thursday, March 25, 2010

Advice from Andrew Gelman

Whom I always defer to on non-literary matters:

They also recommend composite end points (see page 418 of the above-linked article), which is a point that Jennifer and I emphasize in chapter 4 of our book and which comes up all the time, over and over in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you--that is, good advice that's easy to apply and which you might not already know--it would be to use transformations. Log, square-root, etc.--yes, all that, but more! I'm talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those "total scores" that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don't care if the threshold is "clinically relevant" or whatever--just don't do it. If you gotta discretize, for Christ's sake break the variable into 3 categories.

This all seems quite obvious but people don't know about it. What gives? I have a theory, which goes like this. People are trained to run regressions "out of the box," not touching their data at all. Why? For two reasons:
1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can't cheat.
2. In classical (non-Bayesian) statistics, linear transformations on the predictors have no effect on inferences for linear regression or generalized linear models. When you're learning applied statistics from a classical perspective, transformations tend to get downplayed, and they are considered as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model). Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.
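
To make the quoted advice about transformations concrete, here is a minimal Python sketch of the two moves it describes: breaking a continuous variable (age) into several discrete categories, and combining several discrete items into one "total score." The cut points, band labels, and function names are all hypothetical choices made for illustration; they come from neither the quoted passage nor the book.

```python
from bisect import bisect_right

# Hypothetical cut points: more than one threshold, so age becomes
# several categories rather than a single binary split.
AGE_CUTS = [30, 45, 65]
AGE_LABELS = ["18-29", "30-44", "45-64", "65+"]


def age_band(age):
    """Return the label of the band that contains `age`."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


def age_indicators(age):
    """One 0/1 indicator per band, usable as regression predictors."""
    band = age_band(age)
    return {label: int(label == band) for label in AGE_LABELS}


def total_score(items):
    """Combine several discrete item responses into one score by summing."""
    return sum(items)
```

For example, `age_indicators(70)` sets only the "65+" indicator to 1, and `total_score([1, 0, 1, 1])` adds four yes/no items into a single number. In a real analysis the bands would be chosen to match the nonlinear pattern being modeled.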

I don't know if I entirely buy point 2. I'm generally a frequentist and I make extensive use of transformations (though none of them are linear transformations).
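
The distinction between linear and nonlinear transformations can be checked numerically. The sketch below fits a simple least-squares regression three times: on a raw predictor, on a linearly rescaled version, and on its logarithm. The data are made up purely for illustration, and `slope_t_statistic` is a hypothetical helper written for this example, not a function from any library.

```python
from math import log, sqrt


def slope_t_statistic(x, y):
    """t-statistic of the slope in a least-squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the classical standard error of the slope
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope / se


# Made-up data, for illustration only
age = [22, 35, 41, 58, 63, 70]
y = [1.0, 2.1, 2.0, 3.4, 3.1, 4.2]

t_raw = slope_t_statistic(age, y)
# Linear transformation: center at 40 and measure in decades
t_lin = slope_t_statistic([(a - 40) / 10 for a in age], y)
# Nonlinear transformation: log of age
t_log = slope_t_statistic([log(a) for a in age], y)
```

Here `t_raw` and `t_lin` agree up to floating-point rounding, because centering and rescaling by a positive constant changes the coefficient and its standard error by the same factor. `t_log` differs, because the log changes the shape of the fitted relationship rather than just its units.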
