Frances Woolley has a post on using the inverse hyperbolic sine transformation to handle wealth, a variable that is highly skewed and has lots of zeros.
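For readers who haven't met it, the inverse hyperbolic sine is asinh(x) = log(x + sqrt(x² + 1)). The short Python sketch below is our own illustration; the function name `ihs` and the example values are hypothetical. It shows why the transformation is attractive for wealth. It equals exactly 0 at x = 0, it is also defined for negative values, and for large x it is close to log(2x), so away from zero it behaves like an ordinary log.

```python
import math

def ihs(x: float) -> float:
    # Inverse hyperbolic sine written out as log(x + sqrt(x^2 + 1)).
    # (math.asinh computes the same function and is more robust for large negative x.)
    return math.log(x + math.sqrt(x * x + 1.0))

# Compare asinh(w) with log(2w) for a few hypothetical wealth values.
for w in [0.0, 1.0, 1_000.0, 1_000_000.0]:
    approx = math.log(2 * w) if w > 0 else float("nan")  # log(2w) is undefined at 0
    print(f"w={w:>12,.0f}  asinh(w)={ihs(w):.6f}  log(2w)={approx:.6f}")
```

At w = 0 the log column is undefined, while asinh is simply 0. By w = 1,000 the two columns already agree closely.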
The post is worth reading, and the comments are really interesting. In particular, in several of the comments Chris Auld makes a very good case that simplicity and interpretability are desirable properties of statistical models.
There is also a thought-provoking discussion of how to parameterize wealth that involves the sort of deep thinking about variables that we should do more of in epidemiology. In particular: in what sense is it reasonable to consider a person (especially in a country like Canada with strong entitlement programs) to truly have zero wealth?
Definitely worth the read.
Interesting discussion, but in her presentation of the transformation Woolley forgot to include the scaling factor. (She just implicitly set it to 1, which can't be correct in general.) This might sound obvious, but I've seen people work with log(1 + income) without ever reflecting on the fact that this model is different depending on whether income is measured in dollars, hundreds of dollars, or thousands of dollars.
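To make the unit-dependence concrete, here is a small Python sketch. The incomes and the helper `gap_ratio` are hypothetical, chosen only for illustration. With a plain log, changing units only subtracts a constant log(c), and a regression intercept absorbs it. With the +1 that no longer holds. The check below compares log(1 + income) in dollars and in thousands of dollars using a ratio that no intercept-and-slope rescaling can change.

```python
import math

# Hypothetical incomes in dollars, including one zero.
incomes_dollars = [0.0, 500.0, 5_000.0, 50_000.0]

def gap_ratio(t):
    # (zero -> smallest positive gap) / (spread among the positive values).
    # Any affine map a + b*t with b != 0 leaves this ratio unchanged, so if two
    # transformations give different ratios, no intercept and slope can make
    # a linear model in one equivalent to a linear model in the other.
    return (t[1] - t[0]) / (t[3] - t[1])

ratios = {}
for unit, divisor in [("dollars", 1.0), ("thousands of dollars", 1_000.0)]:
    transformed = [math.log1p(x / divisor) for x in incomes_dollars]
    ratios[unit] = gap_ratio(transformed)
    print(f"{unit:>22}: gap ratio = {ratios[unit]:.3f}")
```

With these made-up numbers and income in dollars, the jump from $0 to $500 is larger than the whole spread from $500 to $50,000. In thousands of dollars, the same jump is only a small fraction of that spread.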
Good point. I tend to measure income in dollars (to make the +1 negligible), but this is clearly a real issue. You also get coefficients that are multiplicative, which is usually another undesirable property in my field.
Yeah, this scaling parameter bothers me! With log(x + s), can't your model results be *extremely* sensitive to s? In the limit as s goes to zero, log(s) heads to minus infinity, so I think the differences among the values that started out non-zero become overwhelmed by the huge gap between those values and the zeros?
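A quick numerical check of this intuition follows. It is a sketch with hypothetical incomes in dollars and a hypothetical helper `gap_ratio`. As s shrinks, it compares the gap between the zero and the smallest positive income with the spread among the positive incomes.

```python
import math

incomes = [0.0, 500.0, 5_000.0, 50_000.0]   # hypothetical incomes in dollars

def gap_ratio(t):
    # (zero -> smallest positive gap) / (spread among the positive values)
    return (t[1] - t[0]) / (t[3] - t[1])

shifts = [1_000.0, 1.0, 1e-3, 1e-6]   # values of s to try
ratios = []
for s in shifts:
    transformed = [math.log(x + s) for x in incomes]
    ratios.append(gap_ratio(transformed))
    print(f"s = {s:g}: gap ratio = {ratios[-1]:.2f}")
```

As s shrinks, the positive values converge to their plain logs, while log(0 + s) = log(s) keeps falling. The ratio therefore grows without bound, and the zero-versus-nonzero distinction increasingly dominates.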
It looks like the inverse hyperbolic sine transform has the same property?
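The sketch below suggests the answer is yes. It assumes the scaled form asinh(x / theta), where theta is a scale parameter introduced here for illustration. The form asinh(theta * x) / theta differs from this only by an overall constant factor once theta is replaced by 1 / theta. asinh(0) = 0 for every theta. For x much larger than theta, however, asinh(x / theta) is close to log(2x / theta) = log(2x) - log(theta). Shrinking theta therefore pushes every positive value up while the zeros stay put.

```python
import math

incomes = [0.0, 500.0, 5_000.0, 50_000.0]   # hypothetical incomes in dollars

def gap_ratio(t):
    # (zero -> smallest positive gap) / (spread among the positive values)
    return (t[1] - t[0]) / (t[3] - t[1])

thetas = [1_000.0, 1.0, 1e-3, 1e-6]   # hypothetical scale parameters
ratios = []
for theta in thetas:
    # asinh(0 / theta) is 0 for every theta; positive incomes rise roughly like -log(theta)
    transformed = [math.asinh(x / theta) for x in incomes]
    ratios.append(gap_ratio(transformed))
    print(f"theta = {theta:g}: gap ratio = {ratios[-1]:.2f}")
```

As with log(x + s), the ratio keeps growing as the scale parameter shrinks. Under this parameterization, the choice of scale matters for the inverse hyperbolic sine just as much.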