Monday, March 24, 2014

This is also true in Epidemiology

Frances Woolley is back with a great post on how junior people focus on the statistical models and not the data set itself.  This is unfortunate, as domain-specific knowledge of the data and the expected relations in the data is often the most important contribution.  When I worry about "field-jumping", it is this sort of problem that comes to mind:
But all else is not equal. Using probit will not save a regression that combines men and women together into one sample when estimating the impact of having young children on the probability of being employed, and fails to include a gender*children interaction term. (The problem here is that children are associated with a higher probability of being employed for men, and a lower probability of being employed for women. These two effects cancel out in a sample that includes both men and women.)
Here we have a well-understood and theoretically clear interaction that could easily be missed if one were not aware of the body of work underpinning it.
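To make the cancellation concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are simulated, the effect sizes (plus 10 points for men, minus 10 points for women) are made up, and I use a linear probability model fit by least squares rather than probit so that it needs nothing beyond NumPy. The function name `simulate_and_fit` is hypothetical.

```python
import numpy as np

def simulate_and_fit(n=20000, seed=0):
    """Simulate employment data and fit two linear probability models."""
    rng = np.random.default_rng(seed)
    female = rng.integers(0, 2, size=n)   # 1 = woman, 0 = man
    kids = rng.integers(0, 2, size=n)     # 1 = has young children

    # Assumed (made-up) employment probabilities: children raise the
    # probability by 0.10 for men and lower it by 0.10 for women.
    p = 0.70 + 0.10 * kids * (1 - female) - 0.10 * kids * female
    employed = (rng.random(n) < p).astype(float)

    ones = np.ones(n)
    # Pooled model: intercept, children, gender (no interaction term).
    X_pooled = np.column_stack([ones, kids, female])
    # Same model plus the gender*children interaction.
    X_inter = np.column_stack([ones, kids, female, kids * female])

    b_pooled, *_ = np.linalg.lstsq(X_pooled, employed, rcond=None)
    b_inter, *_ = np.linalg.lstsq(X_inter, employed, rcond=None)
    return b_pooled, b_inter

b_pooled, b_inter = simulate_and_fit()
print("pooled, children coefficient:      ", round(b_pooled[1], 3))
print("interaction model, effect for men: ", round(b_inter[1], 3))
print("interaction model, effect for women:", round(b_inter[1] + b_inter[3], 3))
```

In the pooled fit the children coefficient should land near zero, while the interaction model recovers effects of opposite sign for men and women. Swapping in probit would not change the picture, which is Woolley's point: the choice of estimator cannot rescue a specification that ignores the structure of the data.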

It's also why I am suspicious of simplistic explanations for why entire fields have missed the obvious confounder/true exposure.  It is possible that this is true, but a command of the literature is needed to really understand why such a blind spot developed.  This is not to say outsiders never bring in value (the "Emperor has no clothes" effect really exists), but I am much happier when I see a very detailed command of the data being used, the questions that were asked, the population that was included, the ways in which the data collection may have influenced the results, and so forth.

Definitely go and read.
