Monday, April 28, 2014

More on understanding the math but not the statistics

[One of the standard rebuttals to criticisms of popular STEM writing is that certain compromises have to be made when putting things in 'layman's terms.' To head off that particular charge, I'm going to use as little technical language as possible in this post.]

Before I post something, I usually do one final search on the subject, just to avoid any surprises. As a result, I often discover better examples than the ones I used in the post. Case in point, after writing a post looking at the pre-538 work of Walt Hickey (and concluding that the editors at 538 appeared to be doing a better job than those at Business Insider), I found this article by Hickey from the Atlantic:

5 Statistics Problems That Will Change The Way You See The World

It was a fairly standard piece (the kind that invariably includes the Monty Hall paradox), and I skimmed through it quickly until the final section, which I found myself reading repeatedly to make sure it actually said what I thought it said:
A kidney study is looking at how well two different drug treatments (A and B) work on small and large kidney stones. Here is the success rate that was found:
Small Stones, Treatment A: 93%, 81 out of 87 trials successful
Small Stones, Treatment B: 87%, 234 out of 270 trials successful
Large Stones, Treatment A: 73%, 192 out of 263 trials successful
Large Stones, Treatment B: 69%, 55 out of 80 trials successful.
Which is the better treatment, A or B?
Even though Treatment A had higher success rates in both small and large stones, when the whole trial is viewed as a sample space Treatment B is actually more successful:
Small Stones, Treatment A: 93%, 81 out of 87 trials successful
Small Stones, Treatment B: 87%, 234 out of 270 trials successful
Large Stones, Treatment A: 73%, 192 out of 263 trials successful
Large Stones, Treatment B: 69%, 55 out of 80 trials successful.
All stones, Treatment A: 78%, 273 of 350 trials successful
All stones, Treatment B: 83%, 289 of 350 trials successful.
This is an excellent example of Simpson's Paradox, where correlation in separate groups doesn't necessarily translate to the whole sample set.
In short, just because there correlation in smaller groups hides the real story taking place in the largest of groups.
This is an almost perfect example of what I mean by understanding the math but not the statistics. The math, though somewhat counterintuitive (as you would expect from a 'paradox'), is straightforward: in certain situations it is possible to have observations of a data set distributed in such a way that, if you cut the set up along certain lines, two variables will have a positive correlation in each subsection but will have a negative correlation when you put them together. It's an interesting result -- cut things one way and you see one thing, cut them another and you see the opposite -- but, taken purely as math, it doesn't seem particularly meaningful and it certainly doesn't suggest that one view is right and the other is wrong. The result is just ambiguous. A summary that stopped at the math would read something like: "This is an excellent example of Simpson's Paradox, where correlation in separate groups doesn't necessarily translate to the whole sample set, causing ambiguity."
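To make the arithmetic concrete, here is a minimal Python sketch that recomputes the success rates from the counts quoted above, first within each stone size and then pooled. The names `results` and `success_rate` are my own, chosen for illustration; only the counts come from the quoted article.

```python
from fractions import Fraction

# (successes, trials) for each (stone size, treatment), from the quoted figures
results = {
    ("small", "A"): (81, 87),
    ("small", "B"): (234, 270),
    ("large", "A"): (192, 263),
    ("large", "B"): (55, 80),
}

def success_rate(treatment, size=None):
    """Success rate for one treatment, within a stone size or pooled (size=None)."""
    cells = [counts for (s, t), counts in results.items()
             if t == treatment and (size is None or s == size)]
    successes = sum(c[0] for c in cells)
    trials = sum(c[1] for c in cells)
    return Fraction(successes, trials)

for size in ("small", "large", None):
    label = size or "all"
    a = success_rate("A", size)
    b = success_rate("B", size)
    winner = "A" if a > b else "B"
    print(f"{label:>5} stones: A = {float(a):.3f}, B = {float(b):.3f} -> {winner} looks better")
```

Treatment A comes out ahead within each stone size and behind in the pooled figures, which is the reversal the quoted passage describes.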

When, however, you start thinking not just mathematically but statistically (and more importantly, causally), one view is very much better than the other. Let's look at the kidney stone example again. What we see here is a lot more patients with large stones being given treatment A and a lot more patients with small stones being given treatment B. This is something we see all the time in observational data: more powerful treatments being given to more extreme cases.

This is one of the first things a competent statistician checks for, because the relationship we see in the undivided data set is usually covering up the relationship we're looking for. In this case, the difference we see in the partitioned data is probably due to the greater effectiveness of treatment A, while the difference we see in the unpartitioned data is almost certainly due to the greater difficulty in treating large kidney stones. Though there are certainly exceptions, statisticians generally combine data when they want larger samples and break it apart when they want a clearer picture.
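One way to see both halves of that argument in the numbers is the sketch below. It first shows how unevenly the stone sizes were split between treatments, then compares the treatments as if each had been given to the same mix of small and large stones. The second step is direct standardization, a common adjustment technique that I am bringing in for illustration; it is not something the quoted article or the commenter used, and the names `results`, `large_share`, and `standardized_rate` are hypothetical.

```python
# (successes, trials) for each (stone size, treatment), from the quoted figures
results = {
    ("small", "A"): (81, 87),
    ("small", "B"): (234, 270),
    ("large", "A"): (192, 263),
    ("large", "B"): (55, 80),
}
sizes = ("small", "large")
treatments = ("A", "B")

def large_share(treatment):
    """Fraction of a treatment's patients who had large (harder) stones."""
    total = sum(results[(s, treatment)][1] for s in sizes)
    return results[("large", treatment)][1] / total

# Patients per stone size across both treatments, and the overall total
stratum_totals = {s: sum(results[(s, t)][1] for t in treatments) for s in sizes}
n_patients = sum(stratum_totals.values())

def standardized_rate(treatment):
    """Success rate if the treatment had faced the overall mix of stone sizes.

    Assumption: direct standardization, weighting each stone-size rate by
    that size's share of all patients in the study.
    """
    rate = 0.0
    for s in sizes:
        successes, trials = results[(s, treatment)]
        rate += (successes / trials) * (stratum_totals[s] / n_patients)
    return rate

for t in treatments:
    print(f"Treatment {t}: {large_share(t):.0%} large stones, "
          f"standardized success rate {standardized_rate(t):.3f}")
```

Once both treatments are weighted by the same case mix, A is ahead again, which matches what the partitioned tables show.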

The version posted at Business Insider with a later timestamp has a different conclusion ("Answer: Treatment A, once you focus on the subsets"). This appears to be a corrected version, possibly in response to this comment:
KSC on Nov 13, 12:33 PM said:
After reading the wikipedia article I believe your answer in the Simpson's paradox example is incorrect.
Treatment B is not better. Treatment A is better.
As pointed out in the article Treatment B appears better when looking at the whole sample because the treatments were not randomly assigned to small and large stone cases.
The better treatment (A) tended to be used on the more difficult cases (large stones) and the weaker treatment (B) tended to be used on the simpler cases (small stones).
Even in the corrected version, though, Hickey still closes his badly garbled conclusion with "correlation in smaller groups hides the real story taking place in the largest of groups." Between that and the odd wording of the unacknowledged correction (A is better, period. When we "focus on the subsets," we control for another factor that obscured the results), it seems that Hickey didn't understand his mistake even after having it explained to him.

Though I've had some rather critical things to say about 538 recently, there's no question that its publisher and editors do understand statistics. These days, that's enough to put them ahead of the pack.