This article on p-values was a very interesting read. The author (who teaches statistics) gives a very nice discussion:
One reason for this, I think, is that we fail to teach well how, with enough data, any non-zero parameter or difference becomes statistically significant at arbitrarily small levels. The proverbial expression of this, due I believe to Andy Gelman, is that "the p-value is a measure of sample size". More exactly, a p-value generally runs together the size of the parameter, how well we can estimate the parameter, and the sample size. The p-value reflects how much information the data has about the parameter, and we can think of "information" as the product of sample size and precision (in the sense of inverse variance) of estimation, say n/σ². In some cases, this heuristic is actually exactly right, and what I just called "information" really is the Fisher information.

I found this way of talking about p-values extremely useful, and something that should be kept in mind in epidemiology, where a significant association estimated from a big sample with a small effect can often be uninteresting. You never reduce bias to zero in a real observational study, and interventions rarely remove an association entirely (not everyone changes behavior, and mitigation is often partial). In the era of big data, this becomes especially important.
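To see the quoted heuristic concretely, here's a quick simulation sketch of my own (not from the article): hold a tiny, practically uninteresting true effect fixed and watch the one-sample t-test p-value collapse toward zero as the sample size grows. The effect size, σ, and seed are arbitrary choices for illustration.

    # Illustration: with a fixed, tiny true effect, the p-value shrinks
    # toward zero as n grows, so "significance" tracks information
    # (roughly n/σ²) rather than practical importance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_effect = 0.02   # small, practically uninteresting shift from zero
    sigma = 1.0

    for n in [100, 1_000, 10_000, 100_000, 1_000_000]:
        sample = rng.normal(loc=true_effect, scale=sigma, size=n)
        # One-sample t-test of H0: mean = 0
        t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
        print(f"n = {n:>9,}  estimate = {sample.mean():+.4f}  p = {p_value:.3g}")

On a typical run, the same 0.02 effect is nowhere near significant at n = 100 but overwhelmingly "significant" by n = 1,000,000, which is exactly the worry about big-sample epidemiology above.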
Fun stuff.