Andrew Gelman has a really nice post on observational medical research. How could I not respond?
In the post he quotes David Madigan, who has a fairly strong opinion on the matter:
I’ve been involved in a large-scale drug safety signal detection project for the last two or three years (http://omop.fnih.org). We have shown empirically that for any given safety issue, by judicious choice of observational database (we looked at 10 big ones), method (we looked at about a dozen), and method setup, you can get *any* answer you want – big positive and highly significant RR or big negative and highly significant RR and everything in between. Generally I don’t think there is any way to say definitively that any one of these analysis is a priori obviously stupid (although “experts” will happily concoct an attack on any approach that does not produce the result they like!). The medical journals are full of conflicting analyses and I’ve come to the belief that, at least in the medical arena, the idea human experts *know* the *right* analysis for a particular estimand is false.
This seems overly harsh to me. Dr. Madigan (who I think is an amazing statistician) is working with OMOP, which, as I recall, is built from data sets of fairly low quality (prescription claims for Medicare/Medicaid, GPRD, and other clinical databases of that sort). Using them is a necessary evil to get the power to detect rare (but serious) adverse drug outcomes. But these databases are often problematic when extended beyond extremely clear signal detection issues.
The clearest example of high-quality medical data is likely to be randomized, controlled, double-blinded clinical trials. But there is a whole layer of data between these two extremes of data quality (prospective cohort studies, for example) that has also generated a lot of important findings in medicine.
Sure, it is true that prospective cohort studies tend to be underpowered to detect rare adverse drug effects (for precisely the same reason that RCTs are: they rarely enroll enough patients). But there is a lot of interesting observational medical research that does not generate conflicting results, or where the experts really seem to have a good grasp on the problem. The links between serum cholesterol levels and cardiovascular events, for example, seem relatively solid and widely replicated. So do the links between smoking and lung cancer (or cardiovascular disease) in North American and European populations. There is a lot that we can learn with observational work.
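As an aside on the power point, a back-of-the-envelope sketch shows why sample size matters so much for rare events. The numbers here are hypothetical (a true risk of 1 in 10,000 and three made-up study sizes), and the calculation assumes patients are independent with identical risk, which real data will not satisfy exactly. It only asks how likely a study is to observe even one event, which is a much lower bar than estimating a relative risk.

```python
def prob_at_least_one_event(n_patients: int, risk: float) -> float:
    """Probability of seeing one or more events among n independent patients,
    each with the same per-patient risk (a simplifying assumption)."""
    return 1.0 - (1.0 - risk) ** n_patients


# Hypothetical adverse event with a true risk of 1 in 10,000.
RISK = 1 / 10_000

# Hypothetical study sizes: a typical trial, a large cohort, a claims database.
STUDY_SIZES = (1_000, 5_000, 1_000_000)

if __name__ == "__main__":
    for n in STUDY_SIZES:
        p = prob_at_least_one_event(n, RISK)
        print(f"n = {n:>9,}: P(at least one event) = {p:.3f}")
```

Under these assumptions, a study of 1,000 patients has only about a 10% chance of seeing a single case, while a database of a million patients is all but certain to see some. That is the trade being made when lower quality data is accepted in exchange for size.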
So I would be careful about generalizing this criticism to all of medical research.
That being said, I have a great deal of frustration with medical database research, for many of the same reasons David Madigan does. I think the issues with trying to do research in medical claims data would make an excellent series of posts, as the topic is way too broad for a single one.