
Wednesday, July 31, 2013

General versus particular cases

Andrew Gelman wrote a very interesting article in Slate on how over-reliance on statistical significance can lead to spurious findings.  The authors of the study that he was critiquing replied to his piece, and Andrew's thoughts on the response are here.

This led to two thoughts.  One, I am completely unimpressed by the claim that a paper appearing in a peer-reviewed journal settles the question -- peer review is a screen, but even good tests have false positives.  All it convinces me of is that the authors were thoughtful in developing the article, not that they are immune to problems.  But that is true of all papers, including mine.

Two, I think that this is a very tough area to take a single example from.  The reason is that any one paper could well have followed the highest possible level of rigor, as Jessica Tracy and Alec Beall claim they have done.  That doesn't necessarily mean that all studies in the class have followed these practices, or that there were no filters aiding or impeding publication that might increase the risk of a false positive.

For example, I have just finished publishing a paper in which I had an unexpected finding that I wanted to replicate (the existence of an association was hypothesized a priori, but the direction was reversed from the a priori hypothesis).  I found such a study, added additional authors, added additional analyses, rewrote the paper as a careful combination of two different cohorts, and redid the discussion.  Guess what: the finding did not replicate.  So then I had the special gift of publishing a null paper with a lot of authors and some potentially confusing associations.  If I had just given up at that point, the question might have been hanging around until somebody else found the same thing (I often use widely available data in my research) and published it.

So I would be cautious about multiplying the p-values together to get a probability of a false positive.  Jessica Tracy and Alec Beall write:
The chance of obtaining the same significant effect across two independent consecutive studies is .0025 (Murayama, K., Pekrun, R., & Fiedler, K. (in press). Research practices that can prevent an inflation of false-positive rates. Personality and Social Psychology Review.)
I suspect that this would hold only if the testable hypothesis was clearly stated before either study was done.  It also presumes independence (which is not always obvious, since design elements of the studies may influence each other) and the absence of a confounding factor (one that causes both the exposure and the outcome).
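To make the arithmetic concrete, here is a rough simulation of that concern (a sketch with invented parameters, not the calculation from the cited Murayama et al. paper): under a true null, two independent studies that each run one pre-specified test jointly reach p < .05 about .05 × .05 = .0025 of the time, but if each study quietly reports the best of several correlated outcomes, the joint false-positive rate climbs well above that.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_arm, n_sim, n_outcomes, rho = 50, 5_000, 5, 0.6

# Compound-symmetric correlation matrix for the five outcomes
cov = np.full((n_outcomes, n_outcomes), rho) + (1 - rho) * np.eye(n_outcomes)

def study_reports_significance(flexible):
    """Run one null study (no true group difference); return True if p < .05 is reported."""
    x = rng.multivariate_normal(np.zeros(n_outcomes), cov, size=n_per_arm)
    y = rng.multivariate_normal(np.zeros(n_outcomes), cov, size=n_per_arm)
    pvals = [stats.ttest_ind(x[:, j], y[:, j]).pvalue for j in range(n_outcomes)]
    # The pre-specified analysis looks only at the first outcome; the flexible
    # analysis reports whichever outcome happened to come out best.
    return (min(pvals) if flexible else pvals[0]) < 0.05

for flexible in (False, True):
    both = np.mean([study_reports_significance(flexible) and study_reports_significance(flexible)
                    for _ in range(n_sim)])
    label = "best of 5 correlated outcomes" if flexible else "one pre-specified outcome"
    print(f"{label}: P(both studies significant under the null) ~ {both:.4f}")
```

The same sort of inflation would also show up, though it is harder to simulate, if the two studies were not truly independent or if a confounder drove both the exposure and the outcome.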

Furthermore, I think that as epidemiologists we need to make a decision about whether these studies are making strong causal claims or advancing a prospective association that may lead to a better understanding of a disease state.  We often write articles in the latter mode but then lapse into the former when being quoted.

So I guess I am writing a lot to say a couple of things in conclusion. 

One, it is very hard to pick a specific example of a general problem when it is possible that any one example might happen to meet the standards required for the depth of inference being made.  And whether it does is very hard to ascertain within the standards of the literature.

Two, the decisions of what to study and what to publish are also pretty important steps in the process.  They can have a powerful influence on the direction of science in a way that is very hard to detect.

So I want to thank Andrew Gelman for starting this conversation and the authors of the paper in question for acting as an example in this tough dialogue. 



Monday, March 11, 2013

Some epidemiology for a change

John Cook has an interesting point:
When you reject a data point as an outlier, you’re saying that the point is unlikely to occur again, despite the fact that you’ve already seen it. This puts you in the curious position of believing that some values you have not seen are more likely than one of the values you have in fact seen.
 
This is especially problematic in the case of rare but important outcomes, and it can be very hard to decide what to do in these cases.  Imagine a randomized controlled trial of the effectiveness of a new medication for a rare disease (say, something for memory improvement in older adults).  One of the treated participants experiences sudden cardiac death, whereas nobody in the placebo group does.

On the one hand, if the sudden cardiac death had occurred in the placebo group, we would be extremely reluctant to advance this as evidence that the medication in question prevents death.  On the other hand, rare but serious adverse drug events both exist and can do a great deal of damage.  The true but trivial answer is "get more data points."  Obviously, if this is a feasible option it should be pursued.

But these questions get really tricky when there is simply a dearth of data.  Under these circumstances, I do not think that any statistical approach (frequentist, Bayesian or other) is going to give consistently useful answers, as we don't know if the outlier is a mistake (a recording error, for example) or if it is the most important feature of the data.
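To put rough numbers on it (hypothetical counts, not from any real trial): with a hundred patients per arm, a single sudden cardiac death on treatment and none on placebo gives a conventional test essentially nothing to work with, even though that one event may matter more than anything else in the data.

```python
from scipy import stats

table = [[1, 99],    # treated: 1 death, 99 with no event
         [0, 100]]   # placebo: 0 deaths, 100 with no event

_, p_value = stats.fisher_exact(table, alternative="two-sided")
print(f"Fisher exact p-value: {p_value:.2f}")   # ~1.0: no 'signal' by this test

# The rule of three: with 0 events among 100 placebo patients, an approximate
# 95% upper bound on the placebo event rate is 3/100 = 3% -- easily wide
# enough to be consistent with the single death seen on treatment.
print(f"Rule-of-three upper bound on the placebo event rate: {3 / 100:.0%}")
```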

It's not a fun problem. 

Tuesday, July 12, 2011

Modeling assumptions

From Matt Yglesias:

I’ll note, however, that you might be a freshwater economist if you think it makes sense to reassure us that a deflationary spiral is impossible because your model says so even though deflationary spirals do, in fact, occur in human history. To me, a model that denies the possibility of something happening that does, in fact, happen indicates that you’re working with a flawed model.


I can't comment on whether or not this is a fair assessment of the work in question. But it is always a good idea to "reality check" model outputs and ensure that the distribution of events generated by the model looks something like real data. If important events occur in real data that your model dismisses as impossible, then model misspecification or missing confounding variables should be immediately suspected.
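Here is a minimal sketch, with entirely made-up data, of what that kind of reality check can look like: fit a thin-tailed model to heavy-tailed data, then ask how plausible the fitted model thinks the most extreme observation you have already seen is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
observed = rng.standard_t(df=2, size=5_000)   # heavy-tailed stand-in for "real" data

# Naive model: a normal distribution with the observed mean and standard deviation
mu, sigma = observed.mean(), observed.std()

worst = np.abs(observed - mu).max()
p_model = 2 * stats.norm.sf(worst / sigma)    # two-sided tail probability under the fit
print(f"Most extreme observed deviation: {worst / sigma:.1f} sigma")
print(f"Probability the fitted normal assigns to anything that extreme: {p_model:.1e}")
# If the fitted model treats events we have already seen as essentially
# impossible, misspecification (or a missing variable) should be suspected.
```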


EDIT: Noah Smith also comments and it is well worth the read. He traces these conclusions to some rather strong assumptions . . .

Friday, July 8, 2011

Case Crossover paper and time trends

There was a new paper e-published recently in Pharmacoepidemiology and Drug Safety that used the case-crossover study design:

"Purpose
Elevated levels of phosphorus (P) and calcium (Ca) have been shown in observational studies to be associated with an increased risk of adverse clinical outcomes including mortality. Vitamin D sterols have been shown to increase the risk of hypercalcemia and hyperphosphatemia in clinical trials. We sought to explore these risks in real-world clinical practice.
Methods
We employed a case–crossover design, which eliminates confounding by non-time-varying patient characteristics by comparing, within each patient, vitamin D doses before the event with those at an earlier period. Using this method, we estimated the risk of hypercalcemic (Ca ≥ 11 mg/dL) and hyperphosphatemic (P ≥ 8 mg/dL) events for patients at different dose quartiles of vitamin D relative to patients not on a vitamin D sterol.
Results
There was a dose-dependent association between vitamin D dose quartile and risk of hypercalcemia or hyperphosphatemia. In adjusted analyses, each increase in vitamin D quartile was associated with a multiple of hypercalcemia risk between 1.7 and 19 times compared with those not on vitamin D and a multiple of hyperphosphatemia risk between 1.8 and 4.
Conclusion
Use of vitamin D sterols is associated with an increased risk of hypercalcemic and hyperphosphatemic events in real-world clinical practice. Other potential predictors of these events, such as phosphate binder use and dialysate Ca levels, were not examined in this analysis."

It seems to be an interesting paper but I have one concern. If you look at the discussion section of the paper, the authors note that:

In our sensitivity analysis, we used 1-month periods to assess vitamin D exposure. In this analysis, estimates of the association between vitamin D dose and risk of events were smaller than those in the primary analysis, particularly for hypercalcemia. One possible explanation for this finding is that the average 2-month exposure measure is a superior indicator, compared with the 1-month assessment, of both the dose and duration of vitamin D exposure. As well, it could be that some dose changes in the month prior to the event had already occurred in response to increasing Ca levels and that, for this reason, the dose 2 months prior to the event is a more accurate reflection of the dose that gave rise to the hypercalcemic or hyperphosphatemic event.


Another explanation that I did not see addressed is the possibility that there is a time trend occurring. If the frequency of vitamin D administration (or the dose) increased with time, then you would expect to see smaller estimates in the sensitivity analysis as well. But it would be an artefact of changing exposure over time.
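A rough simulation of that mechanism (all parameters invented, and simplified to the point of ignoring within-person correlation in exposure) shows how a rising exposure trend alone can push the matched-pair odds ratio from a case-crossover analysis above 1, even when the exposure has no effect on the outcome at all:

```python
import numpy as np

rng = np.random.default_rng(2)
n_cases = 10_000

# Exposure prevalence drifts upward over calendar time: 20% in each person's
# earlier control window, 30% in the case window just before the event. The
# event itself is independent of exposure by construction.
exposed_control = rng.random(n_cases) < 0.20
exposed_case = rng.random(n_cases) < 0.30

# The conditional (McNemar-type) odds ratio uses only the discordant pairs
b = np.sum(exposed_case & ~exposed_control)   # exposed in the case window only
c = np.sum(~exposed_case & exposed_control)   # exposed in the control window only
print(f"Discordant pairs: {b} vs {c}; odds ratio ~ {b / c:.2f}")
# With no true effect the odds ratio should be ~1; the upward drift alone
# pushes it to roughly (0.30 * 0.80) / (0.20 * 0.70), about 1.7, here.
```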

That being said, it was a real pleasure to read a well-justified use of the case-crossover design in a medication paper. Hopefully this is a sign that there will be more use of within-person study designs in epidemiology in the future. The ability to handle time-invariant confounders is a serious advantage of this approach.

Thursday, July 7, 2011

Transformations

Frances Woolley has a post on the use of the inverse hyperbolic sine transformation for handling wealth as a variable (skewed and with lots of zeros).
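For readers who have not run into it, the transformation is asinh(x) = ln(x + sqrt(x^2 + 1)): it behaves like log(2x) for large values but, unlike the log, is defined at zero and for negative values. A quick illustration with arbitrary numbers:

```python
import numpy as np

wealth = np.array([-50_000, 0, 1_000, 100_000, 5_000_000], dtype=float)

print(np.arcsinh(wealth))   # defined everywhere, including zero and negative wealth
print(np.log(wealth))       # -inf at zero, nan for negatives (with a runtime warning)

# For large values asinh(x) is essentially log(2x) = log(x) + log(2)
print(np.arcsinh(wealth[-1]) - np.log(2 * wealth[-1]))   # ~0
```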

The post is worth reading and the comments are really interesting. In particular, Chris Auld makes a very good case, in several of the comments, for simplicity and interpretability as desirable properties of statistical models.

There is also a thought-provoking discussion of how to parameterize wealth that involves the sort of deep thinking about variables that we should do more of in epidemiology. In particular, in what sense is it reasonable to consider a person (especially in a country like Canada, with strong entitlement programs) to truly have zero wealth?

Definitely worth the read.

Thursday, April 14, 2011

Prediction is hard

President George W. Bush in 2001:

Many of you have talked about the need to pay down our national debt. I listened, and I agree. We owe it to our children and our grandchildren to act now, and I hope you will join me to pay down $2 trillion in debt during the next 10 years. At the end of those 10 years, we will have paid down all the debt that is available to retire. That is more debt repaid more quickly than has ever been repaid by any nation at any time in history.


I think that the core issue here, presuming good faith on all sides, is that second-order effects are really hard to model. Tax cuts (both business and individual) are seen as stimulating the economy, but accurately predicting that effect is very hard in a large and non-linear system like the United States economy. It's even possible that tax cuts could have the perverse effect of lowering growth (I am not saying that they do -- it's just that complex, non-linear systems that are sensitive to initial values are very hard to predict).

So perhaps the real lesson here is to focus on first-order effects. Link tax cuts directly to program cuts, and vice versa: new programs should have taxes linked to them. In my world, that would include wars (notice how World Wars I and II led to large tax increases to finance them), which would make the debate about military intervention a lot more involved. I don't know if this would be a complete solution to deficit woes, but I worry that the current approach relies far too heavily on statistical models to predict the consequences of tax and budget policy (and, as we know from chaos theory, these types of models are notoriously difficult to use for prediction).
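As a toy illustration of that last parenthetical (it has nothing to do with any particular macroeconomic model), the logistic map is a completely deterministic system, yet two starting values that differ by one part in a billion end up on entirely different trajectories within a few dozen steps:

```python
# Logistic map: x_{t+1} = r * x_t * (1 - x_t), chaotic at r = 4
r = 4.0
x, y = 0.123456789, 0.123456790   # initial values differing by 1e-9

for t in range(1, 51):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if t % 10 == 0:
        print(f"t={t:2d}  x={x:.6f}  y={y:.6f}  |x-y|={abs(x - y):.2e}")
```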

Monday, August 30, 2010

Sentences to ponder

We always talk about a model being "useful" but the concept is hard to quantify.


-- Andrew Gelman

This really does match my experience. We talk about the idea that "all models are wrong but some models are useful" all the time in epidemiology. But it's rather tricky to actually define this notion of "useful" in a rigorous way.
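One hedged way to make "useful" a bit more concrete -- certainly not the only one -- is to score models on how well they predict data they have not seen. A small sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=400)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 1, size=400)   # the truth is quadratic

train, test = slice(0, 300), slice(300, None)

def holdout_mse(degree):
    """Fit a polynomial on the training split; return mean squared error on the held-out split."""
    coefs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coefs, x[test])
    return np.mean((y[test] - pred) ** 2)

for degree in (1, 2):
    print(f"degree {degree}: held-out MSE = {holdout_mse(degree):.3f}")
# Both models are wrong in detail; in this narrow predictive sense, the one
# with the smaller held-out error is the more useful one.
```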