Saturday, February 27, 2010

Meta-Freakonomics

Joseph recently wrote a post referring to this post by Andrew Gelman (which was based on a series of posts by Kaiser Fung checking the veracity of various claims in Superfreakonomics -- welcome to the convoluted world of the blogosphere). Joseph uses Dr. Gelman's comments about the poor editing and fact-checking of the book to make a point about the disparity between the contribution editing makes and how little we reward it. He ought to know; I have frequently taken advantage of his good nature in this area, but at the risk of being ungrateful, I don't think the point applies here. Rather than being helpful, the kind of criticism Joseph and Gelman describe could only hurt Superfreakonomics.

Or put another way, if we approach this using the techniques and assumptions of the Freakonomics books, we can show that by forgoing a rigorous internal review process the authors were simply acting rationally.

Before we get to the actual argument, we need to address one more point in Joseph's post. Joseph says that providing a critical read "is one of the most helpful things a colleague can do for you, yet one of the least rewarded." This statement is absolutely true for easily 99.9% of the books and manuscripts out there. It is not, however, true for the Freakonomics books. Between their prestige and the deep pockets of William Morrow, Levitt and Dubner could have gotten as many highly-qualified internal reviewers as they wanted, reviewers who would have been compensated with both an acknowledgment and a nice check. (Hell, they might even get to be in the movie.)

But if the cost and difficulty of putting together an all-star team of reviewers for Superfreakonomics would have been negligible, how about the benefits? Consider the example of its highly successful predecessor. Freakonomics was so badly vetted that two sections (including the book's centerpiece on abortion) were debunked almost immediately. The source material for the KKK section was so flawed that even Levitt and Dubner disavowed it.

These flaws could have been caught and addressed in the editing process but how would making those corrections help the authors? Do we have any reason to believe that questionable facts and sloppy reasoning cost Levitt and Dubner significant book sales (the book sold over four million copies)? That they endangered the authors' spot with the New York Times? Reduced in any way the pervasive influence the book holds over the next generation of economists? Where would Levitt and Dubner have benefited from a series of tough internal reviews?

Against these elusive benefits we have a number of not-so-hard-to-find costs. While the time and money required to spot flaws is relatively minor, the effort required to address those flaws can be substantial.

Let's look at some specifics. Kaiser Fung raises a number of questions about the statistics in the "sex" chapter (the one about female longevity is particularly damning) and I'm sure he overlooked some -- not because there was anything wrong with his critique but because finding and interpreting reliable data on a century of sex and prostitution is extraordinarily difficult. It involves measuring covert behavior that can be affected by zoning, police procedures, city politics, shifts in organized crime, and countless other factors. Furthermore, these same factors can bias the collection of data in nasty and unpredictable ways.

Even if all of the sex chapter's underlying economics arguments were sound (which they are, as far as I know), there would still have been a very good chance that some reviewer might have pointed out flawed data, discredited studies, or turned up findings from more credible sources that undercut the main hypotheses. That doesn't mean that the chapter couldn't be saved -- a good team of researchers with enough time could probably find solid data to support the arguments (assuming, once again, that they were sound) but the final result would be a chapter that would look about the same to the vast majority of readers and external reviewers -- all cost, no benefit.

Worse yet, think about the section on the relative dangers of drunken driving vs. drunken walking. These cute little counter-intuitive analyses are the signature pieces of Levitt and Dubner (and were associated with Dr. Levitt before he formed the team). They are the foundation of the brand. Unfortunately, counter-intuitive analyses tend to be fragile creatures that don't fare that well under scrutiny (intuition has a pretty good track record).

The analysis of modes of drunken transportation would be one of the more fragile ones. Most competent internal reviewers would have had the same reaction that Ezra Klein had:
You can go on and on in this vein. It's terrifically shoddy statistical work. You'd get dinged for this in a college class. But it's in a book written by a celebrated economist and a leading journalist. Moreover, the topic isn't whether people prefer chocolate or vanilla, but whether people should drive drunk. It is shoddy statistical work, in other words, that allows people to conclude that respected authorities believe it is safer for them to drive home drunk than walk home drunk. It's shoddy statistical work that could literally kill somebody. That makes it more than bad statistics. It makes it irresponsible.
Let me be clear. I am not saying that Levitt and Dubner knew there were mistakes here. Quite the opposite. I'm saying they had a highly saleable manuscript ready to go which contained no errors that they knew of, and that any additional checking of the facts, the analyses or logic in the manuscript could only serve to make the book less saleable, to delay its publication or to put the authors in the ugly position of publishing something they knew to be wrong.

Gelman closes his post with this:
It's the nature of interesting-but-true facts that they're most interesting if true, and even more interesting if they're convincingly true.
Perhaps, but Levitt and Dubner have about four million reasons that say he's wrong.

When you really want to argue causality...

There's always a way.

John Quiggin does the dirty work:
I underestimated the speed and power of Zombie ideas. As early as Sep 2009, Casey Mulligan was willing to claim that the entire crisis could be explained in terms of labor market interventions. According to Mulligan, financial markets anticipated a variety of measures from the Obama Administration, observing ‘Arguably, the 2008 election was associated with an increase in the power of unions to shape public policy, and thereby the labor market. Congress has considered various legislation that would raise marginal income tax rates, and would present Americans with new health benefits that would be phased out as a function of income.’

This is truly impressive. So perspicacious are the financial markets, that even the possibility that Congress might raise taxes, or incorporate a means test in health care legislation that might be passed some time in the future (at the time of writing this in Feb 2010, the bill was still tied up) was sufficient to bring down the entire global financial market. And, even though the McCain-Palin ticket was widely seen as having a good chance (at least before September 2008), the markets didn’t wait for the election returns to come in. Applying some superstrong version of market efficiency, market participants predicted the election outcome, applied Mulligan’s neoclassical model to the predicted policies of the Obama Administration and (perfectly rationally) panicked.

Friday, February 26, 2010

IPTW news

In his new article, "The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies," Peter C. Austin compares a number of different propensity score approaches for modeling risk differences. Curiously, inverse probability of treatment weighting outperformed matching on propensity scores. My intuition was that they would have similar levels of accuracy and bias.

It's going to be interesting to think about why this result holds.
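
For anyone who hasn't worked with IPTW, here is a minimal sketch of the estimator for a risk difference. The data are simulated with made-up parameters (my own toy setup, not Austin's simulation design):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=(n, 2))  # two measured confounders
z = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x[:, 0] - 0.25 * x[:, 1]))))  # treatment
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.3 * z + 0.4 * x[:, 0] + 0.2 * x[:, 1]))))  # outcome

# Fit the propensity-score model and form inverse-probability-of-treatment weights.
ps = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]
w = np.where(z == 1, 1 / ps, 1 / (1 - ps))

# Weighted risk in each arm; the difference estimates the marginal risk difference.
risk1 = np.sum(w * y * (z == 1)) / np.sum(w * (z == 1))
risk0 = np.sum(w * y * (z == 0)) / np.sum(w * (z == 0))
print(f"IPTW risk difference: {risk1 - risk0:.3f}")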

Neat stuff -- go, read and enjoy!

Thursday, February 25, 2010

Editing

Andrew Gelman makes a great point about editing; there is nothing that helps more than having somebody do a critical read through a manuscript to point out where your reasoning is sloppy. This is one of the most helpful things a colleague can do for you, yet one of the least rewarded. It can be painful to hear these comments but it's worth every agonizing moment.

Wednesday, February 24, 2010

Stand and deliver

This article by the gifted Olivia Judson* explores the research about sitting and obesity that Joseph was talking about and makes some interesting suggestions:
Some people have advanced radical solutions to the sitting syndrome: replace your sit-down desk with a stand-up desk, and equip this with a slow treadmill so that you walk while you work. (Talk about pacing the office.) Make sure that your television can only operate if you are pedaling furiously on an exercise bike. Or, watch television in a rocking chair: rocking also takes energy and involves a continuous gentle flexing of the calf muscles. Get rid of your office chair and replace it with a therapy ball: this too uses more muscles, and hence more energy, than a normal chair, because you have to support your back and work to keep balanced. You also have the option of bouncing, if you like.
* And could someone explain to me why the New York Times' best science writer only shows up in the opinion section?

“We've made enormous advances in what they're called” -- more on corporate data cooking

Yesterday, I mentioned how bundled offers and the ability to pick the most advantageous data could allow a company to produce any number of grossly dishonest statistics. Today over at Baseline Scenario, James Kwak explains how J.P. Morgan can use acquisitions and flexible definitions to perform similar magic with its promise to loan $10 billion to small businesses:
Still, $10 billion is still an increase over the previous high of $6.9 billion in 2007, right? Well, not quite. Because in the meantime, JPMorgan Chase went and bought Washington Mutual. At the end of 2007, Washington Mutual held over $47 billion in commercial loans of one sort or another (from a custom FDIC SDI report that you can build here). Most of those are not small business by JPMorgan’s definition, since commercial real estate and multifamily real estate got put into the Commercial Banking business after the acquisition. But that still leaves $7.5 billion in potential small business loans, up from $5.1 billion at the end of 2006, which means WaMu did at least $2.4 billion of new lending in 2007.

I don’t know how much of this is small business lending, but this is part of the problem — banks can choose what they call small business lending, and they can choose to change the definitions from quarter to quarter. It’s also not clear (from the outside, at least) what counts as an origination. If I have a line of credit that expires and I want to roll it over, does that count as an origination? My guess is yes. Should it count as helping small businesses and the economy grow? No.

Sitting and obesity

It's one of the more difficult epidemiology questions to answer: why is obesity rising so quickly?

This is a very hard question to answer decisively. Something has made Americans become overweight over the past 30-40 years. It's not pure food abundance, as we have had that for a long time. It's not genetic in the sense of population genetics changing, as there has not been enough time (genetic susceptibility is another matter).

So the idea that more time spent sitting leads to obesity is a very interesting hypothesis. I wonder how feasible it would be to design a cluster randomized trial for workplace interventions (like standing to use the computer).
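
Feasibility mostly comes down to how many workplaces you would need. Here is a rough back-of-the-envelope calculation; the effect size, cluster size, and intraclass correlation are all invented planning numbers, not estimates from any study:

from math import ceil
from scipy.stats import norm

alpha, power = 0.05, 0.80
effect = 0.25  # assumed standardized effect of the standing intervention
m = 20         # assumed employees per workplace (cluster)
icc = 0.05     # assumed intraclass correlation within workplaces

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_per_arm = 2 * (z / effect) ** 2   # individually randomized sample size per arm
deff = 1 + (m - 1) * icc            # design effect: inflation for clustering
clusters_per_arm = ceil(n_per_arm * deff / m)
print(f"Roughly {clusters_per_arm} workplaces per arm")  # ~25 with these inputs

With these (invented) inputs the trial needs about 25 workplaces per arm, which seems within reach for a motivated health department or large employer.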

Tuesday, February 23, 2010

Avandia to be withdrawn?

From Derek at In the Pipeline, it looks like leaks from a Senate report indicate that Avandia is about to be removed from the market. Thus ends a long run of pharmacoepidemiology papers on the subject. It's not been an area that I worked in personally, but some of my friends have. Studying the heart risks of Avandia is tricky with observational data -- the disease being treated (diabetes) is a risk factor for the major side effect. This makes it very hard to separate disease and drug effects (especially since it is hard to control for the severity and duration of a silent disease like diabetes).
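
A toy simulation makes the problem concrete. Here disease severity is unmeasured, drives both treatment and cardiac events, and the drug has no true effect, yet the crude comparison still blames the drug (all parameters are invented for illustration):

import numpy as np

rng = np.random.default_rng(2)
n = 200_000
severity = rng.normal(size=n)  # unmeasured diabetes severity
drug = rng.binomial(1, 1 / (1 + np.exp(-severity)))  # sicker -> more likely treated
mi = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.8 * severity))))  # true drug effect is zero

crude_rr = mi[drug == 1].mean() / mi[drug == 0].mean()
print(f"Crude risk ratio: {crude_rr:.2f}")  # comes out well above 1.0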

But the existence of a comparator drug that showed a better risk profile for cardiovascular events was probably the decisive factor. Pharmacovigilance really can save lives!

How to Lie with Statistics -- Allstate Edition

For our latest statistical lie of the week, check out the following commercial.

At the risk of putting too fine a point on it, here's a full breakdown.

Customers of the two companies fall into one of four categories:

GEICO customers who would get a better deal with Allstate;

GEICO customers who would get a better deal with GEICO;

Allstate customers who would get a better deal with Allstate;

Allstate customers who would get a better deal with GEICO.

If we knew the relative sizes of those four groups and the average savings of the first and last groups, we'd have a fairly comprehensive picture. Not surprisingly, neither Allstate nor GEICO went that far. Both companies talk about the savings of people who switched.

Most people presumably switch providers to get a better deal (putting them in the first or last groups). Furthermore, switching is a hassle, so the savings have to be big enough to make up for the trouble. The result is a pair of highly biased, self-selected samples of the first and last groups.
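
A quick simulation shows how big the distortion can be. The numbers below are invented; the point is that only people whose quoted savings exceed the hassle of switching actually switch:

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical savings a customer would see from switching (can be negative).
potential_savings = rng.normal(loc=0, scale=300, size=100_000)
hassle = 200  # assumed dollar value of the switching hassle

switchers = potential_savings[potential_savings > hassle]
print(f"Mean savings, all customers:  ${potential_savings.mean():7.2f}")  # ~$0
print(f"Mean savings, switchers only: ${switchers.mean():7.2f}")          # ~$380

Even though switching saves the average customer nothing in this toy market, both companies could truthfully advertise triple-digit average savings among switchers.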

When GEICO simply mentions a potential savings of 15%, they are being a bit less than forthcoming but the claim that you might be able to save a substantial amount of money by switching is reasonable. For honest-to-goodness lying you need to wait for the Allstate commercial.

Allstate also bases their claims on the savings of those who switched to their company, but unlike GEICO they use those claims as part of a classic lie-by-hypothesis -- making a statement then supporting it with an incomplete or unrelated statistic. The ad starts with a trustworthy-sounding Dennis Haysbert saying "If you think GEICO's the cheap insurance company, then you're going to really be confused when you hear this" then touting an average savings of $518.

Yes, you might be confused, particularly if you don't realize that the sample is ridiculously biased, or that we aren't told the size of the policies or the period over which the $518 average was calculated. (The small print at the bottom refers to 2007 data, which seems a bit suspicious, particularly given the following disclaimer at the bottom of Allstate's website: "*$396 Average annual savings based on information reported nationally by new Allstate auto customers for policies written in 2008." No competitor is mentioned, so the second number is presumably a general average. This could explain the difference in the numbers but not the decision to shift periods.)

I would also be suspicious of the data-cooking potential of Allstate's bundled products. Here's how the old but effective scam works: you single out one product as a loss leader. The company may sell this as a feature -- save big on car insurance when you get all of your coverage from Allstate -- or the numbers may be buried so deeply in the fine print that you have no idea how your monthly check is being divided. Either way, this gives the people massaging the data tremendous freedom. They can shift profits to areas that Wall Street is excited about (it happens more often than you might think) or they can create the illusion of bargains if they want to counter the impression of being overpriced. I don't know if any of this is going on here, but I'm always cautious around numbers that are this easy to cook.

I would also take into account Allstate's less than shining reputation in the insurance industry, particularly regarding the company's strategies since the mid-1990s. The story has been covered by Business Week, PBS and Bloomberg, which supplied the following:

One McKinsey slide displayed at the Kentucky hearing featured an alligator with the caption "Sit and Wait." The slide says Allstate can discourage claimants by delaying settlements and stalling court proceedings.

By postponing payments, insurance companies can hold money longer and make more on their investments -- and often wear down clients to the point of dropping a challenge. "An alligator sits and waits," Golden told the judge, as they looked at the slide describing a reptile.

McKinsey's advice helped spark a turnaround in Allstate's finances. The company's profit rose 140 percent to $4.99 billion in 2006, up from $2.08 billion in 1996. Allstate lifted its income partly by paying less to its policyholders.
...
Allstate spent 58 percent of its premium income in 2006 for claim payouts and the costs of the process compared with 79 percent in 1996, according to filings with the U.S. Securities and Exchange Commission.
So, even if we put aside the possibility of data cooking, we still have an ethically tarnished company dishonestly presenting a meaningless statistic and that's good enough for our statistical lie of the week.

Monday, February 22, 2010

The Tuition Paradox

This post and Joseph's follow-up has gotten me thinking about a strange aspect of the economics of higher education in recent decades.

At the risk of oversimplifying, undergraduates are primarily paying for instruction and evaluation. The school will teach the student a body of knowledge and a set of skills and will provide the student with a quantitative measure (backed by the reputation of the school) of how well he or she mastered that knowledge and those skills.

The costs associated with providing those services are almost entirely labor driven. While there are exceptions (particularly involving distance learning), most instructors use minimal technology and many just rely on the whiteboard. This is not a criticism (a good teacher with a marker always beats a bad teacher with a PowerPoint), but the costs of a service that can be provided with simple facilities and little or no specialized equipment will always be labor driven.

Twenty or thirty years ago, when you took an undergraduate class you were likely to be taught by a full-time faculty member: not someone with a high salary, but a reasonably well-paid professional with good benefits and excellent job security. These days you are far more likely to be taught by a badly paid adjunct with no benefits or job security.

In other words, when you take inflation into account, the cost to universities of providing instruction and evaluation has dropped sharply while the amount universities charge for these services has continued to shoot up.
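
To get a feel for the scale of the shift, here is some illustrative arithmetic with invented (though not implausible) figures rather than actual salary data:

ft_salary = 70_000           # assumed full-time faculty salary
ft_benefits_rate = 0.30      # assumed benefits load on top of salary
ft_sections = 6              # assumed course sections taught per year
adjunct_per_section = 3_000  # assumed adjunct pay per section, no benefits

ft_cost = ft_salary * (1 + ft_benefits_rate) / ft_sections
print(f"Full-time cost per section: ${ft_cost:,.0f}")               # ~$15,000
print(f"Adjunct cost per section:   ${adjunct_per_section:,.0f}")   # $3,000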

I'm not saying that this is all a scam or that administrators are out there stuffing their pockets, but I do think there's something wrong with this picture.

Are humanities and science careers different?

Mark pointed me to an article by Thomas H. Benton about graduate school in the humanities. These issues have been persistent concerns in the field; I recall arguing about the job prospects of humanities graduates as an undergraduate philosophy major. I think that there really is an argument that the costs (in tuition, living expenses and so forth) required for an advanced degree in the humanities can't possibly be compensated for by post-degree job prospects.

Which is okay, if the goal of the degree is edification. But these degrees are not often marketed as expensive luxury goods . . .

In science, I think we are better off. We train people with marketable skills that can lead to careers. Post-degree placement is considered an important metric of success. But I think tales like this are a call to action to make sure that we continue to provide relevant training and to be cautious about blurring the distinction between data and anecdote in terms of outcomes.

If nothing else, it seems to be a good case for outcomes tracking . . .

Sunday, February 21, 2010

Academic work hours

It is true that academia is not a Monday-to-Friday job. However, there is a nice compensation that can often come with that. When I was at McGill I made some very good friends just by being in the lab at odd hours (especially late at night). There can be a sense of shared struggle that is an overlooked bonus. Of course, it would have been even nicer if there had been a late-night coffee shop to take breaks in, but you cannot have everything!

Friday, February 19, 2010

Multiple Testing

Interesting. False positives in popular fields appear to be driven much more strongly by the number of groups testing the same hypotheses than by fiddling with data. A very comforting result, insofar as it is true.
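
The mechanism is easy to see with a little arithmetic. If each of k independent groups tests a true-null hypothesis at alpha = 0.05, the chance that somebody gets a publishable false positive grows quickly, no data-fiddling required:

alpha = 0.05
for k in [1, 5, 10, 20, 50]:
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} groups -> P(at least one false positive) = {p_any:.2f}")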

More troublesome is that it is unclear what we can do about it. Being better about publishing negative results helps but is never going to be a perfect solution, especially when reviewers may be more skeptical about results that do not match their intuition.

The difficulty of Soft Outcomes

There is currently a movement to ban combination medications that include acetaminophen as an ingredient. The reasoning appears to be the potential for liver damage caused by excessive doses of the medication. The estimate of 458 deaths per year seems like a lot, until you realize the denominator is not specified (it won't be the entire US population, but it might be tens of millions).

The other issue, and the one that is interesting to an epidemiologist, is the soft nature of the competing risk. The alternatives to acetaminophen are narcotics and non-steroidal anti-inflammatory drugs like ibuprofen, and both of these have downsides (addiction and gastrointestinal bleeding, respectively) as well.

But the real alternative is less pain control. And that is hard to judge because it is a soft outcome. How much suffering is worth a life? Lives are easy to count but massively reduced quality of life is much, much trickier. But I think it is important to realize that a hard to measure outcome can still have a known and measurable effect on real people.
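
To see why the soft outcome can dominate, try a back-of-the-envelope QALY comparison. Every number below except the 458 deaths cited above is invented purely for illustration:

deaths_per_year = 458      # figure cited above
years_lost_per_death = 40  # assumed
qalys_lost_to_deaths = deaths_per_year * years_lost_per_death  # 18,320

users = 20_000_000         # assumed users of combination products
utility_decrement = 0.02   # assumed quality-of-life loss from worse pain control
qalys_lost_to_pain = users * utility_decrement                 # 400,000

print(f"{qalys_lost_to_deaths:,} QALYs (deaths) vs {qalys_lost_to_pain:,.0f} QALYs (pain)")

Even a tiny per-person quality-of-life loss, multiplied across tens of millions of users, can swamp the mortality figure. The invented inputs could be wildly off, but that is exactly the problem: nobody measures them.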

So I guess what I want to see is a clear articulation of the alternatives to the current approach to publishing in hot fields.

Wednesday, February 17, 2010

Post-Doctoral Fellowships

Am I really atypical in having had a decent post-doctoral fellowship? Is it a feature of the University of Washington or of my PI?

But when I read bitter stories about bad experiences, I wonder if this is a case of "there but for the grace of some powerful entity go I."

I think one issue is that people expect a lot at the end of the PhD (and not without justification -- the PhD is a long and difficult process). But the reality is that the PhD is the license to do research -- meaning you get to start at the beginning all over again. After 12 years of schooling (and an outside career in the financial services industry) that can be rough.

I'm lucky to have found a supportive place and I am pleased to be moving forward to a good position (although quite sad to be leaving the state of Washington). Here's hoping that academic work is as unexpectedly pleasant as post-doccing turned out to be!