Monday, April 21, 2014

What Nate Silver's critics are actually saying

Regarding the ongoing 538 discussion, it appears that we may be talking past each other in this case (from a previously mentioned comment by Kaiser Fung):

"The level of rigor that Krugman and others demand requires years, perhaps decades, of research to write one piece; meanwhile, the other critique is the content is not timely. Think about the full-time journalists he has hired - there isn't a way to pay them enough to do the kind of pieces that are being imagined. As we all know, data collection, cleaning and analysis take a huge amount of time. It may be months of work to get one article out."

Other than Krugman, I'm not sure exactly whom Kaiser was referring to in that first group, but I assume, since it was a comment on my post, that I'm in there somewhere (and given my other comments, I'm certainly not in the timely group). The trouble is, as far as I can tell, I haven't said anything like this, and Krugman has actually said the opposite:
Similarly, climate science has been developed by many careful researchers who are every bit as good at data analysis as Silver, and know the physics too, so ignoring them and hiring a known irresponsible skeptic to cover the field is a very good way to discredit your enterprise. Economists work hard on the data; on the whole you’re going to do better by tracking their research than by trying to roll your own, and you should be very wary if your analysis runs counter to what a lot of professionals say.
In other words, when reporting on a field outside of their expertise, 538's writers should forgo all that original "data collection, cleaning and analysis," and instead report on serious research being done by experts in the field (and it's worth noting that when Krugman talks about listening to experts earlier in the post, he links to the Monkey Cage).

So that this won't look like cherry-picking, I'll be as transparent and inclusive as possible. As far as I can tell, Krugman wrote four posts relevant to this discussion. Here are the title and date of each, along with quotes and a summary:

Sergeant Friday Was Not A Fox
 MARCH 18, 2014, 7:55 AM
What worries me, based on what we’ve seen so far — which isn’t much, but shouldn’t the site have debuted with a bang? — is that it looks as if the Silverites have misunderstood their mission.

Nate’s manifesto proclaims his intention to be a fox, who knows many things, rather than a hedgehog, who knows just one big thing; i.e., a pundit who repeats the same assertions in every column. I’m fine with that.

But you can’t be an effective fox just by letting the data speak for itself — because it never does. You use data to inform your analysis, you let it tell you that your pet hypothesis is wrong, but data are never a substitute for hard thinking. If you think the data are speaking for themselves, what you’re really doing is implicit theorizing, which is a really bad idea (because you can’t test your assumptions if you don’t even know what you’re assuming.)
We could go back and forth about how it applies in this case, but every serious STEM blogger I know of holds to the "hard thinking" standard. To do any less is to sink to the level of "Numbers in the News" infographics. Still more important (for me at least) is the part about implicit assumptions. The problem is particularly worrisome when experts jump fields, which leads neatly into the next post.

Further Thoughts on Hedgehogs and Foxes
 MARCH 18, 2014, 4:15 PM
Now, about FiveThirtyEight: I hope that Nate Silver understands what it actually means to be a fox. The fox, according to Archilocus, knows many things. But he does know these things — he doesn’t approach each topic as a blank slate, or imagine that there are general-purpose data-analysis tools that absolve him from any need to understand the particular subject he’s tackling. Even the most basic question — where are the data I need? — often takes a fair bit of expertise; I know my way around macro data and some (but not all) trade data, but I turn to real experts for guidance on health data, labor market data, and more.

What would be really bad is if this turns into a Freakonomics-type exercise, all contrarianism without any appreciation for the importance of actual expertise. And Michael Mann reminds me that Nate’s book already had some disturbing tendencies in that direction.
As before, we can discuss the merits of the Freakonomics school of scientific writing (at the risk of oxymoron, I am consistently against constant contrarianism) and argue about the applicability of these charges against 538 (though in this case, Krugman is careful to phrase these as concerns), but this passage in no way matches what Krugman is supposed to have said.

Tarnished Silver
 MARCH 23, 2014, 10:48 AM
But I’d argue that many of the critics are getting the problem wrong. It’s not the reliance on data; numbers can be good, and can even be revelatory. But data never tell a story on their own. They need to be viewed through the lens of some kind of model, and it’s very important to do your best to get a good model. And that usually means turning to experts in whatever field you’re addressing.

Unfortunately, Silver seems to have taken the wrong lesson from his election-forecasting success. In that case, he pitted his statistical approach against campaign-narrative pundits, who turned out to know approximately nothing. What he seems to have concluded is that there are no experts anywhere, that a smart data analyst can and should ignore all that.

I've seen others make this Politico-fallacy argument (i.e. Silver's experience dealing with the idiots who had been doing sports and election prognostication has left him with a skewed view of the world). There's probably some truth there but I think it's an oversimplification.

Data as Slogan, Data as Substance
 MARCH 26, 2014, 1:00 PM
Noah Smith has the definitive piece on what’s wrong, so far, with the new FiveThirtyEight. For all the big talk about data-driven analysis, what it actually delivers is sloppy and casual opining with a bit of data used, as the old saying goes, the way a drunkard uses a lamppost — for support, not illumination.

In sum, this so-called “data-driven” website is significantly less data-driven (and less sophisticated) than Business Insider or Bloomberg View or The Atlantic. It consists nearly entirely of hedgehoggy posts supporting simplistic theories with sparse data and zero statistical analysis, making no quantitative predictions whatsoever. It has no relationship whatsoever to the sophisticated analysis of rich data sets for which Nate Silver himself has become famous. 
The problem with the new FiveThirtyEight is not one of data vs. theory. It is one of “data” the buzzword vs. data the actual thing.

This is perhaps the closest we get to the alleged demands for Silver to deliver more sophisticated analysis, but it falls far short of the "months of work to get one article out" that Krugman was supposed to have asked for (the very fact that Business Insider, Bloomberg View, and The Atlantic are able to do it shows that it is doable) and, more importantly, it came not from Krugman but from the pleasant and well-liked Smith.

To summarize Krugman's position: data should be viewed in context as part of an argument or analysis. Part of that context should be the mainstream research being done in an area, and when the writer is not an expert in that field, he or she should seek one out. On a related note, pieces that assert that the experts have missed the obvious (Freakonomics-style contrarianism) should be checked carefully, as should implicit assumptions.

I am broadly in agreement with Krugman on these points (particularly regarding Freakonomics-style journalism), though I would add a few more concerns that go along with some long-running threads here at the blog. The first involves scale. We should limit criticisms to choices, not circumstances, and in most enterprises some of the most important choices concern size and scope.

I believe Silver may have fallen into the closely related traps of the growth fetish and the Devil's Candy (the latter being the ratcheting effect where meeting certain scale targets requires changes which in turn require even larger scale targets). Something similar but probably more damaging occurred when he expanded the site's scope. As long as he was primarily writing or editing politics and sports stories (areas where he has extraordinary expertise), it was much easier for him to maintain a high level of quality control.

As far as I can tell, all of the real low points of the new 538 have occurred outside of these specialties (I know that Benjamin Morris' analysis of NBA steals caught a lot of flak but, while flawed, it struck me as a reasonable effort). The most embarrassing has been the hiring of Roger Pielke Jr., whose prebutted* climate change piece has done more than anything else to damage the brand that Silver worked so hard for so many years to build.

My second big concern (which is somewhat more in line with Krugman) is with bungee-jumping analysts: experts (usually economists, often physicists, though Pielke shows that political scientists can also get into the act) who think that, because they have occasionally used some similar statistical methods, they are fully qualified in fields where they have no background or experience. Emily Oster's work with fetal alcohol syndrome and the notorious Freakonomics drunk driving analysis are apt examples.

Obviously, we can go back and forth about these criticisms, both on a general level (for example, is there such a thing as Freakonomics-style contrarianism and, if so, is it bad?) and a specific one (has 538 really been moving in the direction suggested by Smith, Krugman, and me?). A good, vigorous discussion of these points would be tremendously helpful, but any productive counterargument has got to start by countering actual arguments.

* From the article linked to above:
But just as Pielke’s article has been written before, so too it has been criticized before. Dr. Kevin E. Trenberth, a distinguished senior climate scientist at the National Center for Atmospheric Research, has criticized Pielke’s data for its simplistic nature. Simply showing that an increase in damage has corresponded to an increase in wealth ignores the fact that communities are now more prepared than ever for extreme storms, Trenberth wrote at the time.

Note: Somehow my attempt to schedule this for a future date turned into a publish-now command, so the first dozen or so people got to see a few extra typos.


  1. They had a piece today called, absurdly, "The Potential Bubble the Federal Reserve Cares Most About". I don't hold titles against authors. The piece itself is on the whole reasonable; it basically lists the concerns of Jeremy Stein about a bond bubble. But it fails an essential test in my eyes: it lists these concerns without considering why the rest of the Fed, including its chairman, has not agreed with them. It's as if the dissenter on the Fed were the majority. That this isn't the case slides along in the background, as though it were a "meteor may kill us all" warning with the obligatory paragraph that "if the Fed raises rates early, this may be why."

    Talking without context is to me little more than advocacy. I can make all sorts of arguments in favor of Creationism, even young Earth creationism, and they work great ... as long as they aren't put in the scientific and factual context that shows they're idiotic. I'm not saying Stein's views are idiocy, but that a piece without context is not a good thing.

  2. Thanks for keeping the conversation going. A long post deserves a long response. (have to break this into two!)

    I’ll start with things on which we agree.
    a) We agree that the Freakonomics style of blogging is doing science a disservice.
    b) We agree that conventional wisdom is usually right, and excessive contrarianism is undesirable.
    c) We both have issues with Nate’s views on climate change.
    d) We both have issues with the fox/hedgehog metaphor.

    On a) and b), my views are on record. On c) and d), while I can’t endorse Nate’s views on these topics, I don’t see them as defining Nate Silver. What I learn from some of his critics who admit to being Nate Silver fans is that they have not read his book carefully. Having reviewed The Signal and the Noise, I noticed both of those views and did not let them distract from my generally favorable opinion. The fox/hedgehog metaphor was introduced on pp. 53-4 right at the start; it is too black and white in my opinion, and you won’t hear me using it. That the climate chapter is not as extreme as the chapter in SuperFreakonomics is the best I can say about it. (Also, in this sense, I can’t see how Nate Silver’s brand is being tarnished amongst those who read and liked his book.)

    Indeed, we were talking past each other. So let me try to get us to a better place.

    I propose that we start with the right comparables. FiveThirtyEight is aspiring to be a place where every piece of writing is data-driven. I can only think of the Freakonomics blog as a true comparable. Academic blogs and niche blogs like ours are another comparison, but not as ideal due to the scale difference, which I’ll address.

    What I consider unfair comparisons are Vox, Business Insider, The Atlantic, and Bloomberg. Only a small fraction of their articles contain any data analysis. Those brands are not known for data-driven journalism. In any case, they publish lots of articles with questionable analytics. I’m truly surprised that Business Insider is on that list. In the business world, it is treated as a tabloid. Most businesspeople I work with read it but not for serious analysis.

    The second point of difference is that you draw a hard line between “hard science” and “Numbers in the News infographics” while I see the line between rigor and readability as a continuum. As I've said previously, I find the Freakonomics blog and SuperFreakonomics on the wrong end of the line. For FiveThirtyEight, it’s clear that Nate’s intention is to situate the site toward the other end of the line, and I hope he finds the right spot.

    I actually agree with much of Krugman’s critique, such as the need for models and data by itself not being useful. I also frequently evangelize these two points. However, I don’t support suppressing imperfect analyses; in statistics, I believe there are very few inadmissible analyses, very few perfect analyses, and a lot of imperfect analyses, all of which can be challenged in any number of ways. Even in the case of Freakonomics and, to introduce another controversial example, Malcolm Gladwell, I encourage people to read those books but apply critical thinking.

    I find troubling the argument that only domain experts are qualified to do data analyses. My own perception of the field of statistics is different. I believe there is expertise in “data analysis” that is domain-free. I shudder to think about the future of the field of statistics were that a falsehood. One of the exciting things about statistics is that it establishes philosophies of how to look at data that have wide applications all over the map. I’m not arguing that no domain knowledge is ever needed. Of course, a generalist who approaches a new area should talk to experts and understand prior work. But it’s fair to say that most experts are in love with their own models.

    to be continued

  3. continued from above

    As an aside, while we may think Emily Oster and “Freakonomists” are notorious for sloppy analyses, I have always had the impression that they are considered stars within economics academia. This leads to the possibility that Nate Silver is allowed to do what he does so long as he doesn’t cross the line and meddle with economics. I don’t find credible the notion that there were no real experts in election forecasting before Nate showed up, that everyone else is some sort of “campaign-narrative pundit”.

    You argued that these data journalists should not do their own data collection and analyses. I happen to think that such work is what sets a data journalist apart from a not-data journalist. I will invoke Cathy O’Neil here (she’s not a fan of Nate Silver so she might object), who recently advised people to “never trust anything until you’ve checked it”. Readers don’t have the time or skills to check all data analyses. I see the data journalist’s role as providing that scrutiny.

    You cannot judge the credibility of data analyses by just speaking with the person who did the work. In the entire process of data collection, processing and analysis, dozens of assumptions are deployed. Few experts will have the patience to open up their analysis and have their every assumption nitpicked, especially if they regard the journalist as a neophyte. This reminds me of the fracas over the Reinhart-Rogoff error, and it was several years before they let their simple Excel spreadsheet out of the bag.

    There is also the issue of exaggerated claims, and downright fraud. I have seen countless reports of an A/B test generating up to X% lift (X is an impressive number, like 50 or 100) in some segment of the population. Unless I can see the data, especially the lift in all segments, plus details of execution, I won’t even give it the time of day. Would you?

    Perhaps I’m now getting to the real subject, which is the goal of data journalism. If I read your view accurately, the data journalist should approach experts and interview them about their quantitative work, and then report their findings to the public. To me, this is no different from science journalism, and it already exists. Perhaps you’re arguing that data journalism is not a thing, or not a standalone thing.

    I take the view that Nate is trying something new to journalism, and needs time to work out the details. (Some people will argue that it’s not completely new; past efforts have not succeeded, which is worth noting.)
    Having said that, I now assume data journalism is a standalone endeavor, and that its goal is to report data-driven science with a level of scrutiny not found on science pages today. How should one build and run such an operation?

    The first problem is the time it takes to deliver that level of scrutiny, especially if data collection, processing and analysis are involved. Having a mixture of content with varying degrees of rigor is a reasonable approach.

    The second problem is that Nate has 10 or 15 full-time staffers; his site needs to drive enough traffic to make money, unlike academic blogs, where the blogger is looking for reputation or publicity or perhaps nothing at all. This is the reality of advertising-dollar-supported content.

    The third problem, as you pointed out, is scale, and the fact that the average quality goes down as size goes up.

    The fourth issue is the audience. It’s inevitable that, as the audience increases, the average reader will be less quantitatively sophisticated. I’d hope that the total readership for data-driven articles will expand beyond the “choir”. I also know that speaking to the average reader the same way you speak to a statistician is not fruitful.

    I think you probably agree on most of these issues too. I suspect you will reach the conclusion that a standalone data-driven news enterprise won’t fly, while at this point I say go for it and see what happens.