West Coast Stat Views (on Observational Epidemiology and more)

Thursday, April 24, 2014

Why I criticize 538 more than Business Insider (mainly because I don't read Business Insider)

Okay, that's not really true. I do check out the occasional Business Insider article when it is recommended by one of the bloggers I follow and I do have other reasons for discussing 538. Silver's website is new and newsworthy and it publishes a number of important writers (including Silver himself) whom anyone interested in statistics should read. For this and other reasons, 538 has become ground zero for discussions about how the media should cover data.

Unfortunately, one side effect of all this attention has been to create the impression of implicit comparisons. When we talk about the weaker articles in 538 because we think the direction of the website is important, we can leave people thinking that weak articles are disproportionately found in 538. That is by no means a sound conclusion.

With the obvious exception of Roger Pielke Jr., my least favorite 538 hire is probably Walt Hickey (though concerned, I'm reserving judgement on Emily Oster for the moment). Hickey seems like a nice, well-intentioned fellow, but from what I've seen, he's an excellent example of one of those data journalists who understand the math but not the statistics, getting the procedures right but missing the point (this is somewhat analogous to Feynman's comments about textbook writers missing the nuances of math and science).

I decided to Google him. It turns out that Hickey (going under the slightly more businesslike 'Walter') was a prolific contributor to Business Insider (among other sites). Since he seemed to be doing a lot of entertainment reporting for 538, I looked for something similar on BI and came up with this:

Here's Where All The Miley Cyrus Haters Live

The metric used was the addresses (five-digit zip only) of the 158 complaint letters sent to the FCC after Miley Cyrus's performance at MTV's VMA award show. This is not a good data set but it is possible to do some mildly interesting demographic breakdowns. It's not as good as it would have been with nine-digit zips (those open up a lot of useful information), but you could, for example, look at things like city size.

But what you would never want to do with 158 addresses is a state-by-state breakdown.

This was followed by a list of "the top ten most irate states, based on letters sent per capita" with the sparsely populated South Dakota coming in at number four based on just one letter to the FCC.
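To see why a per capita ranking built on so few letters is fragile, here is a minimal Python sketch. It assumes letter counts behave roughly like Poisson counts and computes an exact two-sided interval for the underlying mean. The function names and the round-number population of 850,000 are illustrative assumptions, not figures from the article.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def bisect(f, lo, hi, tol=1e-10):
    """Find a root of a monotone function f on [lo, hi] by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_interval(count, alpha=0.05):
    """Exact two-sided interval for the mean of a Poisson count.

    Intended for small counts only (the search bound below is ad hoc).
    """
    search_max = 2 * count + 50.0
    if count == 0:
        lower = 0.0
    else:
        # Lower bound: the mean at which P(X >= count) equals alpha / 2.
        lower = bisect(
            lambda mu: (1 - poisson_cdf(count - 1, mu)) - alpha / 2,
            0.0, search_max)
    # Upper bound: the mean at which P(X <= count) equals alpha / 2.
    upper = bisect(
        lambda mu: poisson_cdf(count, mu) - alpha / 2,
        0.0, search_max)
    return lower, upper

letters = 1
population = 850_000  # assumed round figure for a sparsely populated state
low, high = exact_poisson_interval(letters)
per_million = 1_000_000 / population
print(f"point estimate: {letters * per_million:.2f} letters per million")
print(f"95% interval:   {low * per_million:.2f} to {high * per_million:.2f} per million")
```

Under these assumptions, a single letter is consistent with an expected count anywhere from about 0.025 to about 5.6, a range of more than two orders of magnitude, so the state's position in a top-ten list tells us almost nothing.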

(As a side note, when I went back to the article to write this, I tried to find it again by searching Business Insider for Miley Cyrus. Big mistake. You would not believe how many posts came up.)

Feel free to discuss this graph, but the point I want to make is that based on this and the other articles I looked at, Hickey appears to have improved considerably when he moved to 538. I'm still not impressed with the work he's doing now, but that's an absolute, not a relative statement. Furthermore, this case raises some real questions about Noah Smith's claim that "In sum, this so-called “data-driven” website is significantly less data-driven (and less sophisticated) than Business Insider or Bloomberg View or The Atlantic."

Wednesday, April 23, 2014

Believe it or not, we've been talking about the nice Krugman -- some perspective on the 538 debate

[You may have trouble getting past the NYT firewall on these. If so, the easiest way around this is either to Google name and author or do what I did and go here for a complete set.]

One of the side questions in the ongoing 538 debate is whether or not Nate Silver and his writers are being excessively criticized. There is certainly some truth to the charge (for reasons I'll get into later), but it's also important to remember that, to a remarkable extent, Silver walked into a bar fight, a number of intense, ongoing debates about science and statistics, some of which had long ago turned quite nasty.

Quite a few of those fights involved Freakonomics, and the topic of climate change in particular and contrarian data journalism in general. The hiring of Roger Pielke Jr. and Emily Oster raised the specter of those two issues respectively. It was pretty much inevitable that the association would heighten the criticism of 538. That association does not mean, however, that the two are being equated. As far as I can tell, the tone of criticism of Silver within the analytic community has been disappointed and concerned rather than angry.

Having previously discussed Krugman's criticisms of 538, it's useful to compare them to his reaction to Superfreakonomics. For me, at least, the difference in tone is notable.

From
A counterintuitive train wreck

At first glance, though, what it looks like is that Levitt and Dubner have fallen into the trap of counterintuitiveness. For a long time, there’s been an accepted way for commentators on politics and to some extent economics to distinguish themselves: by shocking the bourgeoisie, in ways that of course aren’t really dangerous. Ann Coulter is making sense! Bush is good for the environment! You get the idea.

Clever snark like this can get you a long way in career terms — but the trick is knowing when to stop. It’s one thing to do this on relatively inconsequential media or cultural issues. But if you’re going to get into issues that are both important and the subject of serious study, like the fate of the planet, you’d better be very careful not to stray over the line between being counterintuitive and being just plain, unforgivably wrong.

It looks as if Superfreakonomics has gone way over that line.

From
Superfreakonomics on climate, part 1

OK, I’m working my way through the climate chapter — and the first five pages, by themselves, are enough to discredit the whole thing. Why? Because they grossly misrepresent other peoples’ research, in both climate science and economics.
...
Yikes. I read Weitzman’s paper, and have corresponded with him on the subject — and it’s making exactly the opposite of the point they’re implying it makes. Weitzman’s argument is that uncertainty about the extent of global warming makes the case for drastic action stronger, not weaker. And here’s what he says about the timing of action:

The conventional economic advice of spending modestly on abatement now but gradually ramping up expenditures over time is an extreme lower bound on what is reasonable rather than a best estimate of what is reasonable.

Again, we’re not even getting into substance — just the basic issue of representing correctly what other people said.

From

Weitzman in context

But you’d never get this point from the way the book quotes Weitzman, which cites his probability of utter catastrophe as if it were a reason to be skeptical of the need to act. I suspect, though I don’t know this, that the authors were just careless — they skimmed Weitzman’s paper, which is densely written, saw a number they liked, and didn’t ask what the number meant.

And that sort of carelessness is the general sense I get from the chapter.

Levitt now says that the chapter wasn’t meant to lend credibility to global warming denial — but when you open your chapter by giving major play to the false claim that scientists used to predict global cooling, you have in effect taken the denier side. The only way I can reconcile what Levitt says now with that reality is that he and Dubner didn’t do their homework — not only that they didn’t check out the global cooling stuff, the stuff about solar panels, and all the other errors people have been pointing out, but that they didn’t even look into the debate sufficiently to realize what company they were placing themselves in.

And that’s not acceptable. This is a serious issue. We’re not talking about the ethics of sumo wrestling here; we’re talking, quite possibly, about the fate of civilization. It’s not a place to play snarky, contrarian games.

From

Superfreakingmeta

One good aspect of the controversy, though, has been some broader analysis of what it all means. I liked three recent comments in particular.

Joshua Gans identifies in Dubner and Levitt an odd inconsistency that I’ve identified more broadly: those who go on and on about how people respond to incentives when they’re making a pro-free-market argument suddenly seem to lose all faith in the power of incentives when the goal is to induce more environmentally friendly behavior:

But come on. Isn’t the whole point of the Freakonomics project that prices work and behaviour changes in response to incentives? Everywhere else, a few pennies will cause massive consumption changes while when it comes to a carbon price, it is all too hard.

Ryan Avent makes a general point about people who dismiss cap-and-trade as too hard, then promote something else that only seems easier because you haven’t thought it through. I agree with him about the carbon tax issue; and while I hadn’t thought about applying the same principle to geoengineering, he’s completely right. Having somebody — who? The United States? The United Nations? The Coalition of the Willing? — pump sulfur into the atmosphere through an 18-mile tube, or cut off sunlight with a giant orbital mirror, would either (a) require many years of hard negotiations or (b) quite possibly set off World War III. If it’s (a), why is that so much easier than a global agreement on emissions? (Which, as Brad points out, really would only have to involve four big players.)

Finally, Andrew Gelman poses a question:

The interesting question to me is why is it that “pissing off liberals” is delightfully transgressive and oh-so-fun, whereas “pissing off conservatives” is boring and earnest?

I have a theory here, although it may not be the whole story: it’s about careerism. Annoying conservatives is dangerous: they take names, hold grudges, and all too often find ways to take people who annoy them down. As a result, the Kewl Kids, as Digby calls them, tread very carefully when people on the right are concerned — and they snub anyone who breaks the unwritten rule and mocks those who must not be offended.

Annoying liberals, on the other hand, feels transgressive but has historically been safe. The rules may be changing (as Dubner and Levitt are in the process of finding out), but it’s been that way for a long time.

The “tell”, I’d suggest, is that once you get beyond those for whom the decision about whom to laugh at is a career move, people don’t, in fact, seem to find mocking liberals funnier than mocking conservatives. Jon Stewart and Stephen Colbert are barreling along, while right-wing attempts to produce counterpart shows have bombed.

Anyway, say this for Dubner and Levitt — they’ve provoked an interesting discussion, although probably not the one they hoped for.

From

Elizabeth Kolbert can’t say that, can she?

But mainly, I’m envious. [Elizabeth] Kolbert builds the essay around an extended metaphor involving, um, equine effluvia that I’m pretty sure wouldn’t be allowed under Times style. On the whole, the requirement that Times writers show appropriate dignity is good for everyone; still, sometimes I’m wistful.

Oh, and the reference in the title of this post is to the much-missed Molly Ivins.

Tuesday, April 22, 2014

What do grades measure?

[I wrote this in the middle of the big SAT thread and I thought I had posted it weeks ago but it appears that I never got around to it. So better late than never...]

As discussed before, many of the calls for getting rid of the SAT argue that high school grades are a better indicator of college success, so we don't need the SAT. There's a modeling fallacy here (also as previously discussed), but putting that aside, the suggestion that we should rely almost entirely on grades as a measure of academic accomplishment (not to be confused with measures of character and personal achievement) raises the question: what exactly do grades measure? Put another way, what factors do we expect to be highly correlated with grade-point average?

First off, let's think about where grades come from. In most classes grades come almost entirely from tests, homework, in-class activities and writing assignments. In some cases there is an unavoidable subjective element in the evaluation of this work. With this in mind, think about what attributes and personality traits would correlate with higher performance.

Various forms of intelligence would doubtless factor in. This is an extraordinarily complex topic, but, in general, it's safe to say that school tends to be easier if you're smarter.

The ability to memorize would possibly be an even bigger factor in many (perhaps most) courses. Closely related to this attribute and in some cases indistinguishable from it by many metrics would be the tolerance for the act of rote memorization. Lots of people with excellent memories find the act of sitting and going over the same facts again and again extremely unpleasant. Put another way, this is one of the many areas where hard work can compensate for a lack of aptitude.

This second attribute overlaps with the next major related categories: discipline, patience and focus. A great deal of academic success depends on the ability to spend large amounts of time going over material that is neither interesting nor challenging. (This can lead to the paradoxical but not uncommon result of high aptitude leading to boredom leading to poor performance in the area of that aptitude.) I'd argue these factors are often the dominant drivers of GPA.

Because of the unavoidable element of subjectivity, the halo effect and likability can also improve grade point averages.

Between the level of material covered and the need to fashion lessons and tests to serve large numbers of students, grades often tend to favor conventional thinkers over more original ones. As students progress through college, the emphasis tends to shift to more divergent learning but at least in high school, the student who thinks differently will often be penalized.

Personal stability and home life can also be a major factor, particularly in areas like homework and other out-of-class assignments.

Finally, there is the support network: quality of instruction; availability of tutors and homework assistance; libraries and learning centers; computers with good reliable Internet access; family members who have both the time and the ability to help explain assignments.

From an analytic standpoint, it would be nice if we had separate metrics for each of these aspects. As it is, we really can't distinguish between the student with an exceptionally good grasp of the material and a student who worked hard or who had a lot of help.

This is not a call for reforming our entire grading system. Though there is certainly room for improvement, it is far from the most pressing matter we face and, more importantly, badly thought-out changes (and badly thought-out has been the reform norm lately) can do far more harm than good.

What we do need to do with this or any other ranking system is try to understand its drivers and limitations and to take steps to minimize the damage caused by mistakes (because mistakes will happen).

Monday, April 21, 2014

What Nate Silver's critics are actually saying

Regarding the ongoing 538 discussion, it appears that we may be talking past each other in this case (from a previously mentioned comment by Kaiser Fung):

"The level of rigor that Krugman and others demand requires years, perhaps decades, of research to write one piece; meanwhile, the other critique is the content is not timely. Think about the full-time journalists he has hired - there isn't a way to pay them enough to do the kind of pieces that are being imagined. As we all know, data collection, cleaning and analysis take a huge amount of time. It may be months of work to get one article out."

Other than Krugman, I'm not sure exactly whom Kaiser was referring to in that first group, but I assume, since it was a comment on my post, that I'm in there somewhere (and given my other comments, it's certainly not in the timely group). The trouble is, as far as I can tell, I haven't said anything like this and Krugman has actually said the opposite.

Similarly, climate science has been developed by many careful researchers who are every bit as good at data analysis as Silver, and know the physics too, so ignoring them and hiring a known irresponsible skeptic to cover the field is a very good way to discredit your enterprise. Economists work hard on the data; on the whole you’re going to do better by tracking their research than by trying to roll your own, and you should be very wary if your analysis runs counter to what a lot of professionals say.

In other words, when reporting on a field outside of their expertise, 538's writers should forgo all that original "data collection, cleaning and analysis," and instead report on serious research being done by experts in the field (and it's worth noting that when Krugman talks about listening to experts earlier in the post, he links to the Monkey Cage).

So this won't look like cherry-picking, I'll be as transparent and inclusive as possible. As far as I can tell, Krugman wrote four posts relevant to this discussion. Here are the name and date of each along with quotes and a summary:

Sergeant Friday Was Not A Fox
MARCH 18, 2014, 7:55 AM

What worries me, based on what we’ve seen so far — which isn’t much, but shouldn’t the site have debuted with a bang? — is that it looks as if the Silverites have misunderstood their mission.

Nate’s manifesto proclaims his intention to be a fox, who knows many things, rather than a hedgehog, who knows just one big thing; i.e., a pundit who repeats the same assertions in every column. I’m fine with that.

But you can’t be an effective fox just by letting the data speak for itself — because it never does. You use data to inform your analysis, you let it tell you that your pet hypothesis is wrong, but data are never a substitute for hard thinking. If you think the data are speaking for themselves, what you’re really doing is implicit theorizing, which is a really bad idea (because you can’t test your assumptions if you don’t even know what you’re assuming.)

We could go back and forth about how it applies in this case, but every serious STEM blogger I know of holds to the "hard thinking" standard. To do any less is to sink to the level of "Numbers in the News" infographics. Still more important (for me at least), is the part about implicit assumptions. The problem is particularly worrisome when experts jump fields, which leads neatly into the next post.

Further Thoughts on Hedgehogs and Foxes
MARCH 18, 2014, 4:15 PM

Now, about FiveThirtyEight: I hope that Nate Silver understands what it actually means to be a fox. The fox, according to Archilocus, knows many things. But he does know these things — he doesn’t approach each topic as a blank slate, or imagine that there are general-purpose data-analysis tools that absolve him from any need to understand the particular subject he’s tackling. Even the most basic question — where are the data I need? — often takes a fair bit of expertise; I know my way around macro data and some (but not all) trade data, but I turn to real experts for guidance on health data, labor market data, and more.

What would be really bad is if this turns into a Freakonomics-type exercise, all contrarianism without any appreciation for the importance of actual expertise. And Michael Mann reminds me that Nate’s book already had some disturbing tendencies in that direction.

As before, we can discuss the merits of the Freakonomics school of scientific writing (at the risk of oxymoron, I am consistently against constant contrarianism) and argue about the applicability of these charges against 538 (though in this case, Krugman is careful to phrase these as concerns), but this passage in no way matches what Krugman is supposed to have said.

Tarnished Silver
MARCH 23, 2014, 10:48 AM

But I’d argue that many of the critics are getting the problem wrong. It’s not the reliance on data; numbers can be good, and can even be revelatory. But data never tell a story on their own. They need to be viewed through the lens of some kind of model, and it’s very important to do your best to get a good model. And that usually means turning to experts in whatever field you’re addressing.

Unfortunately, Silver seems to have taken the wrong lesson from his election-forecasting success. In that case, he pitted his statistical approach against campaign-narrative pundits, who turned out to know approximately nothing. What he seems to have concluded is that there are no experts anywhere, that a smart data analyst can and should ignore all that.

I've seen others make this Politico-fallacy argument (i.e. Silver's experience dealing with the idiots who had been doing sports and election prognostication has left him with a skewed view of the world). There's probably some truth there but I think it's an oversimplification.

Data as Slogan, Data as Substance
MARCH 26, 2014, 1:00 PM

Noah Smith has the definitive piece on what’s wrong, so far, with the new FiveThirtyEight. For all the big talk about data-driven analysis, what it actually delivers is sloppy and casual opining with a bit of data used, as the old saying goes, the way a drunkard uses a lamppost — for support, not illumination.

In sum, this so-called “data-driven” website is significantly less data-driven (and less sophisticated) than Business Insider or Bloomberg View or The Atlantic. It consists nearly entirely of hedgehoggy posts supporting simplistic theories with sparse data and zero statistical analysis, making no quantitative predictions whatsoever. It has no relationship whatsoever to the sophisticated analysis of rich data sets for which Nate Silver himself has become famous.

The problem with the new FiveThirtyEight is not one of data vs. theory. It is one of “data” the buzzword vs. data the actual thing.

This is perhaps the closest we get to the alleged demands for Silver to deliver more sophisticated analysis, but it falls far short of the "months of work to get one article out" that Krugman was supposed to have asked for (the very fact that Business Insider or Bloomberg View or The Atlantic are able to do it shows that it is doable) and, more importantly, it came not from Krugman but from the pleasant and well-liked Smith.

To summarize Krugman's position, data should be viewed in context as part of an argument or analysis. Part of that context should be the mainstream research being done in an area, and when the writer is not an expert in that field, he or she should seek one out. On a related note, pieces that assert that the experts have missed the obvious (Freakonomics-style contrarianism) should be checked carefully, as should implicit assumptions.

I am broadly in agreement with Krugman on these points (particularly with Freakonomics-style journalism) though I would add a few more concerns that go along with some long-running threads here at the blog. The first involves scale. We should limit criticisms to choices, not circumstances, and in most enterprises some of the most important choices made regard size and scope.

I believe Silver may have fallen into the closely related traps of the growth fetish and the Devil's Candy (the latter being the ratcheting effect where meeting certain scale targets requires changes which in turn require even larger scale targets). Something similar but probably more damaging occurred when he expanded the scope. As long as he was primarily writing or editing politics and sports stories (areas where he has extraordinary expertise), it was much easier for him to maintain a high level of quality control.

As far as I can tell, all of the real low points of the new 538 have occurred outside of these specialties (I know that Benjamin Morris' analysis of NBA steals caught a lot of flak but, while flawed, it struck me as a reasonable effort). The most embarrassing has been the hiring of Roger Pielke Jr., whose prebutted* climate change piece has done more than anything else to damage the brand that Silver worked so hard for so many years to build.

My second big concern (which is somewhat more in line with Krugman) is with bungee-jumping analysts: experts (usually economists, often physicists, though Pielke shows that political scientists can also get into the act) who think that, because they have occasionally used some similar statistical methods, they are fully qualified in fields where they have no background or experience. Emily Oster's work with fetal alcohol syndrome and the notorious Freakonomics drunk driving analysis are apt examples.

Obviously, we can go back and forth about these criticisms, both on a general level (for example, is there such a thing as Freakonomics-style contrarianism and, if so, is it bad?) and a specific one (has 538 really been moving in the direction suggested by Smith, Krugman and me?). A good, vigorous discussion of these points would be tremendously helpful, but any productive counterargument has got to start by countering actual arguments.

* From the article linked to above:

But just as Pielke’s article has been written before, so too it has been criticized before. Dr. Kevin E. Trenberth, a distinguished senior climate scientist at the National Center for Atmospheric Research, has criticized Pielke’s data for its simplistic nature. Simply showing that an increase in damage has corresponded to an increase in wealth ignores the fact that communities are now more prepared than ever for extreme storms, Trenberth wrote at the time.

Note: Somehow my attempt to schedule this for a future date turned into a publish now command, so the first dozen or so people got to see a few extra typos.

Friday, April 18, 2014

Good post on Vox about issues with ordinal variables

This was a very good article tackling the issues of trying to assign an ordinal score to a multi-dimensional variable. Mark has been saying this for years, already, but it is good to see statements like this coming out of more mainstream groups:

The problem with ordinal rankings — and the more variables, the more problems here — is that it implies meaningful differences between one job and the next one that is one ranking below it. You can definitively say that one job pays more than another, but is it true that clinical social worker is better than nail technician is better than middle school teacher, as US News' rankings imply? And even if somehow that were empirically provable, what's the practical application of this knowledge? Should the middle school teacher go be a social worker?

Ranking lists can occasionally serve useful functions, but it is good to see more discussion of the limitations of these measures. Now who is brave enough to do this with post-secondary education?
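The problem of collapsing several dimensions into one ordering can be sketched in a few lines of Python. Everything here is hypothetical: the job names, the 0-10 scores and the two weighting schemes are invented for illustration and have nothing to do with the actual US News methodology.

```python
def composite_ranking(scores, weights):
    """Rank items by a weighted sum of their per-dimension scores (best first)."""
    totals = {
        name: sum(weights[dim] * value for dim, value in dims.items())
        for name, dims in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical 0-10 scores (higher is better on every dimension).
jobs = {
    "job A": {"pay": 8, "low_stress": 3, "openings": 5},
    "job B": {"pay": 5, "low_stress": 7, "openings": 6},
    "job C": {"pay": 4, "low_stress": 6, "openings": 9},
}

# Two equally defensible (and equally arbitrary) sets of weights.
pay_heavy = {"pay": 0.6, "low_stress": 0.2, "openings": 0.2}
openings_heavy = {"pay": 0.2, "low_stress": 0.2, "openings": 0.6}

print(composite_ranking(jobs, pay_heavy))
print(composite_ranking(jobs, openings_heavy))
```

With these made-up numbers the two weightings produce exactly opposite orderings, which is the sense in which a single ordinal list implies differences that the underlying data do not support.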

Thursday, April 17, 2014

Gauss, the fox who decided to be a hedgehog

As mentioned before, I'm not entirely comfortable with the fox/hedgehog spectrum -- this isn't a concept that reduces readily to a scalar -- but as long as we're here...

One of the minor revelations of the recent discussion of the new 538 was that Andrew Gelman had posted on the subject of foxes and hedgehogs way back in 2005:

This got me thinking about statisticians. I think we’re almost all foxes! The leading statisticians over the years all seem to have worked on lots of problems. Even when they have, hedgehog-like, developed systematic ideas over the years, these have been developed in a series of applications. It seems to be part of the modern ethos of statistics, that the expected path to discovery is through the dirt of applications.

I wonder if the profusion of foxes is related to statistics’s position, compared to, say, physics, as a less “mature” science. In physics and mathematics, important problems can be easy to formulate but (a) extremely difficult to solve and (b) difficult to understand the current research on the problem. It takes a hedgehog-like focus just to get close enough to the research frontier that you can consider trying to solve open problems. In contrast, in statistics, very little background is needed, not just to formulate open problems but also to acquire many of the tools needed to study them. I’m thinking here of problems such as how to include large numbers of interactions in a model. Much of the progress made by statisticians and computer scientists on this problem has been made in the context of particular applications.

Going through some great names of the past:

Laplace: possibly hedgehog-like in developing probability theory but I think of him as foxlike in working on various social-statistics applications such as surveys, that gave him the motivation needed to develop practical Bayesian methods.

Gauss: least-squares is a great achievement, but developed as a particular mathematical tool to solve some measurement error problems. In the context of his career, his statistical work is foxlike.

Galton: could be called a “hedgehog” for his obsession with regression, but I think of him as a fox with all his little examples.

Fisher: fox. Developed methods as needed. Developed theory as appropriate (or often inappropriate).

Pearson: the family of distributions smells like a hedgehog, but what’s left of it, including chi-squared tests, looks like fox tracks.

Neyman: perhaps wanted to be a hedgehog but ultimately a fox, in that he made contributions to different problems of estimation and testing. I’d say the same of Wald and the other mid-century theorists: they might have wanted to be hedgehogs but there was no “theory of relativity” out there for them to discover, so they seem like foxes to me.

I can't really argue with this as framed here. Gelman seems to be using a slightly different definition than the standard know-about-many-things/know-about-few-things, but the idea of foxes coming up with advances in response to different applications in different fields seems sound. Still, there is a certain irony in describing Gauss as one who was "interested in everything, and move[d] easily from one problem to another." In terms of ability, this is undeniably true, but in another sense you could argue that few have ever sacrificed more to specialize.

Perhaps the ultimate fox was Leibniz who, in addition to that whole calculus thing...

made major contributions to physics and technology, and anticipated notions that surfaced much later in philosophy, probability theory, biology, medicine, geology, psychology, linguistics, and computer science. He wrote works on philosophy, politics, law, ethics, theology, history, and philology. Leibniz's contributions to this vast array of subjects were scattered in various learned journals, in tens of thousands of letters, and in unpublished manuscripts. He wrote in several languages, but primarily in Latin, French, and German. There is no complete gathering of the writings of Leibniz.

Gauss very probably could have given him a run for his money because (and this is the amazing part) he was roughly as gifted with language as he was with math. He appears to have been conversant in well over a dozen. Gauss was almost twenty (and, among other things, already had least squares under his belt) before he finally decided he should leave philology as a hobby and focus on the sciences. I can't find a reference for this but I seem to recall that he explicitly said he didn't want to repeat Leibniz's mistake and allow himself to be distracted by tackling too many subjects.

The standard response here is to chuckle and say it looks like he made the right choice, but I'm not so sure. People who actually know what they're talking about might disagree, but I wonder how long it would have taken to fill the hole that would have been left if Gauss had diverted a few years of serious thinking from mathematics to linguistics. Not to say that any of his mathematical work was unimportant, but, with the notable exception of number theory where he still casts a long and very distinct shadow, wouldn't most of the things we call Gaussian have still arrived, albeit a few years later (and at the same time in those cases where researchers came up with the same ideas later but published them first)?

By comparison, think about what someone like Gauss might have done with five or ten years of serious linguistic research. The results could have disappointed but just think of the potential.

I'm way out of my depth here, so I'm just hoping to raise some points for discussion (and not say anything stupid in the process). There is, however, one point I would like to make. Hedgehog/fox conversations get complicated quickly and can shift radically when you change the way you frame a question. As a mathematician and a physicist, Gauss was certainly a fox, but when you look at the constraints he put on himself -- choosing not to do any work in an area where he had such incredible aptitude -- he certainly did his best to be a hedgehog.

Monday, April 14, 2014

Zombie Alert

Dean Starkman had an interesting piece in the New Republic on the financial crisis. At least, it held my interest until I came across one of those things that annoy the hell out of me.

This attitude also has a literary pedigree. Cultural theorizing about our inherent weakness goes back to the Bible (see Genesis 2:4-3:24, Adam / Eve), but it was Scottish journalist Charles Mackay who most famously dissected the specific phenomenon of contagious folly in his 1841 classic, Extraordinary Popular Delusions and the Madness of Crowds. Mackay’s work chronicled episodes of mass hysteria—witch hunts, the crusades, alchemy. But most famously, Mackay gave us the parable of the Dutch tulip mania of 1636 to 1637, when flower bulbs briefly became one of the world’s most expensive commodities. A wry and witty stylist, Mackay mixes anecdotes—like one about a sailor who mistook a priceless tulip bulb for an onion to go with his herring breakfast—with mordant observations about human nature. “In reading the history of nations, we find that, like individuals, they have their whims and their peculiarities; their seasons of excitement and recklessness, when they care not what they do,” Mackay’s preface to the 1852 edition begins. “We find that whole communities suddenly fix their minds upon one object, and go mad in its pursuit.”

Mackay has attracted plenty of support from academics over the decades, particularly scholars of social psychology, and in the years since the crash, his work has been much cited as a master theory of what went wrong. People are greedy. What can you do?

There’s just one problem: The accounts that undergird Mackay’s thesis might be wrong. As Andrew Odlyzko, a University of Minnesota mathematician who studies financial panics, puts it, Madness “enjoys extraordinarily high renown in the financial industry and among the press and the public. It also has an extraordinarily low reputation among historians.” Peter Garber, a Brown economist, found in a 1990 paper that the most intense speculation in the Dutch tulip market of the era involved only the rarest bulbs, which had been infected by a certain virus that produced particularly intricate patterns in the flower. After that, the market behaved pretty much the way the market for rare bulbs always behaves. Prices for newly cultivated bulbs were high, then fell over time. In fact, the average decline in prices for the rarest bulbs in the five years after the tulip market crashed was, at most, 32 percent. “Large, but hardly the stuff that legends are made of,” Garber writes.

There is at least one major problem with Starkman's "just one problem." The accounts that undergird Mackay’s thesis about market bubbles are only tangentially related to tulipmania. Extraordinary Popular Delusions does discuss a couple of market bubbles in some detail, but those are both land bubbles. The first concerned France's Mississippi Company; the second focused on the South Sea Company in Britain. After spending about seventy pages on these crises, Mackay closes with a brief and relatively light seven pages on the market for tulip bulbs.

Not only does tulipmania play a relatively trivial role in Mackay's discussion of markets; speculative bubbles play a secondary role in his short discussion of the tulip market. The main focus here is on collector psychology and the tendency to highly value the rare and fragile simply because it's rare and fragile. Here's the opening of the chapter:

The tulip,--so named, it is said, from a Turkish word, signifying a turban,--was introduced into western Europe about the middle of the sixteenth century. Conrad Gesner, who claims the merit of having brought it into repute,--little dreaming of the commotion it was shortly afterwards to make in the world,--says that he first saw it in the year 1559, in a garden at Augsburg, belonging to the learned Counsellor Herwart, a man very famous in his day for his collection of rare exotics.

The bulbs were sent to this gentleman by a friend at Constantinople, where the flower had long been a favourite. In the course of ten or eleven years after this period, tulips were much sought after by the wealthy, especially in Holland and Germany. Rich people at Amsterdam sent for the bulbs direct to Constantinople, and paid the most extravagant prices for them. The first roots planted in England were brought from Vienna in 1600. Until the year 1634 the tulip annually increased in reputation, until it was deemed a proof of bad taste in any man of fortune to be without a collection of them. Many learned men, including Pompeius de Angelis and the celebrated Lipsius of Leyden, the author of the treatise "De Constantia," were passionately fond of tulips. The rage for possessing them soon caught the middle classes of society, and merchants and shopkeepers, even of moderate means, began to vie with each other in the rarity of these flowers and the preposterous prices they paid for them. A trader at Harlaem was known to pay one-half of his fortune for a single root, not with the design of selling it again at a profit, but to keep in his own conservatory for the admiration of his acquaintance.

One would suppose that there must have been some great virtue in this flower to have made it so valuable in the eyes of so prudent a people as the Dutch; but it has neither the beauty nor the perfume of the rose--hardly the beauty of the "sweet, sweet-pea;" neither is it as enduring as either. Cowley, it is true, is loud in its praise. He says--

"The tulip next appeared, all over gay,
But wanton, full of pride, and full of play;
The world can't shew a dye but here has place;
Nay, by new mixtures, she can change her face;
Purple and gold are both beneath her care,
The richest needlework she loves to wear;
Her only study is to please the eye,
And to outshine the rest in finery."

This, though not very poetical, is the description of a poet. Beckmann, in his _History of Inventions_, paints it with more fidelity, and in prose more pleasing than Cowley's poetry. He says, "There are few plants which acquire, through accident, weakness, or disease, so many variegations as the tulip. When uncultivated, and in its natural state, it is almost of one colour, has large leaves, and an extraordinarily long stem. When it has been weakened by cultivation, it becomes more agreeable in the eyes of the florist. The petals are then paler, smaller, and more diversified in hue; and the leaves acquire a softer green colour. Thus this masterpiece of culture, the more beautiful it turns, grows so much the weaker, so that, with the greatest skill and most careful attention, it can scarcely be transplanted, or even kept alive."

Many persons grow insensibly attached to that which gives them a great deal of trouble, as a mother often loves her sick and ever-ailing child better than her more healthy offspring. Upon the same principle we must account for the unmerited encomia lavished upon these fragile blossoms.

This is followed by a discussion of the extreme run-up of prices usually referred to as tulipmania, in which speculation plays a prominent but perhaps not central part (it's worth noting that even Garber says that there was at least a brief period of intense speculation that extended to common bulb prices), but Mackay keeps coming back to the desire to possess these flowers rather than to resell them for a profit. To drive home the point, Mackay closes with a paragraph on the continued high prices bulbs can fetch. "In England, in our day, strange as it may appear, a tulip will produce more money than an oak. If one could be found, _rara in terris_, and black as the black swan of Juvenal, its price would equal that of a dozen acres of standing corn."

Starkman seems to feel that Mackay's moralizing about bubbles is a form of blaming the victim. Perhaps he's right, but in order to argue the point he'd have to talk about Mackay's accounts of the Mississippi Scheme or the South Sea Bubble. Instead, we get the almost obligatory tulip reference. It was bad enough when writers were overusing this trivial and not particularly relevant case as an example of a speculative bubble; it's even worse when they use it to deny bubbles' existence.

Tulipmania is another one of those rhetorical zombies we need to kill off for good.

P.S. Though not directly related to the main point of the post, it's worth noting that, while Starkman seems to be accusing Mackay of seeing bubbles that aren't there, one of the two scholars he quotes, Andrew Odlyzko, was actually accusing Mackay of choosing not to see bubbles that were there.

Charles Mackay's book "Extraordinary Popular Delusions and the Madness of Crowds" enjoys extraordinarily high renown in the financial industry and among the press and the public. It also has an extraordinarily low reputation among historians.

This paper argues that Mackay's sins of commission were dwarfed by his sins of omission. He lived through several giant investment manias in Britain, yet he did not discuss them in his books. An investigation of Mackay's newspaper writings shows that he was one of the most ardent cheerleaders for the Railway Mania, the greatest and most destructive of these episodes of extreme investor exuberance.

Mackay's story provides another example of a renowned expert on bubbles who decides that "this time is different." His moves through a sequence of delusions help explain the length and damage of the Railway Mania. He was a free market and technology enthusiast, and faced many issues that are important today, such as government ownership or regulation, interconnection, standardization, structural separation, and analogs to net neutrality. A crushing national debt and high unemployment in an economy pulling out of a deep depression (and in perceived danger of falling into another one) were very important in shaping attitudes towards railway expansion. The analogies and contrasts between Mackay's time and ours are instructive.

Saturday, April 12, 2014

Weekend blogging -- getting VORPal

Ken Levine's blog is one of the go-to references for those interested in the business, history and art of television. As you can see from this bio, he's ludicrously overqualified to write on the subject.

Named one of the BEST 25 BLOGS OF 2011 by TIME Magazine. Ken Levine is an Emmy winning writer/director/producer/major league baseball announcer. In a career that has spanned over 30 years Ken has worked on MASH, CHEERS, FRASIER, THE SIMPSONS, WINGS, EVERYBODY LOVES RAYMOND, BECKER, DHARMA & GREG, and has co-created his own series including ALMOST PERFECT starring Nancy Travis. He and his partner wrote the feature VOLUNTEERS. Ken has also been the radio/TV play-by-play voice of the Baltimore Orioles, Seattle Mariners, San Diego Padres. and has hosted Dodger Talk on the Dodger Radio Network.

(And by 'worked on,' he usually means 'played a pivotal role in the making of.') That last part of the resume means that Levine also has strong and generally well-thought-out opinions on baseball, particularly when it comes to what it takes to make games into good broadcasting.

Statistics have always been a big part of baseball. And a major crutch for announcers who have no imagination and nothing else to fill time with. Now with Sabermetrics and more detailed categories like VORP, DRS, FIP, EQA, WHIP and WAR number crunching has been taken to a whole new level. Not that these new stats aren’t informative and useful, but there is an avalanche of them. Certainly way more than the average baseball fan can process or wants to process.

And now the Houston Astros have mandated that these analytics be a prerequisite to their broadcasts. I feel especially sorry for their longtime TV announcer, Bill Brown. He’s a terrific play-by-play man. But now saddled with this emphasis on modern-day stats and a bad team, this was the rating for the Astros’ telecast last Monday against the Los Angeles Angels: 0.0. Let me repeat that number. 0.0. And this isn’t the end of the season when the team is mathematically eliminated. It’s their first homestand. How is that even possible? (And it wasn't the first time.)

Yeah, WHIP and WAR really save the day.
...
Statistics are fine in key game situations. Especially if the games have import. Playoff games, for example. Ninth innings. Pennant races. They can enhance a big moment. But breaking down a batter’s average against a certain pitcher when he’s had only six at bats against him and it’s the second inning of a game in mid April – who gives a shit?

Why cater your broadcast to the diehard fans? A) There are not that many of them. B) They’ll listen no matter what you do. C) You chase away casual fans. Women (50.8% of the American population), in particular, tend not to care about Wins Above Replacements.

Thursday, April 10, 2014

538 and Vox

Kaiser Fung made a comment in this thread:

LIke Andrew, I also have been thinking about this, and I come out on the side of Nate. Individually, the critique stands but taken together, they don't call for any coherent vision of how his critics would run an operation such as his. The level of rigor that Krugman and others demand requires years, perhaps decades, of research to write one piece; meanwhile, the other critique is the content is not timely. Think about the full-time journalists he has hired - there isn't a way to pay them enough to do the kind of pieces that are being imagined. As we all know, data collection, cleaning and analysis take a huge amount of time. It may be months of work to get one article out. Further, I'd like to judge them relative to competitors rather than in some kind of abstract universe. Compared to the Freakonomics blog, for example, 538 has a much better orientation. Compare to Huffington Post - when did HP have any real data journalism? Compare to Buzzfeed, don't even want to talk about it.

Now, this is Joseph and not Mark. My view was that you simply cannot judge a publication until there has been six months or so to let things settle. I suspect a lot of the criticism was driven by the climate change article -- and it is interesting to see that this is where people's passions are the highest.

Other columnists, like Emily Oster, are much more subtle cases. I was very dismissive of Emily after her first foray into public health. Her second has seen a lot of criticism as well, but what is different is that the current round is based on careful weighing of evidence and very subtle issues of interpretation (and this was only for a single, small piece of a much larger work). She is getting a lot better.

And that is part of why I am optimistic about Nate Silver. He is doing something really hard and it remains to be seen if the criticism slowly improves matters.

In a lot of ways, the other new information-based news source (Vox) has the exact opposite problem. They spent a huge amount of time trying to make some really good pieces (like this one) and grab some of the people I used to read elsewhere (even obscure ones like this). But it will be interesting to see if they can keep up the kick-off level of quality over time.

So I guess the really good news is that we are spoiled for choice with new, information-rich media start-ups. It's hard to see how this is a bad thing.

Wednesday, April 9, 2014

The Hedgehog who thought he was a fox -- a cautionary tale

The growing chorus of Nate Silver fans critical of (or at least perplexed by) the new Five Thirty Eight has caught a great deal of media coverage, mainly for the wrong reasons. Conservatives have painted it as a case of liberals turning on one of their own. Pundits have tried to use the recent critiques of Silver to undercut his earlier, completely unrelated critiques of them (I'm debating whether or not to write a post on Dylan Byers' laughable misreading of Krugman's position. On one hand, it's bad enough to support a post. On the other hand, I'm busy, Charles Pierce already did a good job with it and I'm pretty sure that most people already know what Byers is).

There has been some good work on the subject. Jonathan Chait does a sharp analysis of Krugman's and Silver's personalities and how they shaped the conflict (best line: "Somewhere, David Brooks is reading Silver’s argument that Paul Krugman refuses to attack his colleagues and laughing bitterly."), but other (for me) more interesting issues have gotten less coverage than they merit, things like the culture of statistics, the often depressing career paths promising thinkers take these days* and the dangers of a bad analogy.

It sometimes seems that there's a convention that once a debate has been framed, that framework must be respected, no matter how badly it holds up. Case in point, the fox and the hedgehog. Here's how Silver puts it:

Our logo depicts a fox (we call him Fox No. 92) as an allusion to a phrase originally attributed to the Greek poet Archilochus: “The fox knows many things, but the hedgehog knows one big thing.” We take a pluralistic approach and we hope to contribute to your understanding of the news in a variety of ways.

This is a doubly flawed analogy. Expertise is a spiky, complicated thing that doesn't lend itself to scalar measures, let alone binary ones. Any attempt to assign people positions on the fox/hedgehog spectrum will be problematic at best, with the ordering shifting radically when weighting schemes change. If we do, however, decide to view the world through this framework, we immediately come to an even bigger objection to Silver's argument:

Nate Silver is a hedgehog.

There is nothing pejorative about this classification. Silver has done brilliant work. It's just that almost all of Silver's best work has been done using a small but powerful set of analytic tools to address thorny but structurally similar problems in sports and politics. In terms of methods, Silver is a specialist; in terms of data, he's a micro-specialist. Silver has an enormous body of knowledge about working with player stats or polling data, but most of that knowledge is completely field specific.

There's nothing wrong with this kind of specialization -- it's absolutely necessary for the kind of results Silver produced -- but it can cause problems when researchers move out of their areas of expertise and fail to adjust for the change. In other words, the trouble starts when hedgehogs think they're foxes.

Being a fox means living with the constant fear that you've just done something stupid that will be immediately obvious to anyone knowledgeable in the field. Ideally that fear leads to a heightened feel for danger levels. Most experienced foxes have developed an instinct for when to seek out a hedgehog. As a corollary, a good fox is always (and I do mean ALWAYS) more willing to ask a stupid question than to make a stupid statement.

For a case study of what can go wrong when experts leave their area of expertise and don't adjust their caution levels, you don't have to look any farther than Silver's attempt to cover the climate change debate. Michael E. Mann assesses the damage:

And so I was rather crestfallen earlier this summer when I finally got a peek at a review copy of The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't. It's not that Nate revealed himself to be a climate change denier; He accepts that human-caused climate change is real, and that it represents a challenge and potential threat. But he falls victim to a fallacy that has become all too common among those who view the issue through the prism of economics rather than science. Nate conflates problems of prediction in the realm of human behavior -- where there are no fundamental governing 'laws' and any "predictions" are potentially laden with subjective and untestable assumptions -- with problems such as climate change, which are governed by laws of physics, like the greenhouse effect, that are true whether or not you choose to believe them.

...

Unlike Levitt, Nate did talk to the scientists (I know. I'm one of them!). But he didn't listen quite as carefully as he should have. When it came to areas like climate change well outside his own expertise, he to some extent fell into the same "one trick pony" trap that was the downfall of Levitt (and arguably others like Malcolm Gladwell in The Tipping Point). That is, he repeatedly invokes the alluring, but fundamentally unsound, principle that simple ideas about forecasting and prediction from one field, like economics, can readily be appropriated and applied to completely different fields, without a solid grounding in the principles, assumptions, and methods of those fields. It just doesn't work that way (though Nate, to his credit, does at least allude to that in his discussion of Armstrong's evaluation of climate forecasts).

I'm singling out Silver here not because he's a bad statistician but because he's a very good one who fell into the increasingly common traps of believing that the world outside of his specialty is simpler and that, if you understand the math, you automatically understand the problem. Each field is complex and, like Tolstoy's families, complex in its own way. If you want to have something useful to say in an unfamiliar area of research, knowing the statistics may be necessary but it is far from sufficient.

* On a related note you can find my thoughts on Five Thirty Eight's business model here.

Tuesday, April 8, 2014

The moment it became obvious that Nate Silver wasn't really listening to his critics

Paul Krugman closes a post criticizing the new 538 with the following:

What would be really bad is if this turns into a Freakonomics-type exercise, all contrarianism without any appreciation for the importance of actual expertise. And Michael Mann reminds me that Nate’s book already had some disturbing tendencies in that direction.

In response...

Silver, for his part, said he doesn't shun the negative reviews that FiveThirtyEight has drawn in its infancy, telling TPM that much of the criticism will help the site improve. He just doesn't think Krugman's assessment has been on the mark.

"His comment about experts was particularly strange given that (i) we publish lots of articles by experts, e.g. academic economists like Emily Oster and political scientists like Dan Hopkins and that (ii) Krugman has himself been very critical of Very Serious People and experts in economics and other fields," Silver wrote in the email.

For the record, if someone accuses you of having analysts bungee jump into areas they know nothing about then crank out a bunch of contrarian findings, the fact that you just hired Emily Oster should not feature prominently in your defense.

Monday, April 7, 2014

Imagine an all-curling channel...

Picture yourself a network executive in charge of product development. A producer approaches you with a new US cable channel based on the sport of curling. The producer supports his presentation with various graphs showing that:

Current awareness of the sport is high given its size and has trended steadily up in the period measured;

The sport is currently receiving considerable free publicity, particularly in the references and clips on late night talk shows and other sought-after spots;

Those most likely to be aware of the sport tend to be young with attractive demographics;

Sports is more resilient to competition from the internet;

The programming is incredibly cheap. Many of the leading figures in the sport have literally offered to work for beer.

As an executive, you might be suspicious of these facts (which, after all, I did just make up), but I'll bet you have another, much stronger objection, namely that Americans are aware of curling for about five weeks every four years. You can't base a network on these kinds of few-and-far-between spikes in viewership. In order to make this concept workable, it will need some attention-grabbing non-seasonal programming, perhaps Extreme Curling or Celebrity Curling.

This takes us to the other major quadrennial media event, the presidential elections, and to Nate Silver. If you're talking about horse-race political analysis, there is no bigger star than Silver and no one who deserves his or her fame more. If you make a list of people who really understand the science of polls and elections and another list of journalists with great media savvy and extremely high profiles, you'll get a lot of names on both lists, but if you look at the intersection, you're basically down to one name.

All of this made Silver a big journalistic star every election season. When he was part of the NYT, this worked out great. Every four years he brought in a huge amount of traffic (and presumably digital subscriptions) while the rest of the time he gave the paper analytic credibility. It was a win for Silver, a win for the paper and a win for the readers.

That trickle... trickle... trickle... FLOOD model won't work for the new 538. Despite the relationship with ESPN and ABC, Silver is now pursuing more of a freestanding model like Freakonomics or even the Huffington Post. Like our hypothetical curling channel, he also needs attention-grabbing non-seasonal programming, in this case, counter-intuitive stories by controversial writers who are good at bringing in traffic in part because people like to pick away at their errors.

There are at least a couple of problems with this approach: first, this is a horribly crowded field and the chances of success are not high; second, there's a significant reputational risk in being associated with these controversial writers and, given the extraordinary reputation Silver has worked so hard to build up, this is a risk he may come to regret taking.

Friday, April 4, 2014

"N.R.G. Pick-Ups are PURE DEXTROSE" -- junk food health claims through the years

I previously posted some health advice from a 1950 comic book which Joseph pointed out was actually pretty good. By comparison, the nutritional information that appeared in this issue of the very early (1937) comic Star Ranger is considerably more questionable. Part of that might be attributable to being more than a decade older, but I suspect the more significant difference is that the later comic was presenting the advice as something of a PSA while the Curtiss Candy Co might have had another agenda.

Thursday, April 3, 2014

Degrees of separation – class and capital

This is still in rough form (though I've been kicking it around for a while), but I thought it might be interesting to think about inequality/social mobility/access to capital in terms of networks, specifically degrees of separation and Milgram's small world experiment.

New media has expanded our social networks, but it has also created the illusion of even larger ones (often Facebook friends and LinkedIn connections would be considered complete strangers by any reasonable standard). In order to keep our networks roughly analogous to Milgram's, a connection is defined as someone who knows you by name and with whom you have had multiple one-on-one private exchanges either face-to-face or through some other medium.

You have zero degrees of separation from yourself. (This point will be important when discussing capital. In other words, you have no separation from your own money.) You have one degree of separation from someone with whom you have had repeated one-on-one contact. I'd also suggest excluding employer/employee connections, at least when talking about class and capital. These relationships tend to be highly constrained and should, at the very least, be analyzed separately.

With this groundwork laid, I'd like to propose the following, at least as a thought experiment. The original Milgram study looked at the degrees of separation between people who lived in Omaha and people who lived in Boston. What if, instead of geographic distance (which arguably means less than it once did), we looked at economic distance (which arguably now means more)?

As before, randomly selected subjects would be asked to connect with strangers and the path length would be measured. Unlike the Milgram study, though, the corresponding pairs of subjects would live in the same geographic area. In this experiment, subjects would be assigned targets so that some are trying to contact subjects in their own income bracket, some are trying to contact subjects in brackets lower than theirs and some are trying to contact people in brackets higher.

Obviously, I don't know if the data will back me up on any of this but here are a few speculations and possible implications:

Though we can't go back in time to gather the data to confirm this, there is both statistical and anecdotal evidence that the correlation between economic distance and degrees of separation is getting stronger;

There seems to be a high inverse correlation between degrees of separation from capital and probability of getting a business funded. This relationship appears to be particularly strong for really bad business plans. I've noticed that when I do a little research into one of those what-were-they-thinking ideas, I always find at least one founder with a low degree of separation from someone with a large amount of capital;

One implication of the above would be that ventures (even bad ones) from people who attended Ivy League schools are far more likely to find funding. I realize this will strike most as a blinding flash of the obvious, but hopefully bringing graph theory tools in will uncover something interesting;

Increasing degrees of separation might also help explain the apparent rise of let-them-eat-cake journalism. We previously discussed a number of major stories such as the SAT and over-the-air television where the standard narrative is written, not just from an upper class perspective, but seemingly under the impression that no other perspective exists. Perhaps journalists who write for major publications are less likely to know people in other economic classes.

A big caveat here: path length is a useful but very limited metric for discussing graphs. I think it would be useful to look at degrees of separation but I suspect the main thing it would accomplish would be to raise more questions.
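For readers who want to play with the idea, here is a minimal sketch of how degrees of separation could be measured and averaged by income-bracket pair. Everything in it is a made-up assumption for illustration: the names, the tiny network, the bracket labels and the helper functions are all hypothetical, and a real study would need far richer data. Connections are assumed to be mutual, and a person is zero steps from themselves, as described above.

```python
from collections import deque

def degrees_of_separation(graph, start, target):
    """Shortest path length between two people, or None if unconnected."""
    if start == target:
        return 0  # zero degrees of separation from yourself
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for contact in graph.get(person, ()):
            if contact == target:
                return dist + 1
            if contact not in seen:
                seen.add(contact)
                queue.append((contact, dist + 1))
    return None

def mean_separation_by_bracket_pair(graph, bracket):
    """Average path length for each unordered pair of income brackets."""
    lengths = {}
    people = sorted(graph)
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            d = degrees_of_separation(graph, a, b)
            if d is None:
                continue  # skip pairs with no path between them
            key = tuple(sorted((bracket[a], bracket[b])))
            lengths.setdefault(key, []).append(d)
    return {key: sum(v) / len(v) for key, v in lengths.items()}

# Hypothetical toy network (mutual connections) and bracket labels.
connections = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob", "dan"},
    "dan": {"cat", "eve"},
    "eve": {"dan"},
    "fay": set(),  # knows no one in this toy network
}
bracket = {"ann": "low", "bob": "low", "cat": "middle",
           "dan": "high", "eve": "high", "fay": "low"}

averages = mean_separation_by_bracket_pair(connections, bracket)
for pair, avg in sorted(averages.items()):
    print(pair, avg)
```

In this toy network the low and high brackets end up farther apart on average than either is from the middle, but that is built into the made-up data. The sketch only shows how the measurement would work, not what it would find.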

Wednesday, April 2, 2014

Causal inference is hard

From Slate we have this interesting debate about what ended China's famines:

Scholars continue to argue over how much of China’s agricultural turnaround was due to the capitalist incentive structure, how much resulted from earlier investments, and how much was a trick of the weather. Some say the end of collective farming accounted for nearly three-quarters of the improvements in productivity, while others say it was responsible for no more than one-third.

It’s fine to treat China’s food revolution as a fairy tale. The changes were so dramatic that it’s hard not to. But let’s make sure we get the moral of this story correct. Changing the incentives isn’t a magic trick that can turn any lagging economy into a global juggernaut. Investment in infrastructure, research and development, and putting money into the pockets of workers work wonders as well. And a little sunshine doesn’t hurt, either.

So we basically have five possible explanations, all of which could explain some or all of this change:

Ending collective farming (capitalist reform)
Infrastructure development
Government subsidies to farmers (i.e. financial support to poor people)
Research on improved crops
Unexpected good weather

What makes this tough is that many of these explanations suggest different policy conclusions when you try and apply these lessons to other contexts. For example, if the dominant cause was improved infrastructure then maybe we should tax more in order to invest in infrastructure projects. If it was giving more money to poor people then maybe the minimum wage is where we should put our focus. If it was the weather (luck) then maybe these results can't be generalized.

Since complex phenomena, like improved food supply, likely have many causes, it can be hard to decide which ones to focus on. After all, some of these factors could have been counter-productive, but the net causal effect could be positive.

But it seems pretty obvious why experiments are not sensible here. These sorts of questions are, and I think always will be, very hard to answer.
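To illustrate one way that estimates of a single factor's share can diverge (not necessarily the reason the scholars quoted above disagree), here is a small sketch. It assumes, purely for illustration, that the factors multiply rather than add, and every number and name in it is hypothetical. Under that assumption, the share of the total gain credited to reform depends on whether you count it first or last.

```python
# All figures are hypothetical; nothing here is estimated from real data.
BASELINE = 100.0
EFFECTS = {"reform": 0.30, "infrastructure": 0.20, "weather": 0.10}

def output(active):
    """Output when the given factors are switched on (assumed multiplicative)."""
    total = BASELINE
    for name in active:
        total *= 1.0 + EFFECTS[name]
    return total

def share_credited(factor, added_before):
    """Share of the total gain credited to `factor` when it is switched on
    after the factors listed in `added_before`."""
    before = output(added_before)
    after = output(list(added_before) + [factor])
    total_gain = output(EFFECTS) - BASELINE
    return (after - before) / total_gain

share_if_first = share_credited("reform", [])
share_if_last = share_credited("reform", ["infrastructure", "weather"])

print(f"reform counted first: {share_if_first:.0%} of the gain")
print(f"reform counted last:  {share_if_last:.0%} of the gain")
```

The same made-up world gives two different answers to "how much was due to reform," and neither is obviously the right one.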