Comments, observations and thoughts from two bloggers on applied statistics, higher education and epidemiology. Joseph is an associate professor. Mark is a professional statistician and former math teacher.
As reported on This American Life (get a download here. It's well worth the money), back in 2002, recently retired multi-millionaire Steve Poizner volunteered to teach a class in east San Jose. A few years later he wrote about that year in the book Mount Pleasant: My Journey from Creating a Billion-Dollar Company to Teaching at a Struggling Public High School. It's an interesting account (though not in any of the ways Poizner intended).
One detail struck me as particularly funny as I listened to the story while driving through LA. Here's the excerpt as Poizner read it on the show:
Several miles and a couple of highways later I took the Capital Expressway exit and drove into what felt like another planet. Signs advertising janitorial supply stores and taquerías. Exhaust hung over 10 lanes of inner city traffic; yellowing, weedy gardens fronted many of the homes, as did driveways marred by large oil spots or broken down cars.
Taquerias are generally small restaurants that serve mainly tacos and burritos. After hamburger joints, they are probably the most common restaurants in California. The thought that a California resident who wasn't looking for a place to eat would even notice them is odd. Listing them as an exotic sign of urban decay is downright bizarre.
Andrew Gelman has a great post on how statisticians do not always apply the lessons of statistics to their everyday lives. And it is true -- a lot of what we do in our lives is from custom and tradition that has never undergone rigorous evaluation. I'd be curious to see how teaching would change if we tested pre- and post-class knowledge (and had a control group who did not take the class). One suspects that the results might be humbling . . .
This reminds me of Miguel Hernan arguing in the journal Epidemiology (sadly, gated) that epidemiologists should be skeptical of journal impact factors because of our training. Unfortunately, despite this skepticism, it is rare for an active researcher not to have a good idea of the ranking of various journals (often based off of the impact factor). I think that statisticians are better at disregarding impact factors but that's not a universal rule.
But I wonder if we could improve a lot of everyday tasks with a slightly tighter focus on statistical and/or epidemiological reasoning?
This is either the best piece of California political reporting I've come across recently or the best piece of education reporting. Either way it's worth the ninety-nine cents.
I've been enjoying this online discussion (particularly the contributions of Felix Salmon) and I'm sure Joseph and I will have more posts on the subject but first I have something I have to get off my chest:
No one can possibly know what's going on here! We can get some smart people making good guesses about long term stock performance, but these guesses are based on data from a century's worth of secular upheavals. A list that includes the Great Depression, two world wars, a post-war reconstruction, the cold war, China becoming a major player, boomers entering the market, boomers leaving the market and huge changes in regulation, technology and business practices.
What's happening now never happened before. What happened before never happened before. What happened before what happened before never happened before. We have no precedents. People are recommending forty-year investment strategies using models based on data from markets that haven't gone twenty years without a major secular change. There. I've got it out of my system. Go on with what you were doing.
We've been speaking about investment and stock market returns recently as an example of how forecasting is difficult. But I would be remiss not to mention Vanguard and its very articulate founder. I've just finished reading his latest book (Enough) and it was well worth a couple of hours of time. A very thoughtful reflection on character and ethics.
I'm also hoping to emulate him in another way -- he is 91 years old and still active in what he loves. Here is hoping I'm still doing Epidemiology at the same age!
Candid Engineer points out something that I think all of us find as a challenge -- actually getting work out in the form of papers. I like to solve puzzles and understand the world around me. It's what drives me as a scientist.
On the other hand, it's a minority of results that are fun to write up and the publication process is always painful. So I, too, have to watch the tendency to learn the results and then move on. This is especially true if you take on a high risk project and the results are a bit disappointing.
The analysis for my most recently submitted paper was done 2 years ago and it took a lot of effort to clean it up until it could be submitted. Not a fun process.
But sticking with it is crucial. It's the only way to really let others know what you have done and really push science forward!
That sounds sarcastic but it's not. We really are blessed with an amazing crop of writers who can discuss business and economics with insight, encyclopedic knowledge and surprisingly good prose.
Here a couple of the best get us up to speed on financial reform: Preventative Measures
Mark points out, in a comment on my post on financial literacy, the issue of the equity premium as being discussed by Felix Salmon. I think that this point is really interesting and deserves more attention than a brief comment.
This article from the Economist is another great discussion of the complexities of making this calculation. The author lays out the basic problem with calculating an equity premium (or lack thereof) using current data (i.e. recent rates of return). One does not want to price in the current economic crisis (unless one thinks that these crises will be more common in the future) but neither does one want to calculate an equity premium that ignores either key events or actual real rates of return.
When you add in the issue of limited data (we only have a couple of hundred years of annual stock returns for the United States and only about 60-80 years are really relevant to the current market) and the risk of a secular shift (what if one or more financial innovations has fundamentally changed the nature of the market) then these complications are almost enough to make the problem intractable. One might argue that these factors need to balance out in the long run (if equities don't pay a premium for risk then people will stop holding equities). But it is remarkable how long the long run can be; I think John Maynard Keynes said it best with:
The long run is a misleading guide to current affairs. In the long run we are all dead. Economists set themselves too easy, too useless a task if in tempestuous seasons they can only tell us that when the storm is past the ocean is flat again.
I think that this problem is true for epidemiological forecasts as well. The forecast of influenza rates (as a recent example) depends critically on the assumption that the current strains of influenza are not fundamentally different from past strains. Often this assumption is reasonable but it can miss the most important changes (like the arrival of a new and more lethal version of the virus).
Whether it is disease rates or stock markets, it is not a simple matter to use the past as a guide to the future. There is no doubt that forecasting is hard but it's also true that it is important to do it as well as possible. If I ever figure out the trick I will be sure to share it!
Talking Points Memo has been doing a superb job covering the races in Kentucky and Pennsylvania. Josh Marshall's posts on Rand Paul have been particularly insightful. You really ought to take a look.
That has nothing to do with the main point of this post but I feel guilty hammering away at TPM and I'm afraid it's time to pick up the old nine-pounder again and start swinging at this:
Whitman's support has fallen in large part due to Democratic attacks over her connections to Goldman Sachs -- the Dems would prefer to face Poizner in the fall. There may also have been a backlash against her big personal spending on the race, which has reached $68 million so far, and a tightly controlled media operation in which she has avoided directly answering questions from reporters about the issues -- a fact that is frequently noted in media reports.
As statisticians, we are constantly being asked why something happened. We don't like the question (the subject of causality makes us nervous) but it's not something we can avoid so we approach it as rationally as we can. We look at correlations, of course, but we also look at timing, magnitudes, precedents. We consider the implications of the different hypotheses (for example, if bad weather caused a shift in behavior, that shift should be limited to certain regions). We seek out the opinions of informed sources while taking pains not to get sucked into conventional wisdom. We survey the situation on the ground and use that most important of statistical tools, common sense.
I'd like to say that the final step is testing the hypothesis, but that's not usually the way it works. When it comes to questions of causality, the final answer is usually just an educated guess. Fortunately, most of us have gotten to be pretty good guessers.
How does the Goldman Sachs hypothesis stand up to this (admittedly unscientific) approach?
For starters, the timing is all wrong.
(I'm assuming these are mostly likely-voter polls)
The Goldman Sachs-Whitman connection came out in February. The Abacus scandal hit big in mid-April. Poizner's Vulture ad was released at the end of April. Which of these would you associate with an inflection point in Whitman's support?
By comparison, the news for much of April had been dominated by Arizona's immigration law (which passed their house on April 13th) and since March, Poizner had been running ads attacking Whitman as being soft on immigration.
How about magnitude and precedent? Whitman took a fifty point lead down into the single digits. Immigration and RINOism are huge issues for the California GOP. Either could easily kill a campaign. The Goldman Sachs story, on the other hand, is abstract, relatively minor, and only tangentially related to the issues California conservatives care about.
And has anyone EVER burned off forty-plus points of lead because of this kind of business deal?
And finally, do any of the major players, the ones with access to internal polling, actually believe this hypothesis? Poizner clearly doesn't or he'd be making more than passing references to Goldman Sachs. Meg clearly doesn't or she wouldn't be spending her time insisting that she supports neither amnesty nor Senator Boxer. Hell, even the CDP doesn't or they wouldn't pick Peter Coyote to pitch their case. (I'm not saying that the CDP ads are ineffective; I'm saying they are targeted at the general election.)
To make this even less scientific, here's my good ol' boy take on what happened. It's the old story of an out-of-towner walking into a bar, hearing a couple of locals bragging and believing every word. ("You mean you really took down a $70 million dollar campaign?" "Yep, and we did it with just one little website.")
I suppose there's no harm in a little boasting (and it's not something that Democratic operatives get to do that often in California). I just hope that the people at TPM are a bit more worldly the next time the situation comes around.
And to close out the subject of nine pound hammers (and get our minds on something more pleasant than politics), here's a mental health break from a friend of mine:
Update: Ed Kilgore has a good analysis of the race here. I think he gives too much weight to the Goldman Sachs story but I may still just be stuck in argument mode as a reaction to TPM.
Olivia Judson has a new piece on archaea, those strange life forms that can adapt to the harshest conditions on earth. I've read a number of other articles on the subject but most of them had an empty calorie quality, heavy on the gee-whiz and light on the substance. But Judson focuses on the good stuff. How do archaea differ from other life? What do we know or suspect about their evolution? Why have they been so difficult to study?
This is not an isolated case. I have often looked at the topic of one of Judson's pieces and thought "Not that again," but I always felt I owed her an apology by the time I got to the end.
I know that spousal hiring is a contentious topic and, like many complicated transactions, it is hard to carefully audit all elements of it. I am sure that situations happen that are not ideal. That being said, I do love it when I see a heart-warming story of it all working out and everyone (University included) being better off.
Today, readers of Talking Points Memo (at least those in California) saw the following ad:
The election is (as previously mentioned here) a compound game requiring two consecutive wins. There are certain Pyrrhic strategies which maximize the chances of a win in the first part and minimize chances of a win in the second. (At the risk of putting too fine a point on it, immigration is the obvious Pyrrhic strategy in this game.)
Let's say W and P are players from the same party with an equal chance of winning the first part of the game. If they can agree not to go Pyrrhic, the winner of the first part has a good chance of winning the second. If one player goes Pyrrhic, then that player has a much better chance of winning the first part and a somewhat worse chance of winning the second, overall still a net improvement. If both go Pyrrhic, neither gains an advantage in the first part and both take a hit in the second.
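To make the structure of that game concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (the 0.8 chance of winning the first part when only one player goes Pyrrhic, the 0.5 and 0.4 chances of winning the second part), and the names `P_FIRST`, `P_SECOND`, `overall_win_probs` and `best_response` are my own hypothetical labels, not anything from a real model of the race. The only thing the sketch is meant to show is that, under assumptions of this shape, going Pyrrhic is the better reply whatever the other player does, even though both players end up worse off than if neither had done it.

```python
# All probabilities below are hypothetical, chosen only to match the
# qualitative story: equal players, a big first-part boost for a lone
# Pyrrhic player, and a somewhat lower second-part chance for anyone
# who went Pyrrhic.

STRATEGIES = ("clean", "pyrrhic")

# Probability that W wins the first part, keyed by (W's strategy, P's strategy).
P_FIRST = {
    ("clean", "clean"): 0.5,
    ("pyrrhic", "clean"): 0.8,
    ("clean", "pyrrhic"): 0.2,
    ("pyrrhic", "pyrrhic"): 0.5,
}

# Probability that the first-part winner also wins the second part,
# keyed by the strategy that winner used.
P_SECOND = {"clean": 0.5, "pyrrhic": 0.4}


def overall_win_probs(w_strategy, p_strategy):
    """Return (W's, P's) probability of winning both parts of the game."""
    w_first = P_FIRST[(w_strategy, p_strategy)]
    w_total = w_first * P_SECOND[w_strategy]
    p_total = (1 - w_first) * P_SECOND[p_strategy]
    return w_total, p_total


def best_response(p_strategy):
    """Return the strategy that maximizes W's overall chance against P's choice."""
    return max(STRATEGIES, key=lambda s: overall_win_probs(s, p_strategy)[0])


if __name__ == "__main__":
    for w in STRATEGIES:
        for p in STRATEGIES:
            w_total, p_total = overall_win_probs(w, p)
            print(f"W={w:8s} P={p:8s} -> W wins {w_total:.2f}, P wins {p_total:.2f}")
    for p in STRATEGIES:
        print(f"W's best reply when P plays {p}: {best_response(p)}")
```

With these particular made-up numbers, going Pyrrhic is W's best reply to either choice by P (and, by symmetry, the same holds for P), yet the Pyrrhic/Pyrrhic outcome leaves each player with a lower overall chance than the clean/clean outcome. Different numbers could easily break that pattern; the sketch only shows that the story as told is internally consistent.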
Of course, in real life, we have all sorts of mechanisms like reputation and institutions such as party leaderships to help us avoid the prisoner's dilemma. That's not to say that it can't happen but we are able to discourage it.
Now consider this scenario. W is way ahead of P but is more vulnerable to the Pyrrhic strategy. Here P is almost certain to go Pyrrhic even though it will hurt his chances of winning the second part.
This suggests an interesting counter-intuitive strategy for W: open up a lead slowly and don't get too far ahead. (In other words, the opposite of the aforementioned blitzkrieg strategy.) There are external costs to going Pyrrhic: loss of reputation, long term damage to the party, pressure and disapproval from peers. W could hope to find that sweet spot where those costs slightly outweigh the benefits to P of a Pyrrhic/Pyrrhic contest.
I don't know if this would have worked for W (counter-intuitive strategies usually don't), but I have to think it would be better than what she came up with.
(Note to the picky: I've used an exceptionally watered-down form of the word Pyrrhic here, but if you can't get sloppy in a blog, where can you get sloppy?)
Monday night on the Big Bang Theory, Sheldon Cooper (Jim Parsons) lovingly cooed to his laptop that Ubuntu was his favorite Linux-based operating system. It was one of those pitch-perfect geek-chic allusions that the show specializes in, a reference to a piece of software beloved by computer nerds and all but unheard of among the surface dwellers.
I remember thinking that I had never heard anyone mention Ubuntu on television (I'm not even sure I've heard anyone talk about Linux), but I didn't give much thought to the underlying economics until the next day when I saw a piece from Forbes on what they termed 'rip-offs,' products and services with cheaper alternatives. It concluded with a section on basic cable:
The Rip-Off: All you want is basic cable, but your cable company wants you to have so much more--and pay through the nose for it. That's why it bundles in a whole mess of channels, including dozens that even the most feckless of couch potatoes won't watch.
How to Avoid It: Hulu.com offers thousands of videos, TV episodes (new and old) and full-length movies--all free. And Netflix charges as little as $9 a month for access to more than 100,000 TV episodes on DVD, as well as 12,000 movies.
I could sympathize with complaints about cable -- I had dropped it a few years earlier because they kept moving my favorite channels to more expensive tiers (TCM was the last straw) -- but I was surprised what the article didn't list.
I pay around $250 a year for pretty good high speed cable. Netflix starts at $60 a year and that limits you to two DVDs a month (I don't have an account). But the cheapest option, by far, is digital broadcasting (DTT). I bought a $45 converter ($5 after coupon) and hooked it to rabbit ears I picked up at a 99 cents store. I now get over fifty channels with better quality than I used to get with cable, good enough to burn DVDs off the shows I record.
Ubuntu and DTT are orphans; they have no representation in markets where all their competitors have advocates. In theory, journalists are supposed to fill in the gap but that seldom works. As seen here, reporters generally limit their approach to repackaging the information and arguments generated by the advocates of the products they're writing about.
I use a lot of orphan products, mainly because I am very, very cheap and orphans tend to be great bargains. Ubuntu is free. DTT is free once you have a converter or a fairly new TV. Tap water is provided by my landlord. And when I was running a small business I generally found that no one could compete with the postal service on price.
Of course, these bargains indicate market inefficiency. This suggests an analogy to the legal system. We expect the courts to reach fair decisions given the condition that both parties have adequate representation. If we're going to expect market efficiency, perhaps we need to have a similar requirement.
Nutritional epidemiology is a very complex area; eggs are a case in point. Evidence has been emerging that eggs are better for you than we had previously thought.
My SO (who does nutrition) has pointed out that we often look at just one part of a food and forget the other aspects. Simplistic recommendations may result in unbalanced diets -- a focus on fruits and vegetables can overlook important elements like adequate dairy intake. It's just hard to reduce dietary recommendations to sound bites. Add in individual differences and the complexity of the problem increases (a lot).
Nutrition is a complex system with dozens of exposures (both macro- and micro-nutrients) that has significant issues with data collection. Food frequency surveys have known limitations and more sophisticated measures have their own limitations. Experiments are also difficult to design given the complexity of nutritional requirements across the lifespan.