West Coast Stat Views (on Observational Epidemiology and more)

Wednesday, December 21, 2011

Intellectual property: the story that never ends

More on intellectual property rights from Matt Yglesias, who is worried about Google patenting basic features of driverless cars:

If you look at the cars we have, they're all of course different but they have a lot of really profound similarities. You almost always turn a key in the ignition. You have your gas pedal and your break, and you push them both with your right foot. You steer them with a wheel. There's a spedometer and a fuel indicator in more-or-less the same place. They use mirrors so you can see where you're going without constantly turning your head. Would it be a better world if for twenty years someone had held a patent on a Using Mirrors To Allow Drivers To See Behind Them Without Turning Their Head? I say, no. Absent the inability of new entrants into the automobile market to copy some of the basic concepts of what a usable car looks like, we would have had much less competition and much less innovation around the real cutting edge of the automobile industry.

This was not the most interesting thing on Moneybox today, but it fits really well into an evolving theme that we have been seeing recently about how the patent industry is formalizing rent-seeking. This cannot be good in the long run.

Now, it is true that I think that the driverless car is an over-rated concept. Like the jetpack, it is a neat idea that has a lot of very difficult implementation issues. In the case of the driverless car, the main issues, in my opinion, are rethinking the complex web of liability we have constructed around vehicles and smoothly integrating them into mixed use roadways.

The rise of bicycle commuting has been an extremely favorable development, despite the occasional tension between cars and bikers. But I wonder whether driverless cars will be able to treat cyclists as other vehicles, or whether the smaller profile of the bike will make it harder for the car to account for them. The same concerns come up with pedestrians, especially in large cities.

Still more adventures in intellectual property

From the LA Times (comment would be superfluous):

The patent war between Apple Inc. and smartphone rival Samsung Electronics continues to escalate, and there's only one way to describe the latest vicious salvo:
:)

That's right, it appears that Samsung has initiated a lawsuit against Apple governing the company's use of emoticons.

According to a report from patent observer Florian Mueller, who has been dependably covering the worldwide patent wrestling match between Apple and Android manufacturers, one of four new patent lawsuits filed by Samsung in German court is over, once again, yes, emoticons.

Believe it or not, Samsung does indeed own a patent on smartphone use of emoticons. It won the European rights to that "technology" in 2000, and interested readers can see the actual patent here.

A few more thoughts on journalistic conformity

I complained in an earlier post that journalists have recently shown an alarming tendency to converge on a small set of standard stories when covering a major topic -- small sets that more often than not leave out things that we readers really ought to know about. Here's another example.

There are at least two potentially serious consequences to the amount of carbon we've been pumping into the atmosphere. The first is global warming. The second is the chemical and biological changes in the oceans.

Though it's difficult to compare the likely impact of phenomena this big and complex, the second problem is arguably on a level with the first, a point driven home in the LA Times' Pulitzer-winning series on the subject:

As industrial activity pumps massive amounts of carbon dioxide into the environment, more of the gas is being absorbed by the oceans. As a result, seawater is becoming more acidic, and a variety of sea creatures await the same dismal fate as Fabry's pteropods.

The greenhouse gas, best known for accumulating in the atmosphere and heating the planet, is entering the ocean at a rate of nearly 1 million tons per hour — 10 times the natural rate.

Scientists report that the seas are more acidic today than they have been in at least 650,000 years. At the current rate of increase, ocean acidity is expected, by the end of this century, to be 2 1/2 times what it was before the Industrial Revolution began 200 years ago. Such a change would devastate many species of fish and other animals that have thrived in chemically stable seawater for millions of years.

Less likely to be harmed are algae, bacteria and other primitive forms of life that are already proliferating at the expense of fish, marine mammals and corals.

In a matter of decades, the world's remaining coral reefs could be too brittle to withstand pounding waves. Shells could become too fragile to protect their occupants. By the end of the century, much of the polar ocean is expected to be as acidified as the water that did such damage to the pteropods aboard the Discoverer.

Some marine biologists predict that altered acid levels will disrupt fisheries by melting away the bottom rungs of the food chain — tiny planktonic plants and animals that provide the basic nutrition for all living things in the sea.

And we haven't even gotten to the primeval toxic slime (you really do need to read the whole series).

Given their common origin, comparable severity and potential for synergistic effects, topics like acidification should show up frequently in stories about global warming. Not all the time, but I would expect to see it in at least fifteen or twenty percent of the stories. It is simply a pairing that journalists ought to make on a fairly regular basis, but while a search of the last twelve months of the New York Times for "climate change" produces 10,509 hits, a search on '"climate change" acidification' over the same period produces 15.

(If we do a quick, back-of-the-envelope hypothesis test on the null that most journalists are well-informed, hard-working, independent thinkers...)
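For readers who want to see the arithmetic, here is a minimal sketch of that back-of-the-envelope test. It rests on assumptions that are mine rather than anything in the search results: each story is treated as an independent draw, and the chance that a story mentions acidification is set at 15%, the low end of the expectation above. The function names are made up for the illustration.

```python
import math

total_stories = 10_509   # NYT hits for "climate change"
observed = 15            # hits that also mention acidification
expected_share = 0.15    # assumption: low end of "fifteen or twenty percent"


def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability of exactly k successes."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def log10_binom_cdf(k, n, p):
    """log10 of P(X <= k), summed in log space to avoid float underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k + 1)]
    peak = max(logs)
    total = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return total / math.log(10)


observed_share = observed / total_stories
expected_count = expected_share * total_stories
log10_tail = log10_binom_cdf(observed, total_stories, expected_share)

print(f"observed share: {observed_share:.2%}")
print(f"expected count at 15%: {expected_count:.0f}")
print(f"log10 of P(X <= {observed}): {log10_tail:.0f}")
```

The observed share is roughly 0.14%, about a hundredth of the expected rate, and the tail probability is so small that it has to be computed in log space. The independence assumption is crude (stories cluster around events), so read this as a measure of how large the gap is rather than as a formal test.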

The specific tragedy here is that, for all the ink that's been spilled on the impacts of carbon emissions, all we really get in the vast majority of cases are simply the same handful of stories endlessly recycled. We read dozens of articles but since the writers have converged on a tiny number of narratives we remain ill-informed.

The general tragedy is that this is the way almost all journalism works these days. Through a lack of independent thinking (often augmented by laziness and a lack of rigor), journalists quickly settle on a small number of templates which they seldom stray from, even though these templates leave out important aspects of the larger story. Stories on the environmental impacts of carbon leave out the oceans; stories on the economics of cable don't mention broadcast television; stories about the free spending ways of countries like Greece and Spain omit the fact that Spain was running a surplus before the crisis.

It would be easy to find more examples. Finding counter-examples is the tough one.

Tuesday, December 20, 2011

Prediction is difficult

There is a really thoughtful post in the Economist. The gist:

In a nutshell: I've become far less confident about our ability to accurately describe possible outcomes more than a decade out. Correspondingly, I've become increasingly sceptical of the value of analyses of decisions now that attempt to assess the costs and benefits of action over horizons any longer than a decade.

I think that this was a very good complement to yesterday's discussion of inference from observational medical research. Models are hard. The more complicated the model is, the more likely something is to go wrong. Future predictions suffer from these sorts of complications -- we honestly do not know what the circumstances will be like in the future or how many unlikely events will actually happen. Over the short run, predictions can bank on it being unlikely that a lot of "low event rate but high impact" events will happen. We can also neglect the slow (but incremental) variables that are currently unnoticed but which will make a huge difference in the future.

In the same sense, looking at low event rate outcomes in incomplete data (most of pharmacovigilance) leads to a lot of innate uncertainty. In both cases, I think it makes a lot of sense to be humble about what our models can tell us and to focus on policy that accepts that there is a lot of innate uncertainty in some forms of prediction.

Hat-tip: Marginal Revolution

Monday, December 19, 2011

Can we do observational medical research?

Andrew Gelman has a really nice post on observational medical research. How could I not respond?

In the post he quotes David Madigan who has a fairly strong opinion on the matter:

I’ve been involved in a large-scale drug safety signal detection project for the last two or three years (http://omop.fnih.org). We have shown empirically that for any given safety issue, by judicious choice of observational database (we looked at 10 big ones), method (we looked at about a dozen), and method setup, you can get *any* answer you want – big positive and highly significant RR or big negative and highly significant RR and everything in between. Generally I don’t think there is any way to say definitively that any one of these analysis is a priori obviously stupid (although “experts” will happily concoct an attack on any approach that does not produce the result they like!). The medical journals are full of conflicting analyses and I’ve come to the belief that, at least in the medical arena, the idea human experts *know* the *right* analysis for a particular estimand is false.

This seems overly harsh to me. Dr. Madigan (who I think is an amazing statistician) is working with OMOP, which I recall as comprising fairly low-quality data sets (prescription claims for Medicare/Medicaid, GPRD and other clinical databases, and the like). Using them is a necessary evil to get the power to detect rare (but serious) adverse drug outcomes. But these databases are often problematic when extended beyond extremely clear signal detection issues.

The clearest example of high quality medical data is likely to be randomized controlled double-blinded clinical trials. But there is a whole layer of data between these two extremes of data quality (prospective cohort studies, for example) that has also generated a lot of important findings in medicine.

Sure, it is true that the prospective cohort studies tend to be underpowered to detect rare adverse drug side effects (for precisely the same reason that RCTs are). But there is a lot of interesting observational medical research that does not generate conflicting results or where the experts really seem to have a good grasp on the problem. The links between serum cholesterol levels and cardiovascular events, for example, seem relatively solid and widely replicated. So do the links between smoking and lung cancer (or cardiovascular disease) in North American and European populations. There is a lot that we can learn with observational work.

So I would be careful about generalizing to all of medical research.

That being said, I have a great deal of frustration with medical database research for a lot of the same reasons as David Madigan does. I think the issues with trying to do research in medical claims data would be an excellent series of posts as the topic is way too broad for a single post.

Sunday, December 18, 2011

How to read Megan McArdle part 46 -- the quotes

Via Joseph via Karl Smith, Megan McArdle (in a post entitled, "When it Comes to Taxes on the Poor, the Supply Siders are Right") quotes the following passage from Jeff Liebman:

"Despite the EITC and child credit, the poverty trap is still very much a reality in the U.S. A woman called me out of the blue last week and told me her self-sufficiency counselor had suggested she get in touch with me. She had moved from a $25,000 a year job to a $35,000 a year job, and suddenly she couldn't make ends meet any more. I told her I didn't know what I could do for her, but agreed to meet with her. She showed me all her pay stubs etc. She really did come out behind by several hundred dollars a month. She lost free health insurance and instead had to pay $230 a month for her employer-provided health insurance. Her rent associated with her section 8 voucher went up by 30% of the income gain (which is the rule). She lost the ($280 a month) subsidized child care voucher she had for after-school care for her child. She lost around $1600 a year of the EITC. She paid payroll tax on the additional income. Finally, the new job was in Boston, and she lived in a suburb. So now she has $300 a month of additional gas and parking charges. She asked me if she should go back to earning $25,000. I told her that she should first try to find a $35k job closer to home. Also, she apparently can't fully reverse her decision to take the higher paying job because she can't get the child care voucher back (the waiting list is several years long she thinks). She is really stuck. She tried taking an additional weekend job, but the combination of losing 30 percent in increased rent and paying for someone to take care of her child meant it didn't help much either.

The question is what is the policy solution here. Means-tested transfers have to be phased out at some point, so there is no easy answer.

Notice the strangely brief second paragraph and the missing quotation mark at the end? Statisticians tend to be suspicious people, particularly when it comes to odd cut-offs for data ranges, so I clicked through the link to the Jeff Frankel post that provided the original quote and saw a possible reason why McArdle had stopped so abruptly. Here's the very next sentence:

I think there are three things we might be able to do — all of which would, as you say, be a better use of revenue than tax cuts for the rich.

The whole paragraph is worth reading:

The question is what is the policy solution here. Means-tested transfers have to be phased out at some point, so there is no easy answer. I think there are three things we might be able to do — all of which would, as you say, be a better use of revenue than tax cuts for the rich. First, make child-related tax benefits equal for all families (now they are high at the bottom because of the EITC and high at the top because the dependent exemption is more valuable the higher the tax bracket you are in, and the dip in the middle raises marginal tax rates by 21 percent for a family with two kids — so eliminating the dip would get rid of this 21 percent portion of the effective marginal tax rate). David Ellwood and I analyze this first idea. Also Sawicky and Cherry have put forth a similar idea. Second, in designing universal health insurance, we need to be very careful not to phase out income-related premium subsidies over the same income range where all of these other benefits are being phased out. Third, implement a delay between income increases and rent increases in section 8 — allow people to save up a bit before they are hit with the rent increase (I believe I read that some states have been trying out something like this recently, but I am not up to date on these policies). There are some excellent papers that carefully model how the cumulative effects of the welfare system create a poverty trap. But I don’t think either of these papers includes all of the factors facing the woman above — so they would probably indicate that she faced a 60 percent marginal tax rate rather than the 130% (or whatever it really is) rate that she actually faces.”

I am not entirely convinced that Liebman has made the case for counting personal expenses required for a new job as a tax increase, but it's a coherent and honest argument that's certainly persuasive on the reality of a poverty trap.
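To see how much the commuting costs matter to the headline rate, here is a rough sketch that annualizes only the dollar figures itemized in the quoted passage. Payroll tax is left out because the passage does not give a rate, so both results understate the true figure; the function name is made up for the illustration.

```python
income_gain = 35_000 - 25_000  # annual raise, in dollars

# Annualized benefit losses itemized in the quoted passage.
benefit_losses = {
    "health insurance premium": 230 * 12,
    "section 8 rent increase": income_gain * 30 / 100,
    "child care voucher": 280 * 12,
    "EITC reduction": 1_600,
}
commuting_costs = 300 * 12  # additional gas and parking


def implicit_marginal_rate(losses, gain):
    """Share of the raise that is offset by the listed losses."""
    return sum(losses) / gain


rate_benefits_only = implicit_marginal_rate(benefit_losses.values(), income_gain)
rate_with_commuting = implicit_marginal_rate(
    list(benefit_losses.values()) + [commuting_costs], income_gain
)

print(f"benefit losses only: {rate_benefits_only:.0%}")
print(f"including commuting: {rate_with_commuting:.0%}")
```

On these figures the lost benefits alone come to a little over 100% of the raise, and adding the commuting costs pushes the total well past that. So the poverty trap holds up even if you decline to count the commute as a tax.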

As for the rest of McArdle's post, after having spent a great deal of time arguing that a situation exists where supply siders predict a strong effect, her whole defense of her central thesis consists of this:

Note two things: first, that in this case, at least, the supply siders seem to be completely right. Everyone I've spoken to about the problem seems to agree that the poor respond to these high marginal tax rates by either taking lower-paying jobs than they could, or working less--not in every individual case, but in aggregate.

And second, that this is not a problem that supply siders seem to be applying much brain power or political capital to fixing.

The everyone-I've-spoken-to standard leaves something to be desired, particularly given the fact that the woman in the anecdote did the exact opposite of what supply side theory predicted; rather than "taking lower-paying jobs than [she] could, or working less," she "tried taking an additional weekend job."

(and to put way too fine a point on this, I don't see the predicted big dip for affected families in these numbers either)

On the bright side, this still isn't the worst thing to come out of the Atlantic recently.

Saturday, December 17, 2011

When a model simply doesn't match reality

Karl Smith relates a story from Megan McArdle:

A woman called me out of the blue last week and told me her self-sufficiency counselor had suggested she get in touch with me. She had moved from a $25,000 a year job to a $35,000 a year job, and suddenly she couldn't make ends meet any more. I told her I didn't know what I could do for her, but agreed to meet with her. She showed me all her pay stubs etc. She really did come out behind by several hundred dollars a month. She lost free health insurance and instead had to pay $230 a month for her employer-provided health insurance. Her rent associated with her section 8 voucher went up by 30% of the income gain (which is the rule). She lost the ($280 a month) subsidized child care voucher she had for after-school care for her child. She lost around $1600 a year of the EITC. She paid payroll tax on the additional income. Finally, the new job was in Boston, and she lived in a suburb. So now she has $300 a month of additional gas and parking charges. She asked me if she should go back to earning $25,000. I told her that she should first try to find a $35k job closer to home. Also, she apparently can't fully reverse her decision to take the higher paying job because she can't get the child care voucher back (the waiting list is several years long she thinks). She is really stuck. She tried taking an additional weekend job, but the combination of losing 30 percent in increased rent and paying for someone to take care of her child meant it didn't help much either.

Ms. McArdle tries to make a supply side argument here, where she points out that we are failing to create policies to incentivize work among the poor (who can suffer a marginal tax rate of > 100% in many circumstances). It is a really interesting question why we focus on the top marginal tax rate and not the marginal tax rate for people in lower income brackets (where there is less of a competition effect). However, Karl Smith notices the really interesting behavioral issue here:

She faced a marginal tax rate in excess of 100%. This meant as her earned income went up she got poorer. What did she do? She tried to earn even more income. It was only we she failed at the attempt to make ends meet by supplying ever more labor to the free market that she try to go back to making less money.

So, not only do we have evidence from Matt Yglesias and Felix Salmon that top income earners don't necessarily even know their marginal rate, but we see low income people (facing a > 100% marginal rate of taxes) desperately trying to get more income and not less. It is not the case that the woman in this heartbreaking story decides that she would prefer to spend more time in leisure (so we can't entice her into working more). It is that working actually costs her money.

And her response is to get a second job!

Is it really too late to put supply side economics into the "special circumstances only" bin and leave it there? It may influence the odd movie producer, consultant, or freelancer (who have the ability to take on work in discrete projects and who have income sufficiency already). But this conceptual model seems to be absolutely dreadful at making predictions about how real people will act in most employment situations.

Thursday, December 15, 2011

A really nice article by Andrew Gelman and Kaiser Fung

Andrew Gelman and Kaiser Fung have an article on Freakonomics in American Scientist. My favorite part was the story of Emily Oster and her theory of Hepatitis B:

Monica Das Gupta is a World Bank researcher who, along with others in her field, has attributed the abnormally high ratio of boy-to-girl births in Asian countries to a preference for sons, which manifests in selective abortion and, possibly, infanticide. As a graduate student in economics, Emily Oster (now a professor at the University of Chicago) attacked this conventional wisdom. In an essay in Slate, Dubner and Levitt praised Oster and her study, which was published in the Journal of Political Economy during Levitt’s tenure as editor:
[Oster] measured the incidence of hepatitis B in the populations of China, India, Pakistan, Egypt, Bangladesh, and other countries where mothers gave birth to an unnaturally high number of boys. Sure enough, the regions with the most hepatitis B were the regions with the most “missing” women. Except the women weren’t really missing at all, for they had never been born.
Oster’s work stirred debate for a few years in the epidemiological literature, but eventually she admitted that the subject-matter experts had been right all along. One of Das Gupta’s many convincing counterpoints was a graph showing that in Taiwan, the ratio of boys to girls was near the natural rate for first and second babies (106:100) but not for third babies (112:100); this pattern held up with or without hepatitis B. In a follow-up blog post, Levitt applauded Oster for bravery in admitting her mistake, but he never credited Das Gupta for her superior work. Our point is not that Das Gupta had to be right and Oster wrong, but that Levitt and Dubner, in their celebration of economics and economists, suspended their critical thinking.

I think that this story actually has two elements. One is the danger of a convincing explanation. There are a lot of associations that can appear and would be extremely exciting if they were true. Just consider the recent article on statins reducing mortality due to pneumonia: it is an amazing result that would be extremely exciting if it were true. I worry that these kinds of exciting results get a lot of press instead of being seen as signposts towards needing to examine the problem more carefully. After all, it was a good thing that Das Gupta had a chance to look at her data and control for an additional predictive variable. What is concerning is not raising the idea -- it is the strength of the language: "Except the women weren’t really missing at all, for they had never been born" which implied a lot more certainty than seemed warranted.

But putting that point aside, the really interesting thing (to me) is considering likely effect sizes. When you look at the population level infection rates (incremental on the infection rates in countries without this gender imbalance) then you quickly conclude that the effect of infection has to be high. After all, the rate in India appears to be about 3% (versus less than 1% in the United States). At the same time, the sex ratio in India was 1.10 (these are approximate numbers). So if the natural sex ratio is 105 boys per 100 girls and India has 110, we can do a calculation. Assume that the Hep B rate among reproductive age women is triple the population average (say 9%). So 0.91 x 105 + 0.09 x [RATE] = 110. That suggests that the sex ratio among infected women is 160 (it gets a lot worse if you merely assume double). That means we could prove this hypothesis by following a very small cohort of Hep B infected pregnant women, since the effect size is so large. Now this is a simplistic way to look at the problem, and I am sure that more nuanced approaches make sense.

But isn't this the sort of data you'd look for before suggesting that the experts completely missed the explanatory variable? After all, you are positing an enormous effect size for the influence of the virus on sex ratios. This would be observed in routine clinical practice.

So, not to give anybody a hard time. We all have challenges in our research and it is really hard to tackle these types of problems. People should have credit for putting their necks out and proposing testable hypotheses that can enhance our understanding of the world. But I think we should rethink just how certain we are when we make these proposals. Maybe we need to learn to say "this is a possible explanation for some of the observed variation".
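The effect-size calculation above is easy to reproduce. Here is a minimal sketch that solves the same weighted-average equation for the sex ratio among infected women, using the approximate numbers from the post; the 9% and 6% infection shares are the "triple" and "double" assumptions, not measured values, and the function name is made up for the illustration.

```python
def implied_ratio_among_infected(observed, natural, infected_share):
    """Solve (1 - s) * natural + s * RATE = observed for RATE."""
    return (observed - (1 - infected_share) * natural) / infected_share


# Sex ratios are boys per 100 girls; all inputs are approximate.
triple = implied_ratio_among_infected(observed=110, natural=105, infected_share=0.09)
double = implied_ratio_among_infected(observed=110, natural=105, infected_share=0.06)

print(f"infected share 9%: {triple:.0f} boys per 100 girls")
print(f"infected share 6%: {double:.0f} boys per 100 girls")
```

The smaller the infected share, the larger the ratio among infected women has to be to move the national average, which is why assuming only double the population rate makes the implied effect even less plausible.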

Wednesday, December 14, 2011

Remember, it no longer counts as plagiarism unless you use exactly the same words

I had started out to write a post about the cable executive who described the rising cost of ESPN as a "tax on every American household," but when I went looking for source articles I noticed something strange. The Atlantic, the Wall Street Journal, and the rest all told the story in the same way, right down to the same omission of the role of over-the-air television (which is particularly relevant to a story about threats to cable's business model).

Don't worry, this is not another rabbit-ears story. What's significant here is that in a story about cable losing cost-conscious customers, none of the writers mentioned the tens of millions of people who were getting full digital television for free. Not only do journalists now tend to cover the stories from the same angles, they even omit the same important details.

This group-think is bad enough on its own, but when you combine it with an increasingly nonchalant attitude toward accuracy and fact-checking (here's one of many examples), the results can be dangerous.

Look at the debate over the Euro-crisis. Any number of virtually identical stories have appeared claiming that the crisis was started by the wild deficit spending of southern countries, implicitly or explicitly including Spain, despite the fact that Spain had been running a surplus before the crisis.

If journalists aren't bothering to think independently or check their facts when reporting on the Euro-crisis, what stories are important enough to justify their A-game?

K12

This is not surprising:

Nearly 60 percent of its students are behind grade level in math. Nearly 50 percent trail in reading. A third do not graduate on time. And hundreds of children, from kindergartners to seniors, withdraw within months after they enroll.

By Wall Street standards, though, Agora is a remarkable success that has helped enrich K12 Inc., the publicly traded company that manages the school. And the entire enterprise is paid for by taxpayers.

Now, we've long been test score critics at OE. So I will accept the argument that test scores should not necessarily be the most important feature of a school. But if they are the motivation for shifting to private education then I'd at least like to see reasonable scores (after all, this is the reason for the existence of these options).

Nor is the fact that the schools are focusing on aggressive expansion reassuring:

Despite lower operating costs, the online companies collect nearly as much taxpayer money in some states as brick-and-mortar charter schools. In Pennsylvania, about 30,000 students are enrolled in online schools at an average cost of about $10,000 per student. The state auditor general, Jack Wagner, said that is double or more what it costs the companies to educate those children online.

“It’s extremely unfair for the taxpayer to be paying for additional expenses, such as advertising,” Mr. Wagner said. Much of the public money also goes toward lobbying state officials, an activity that Ronald J. Packard, chief executive of K12, has called a “core competency” of the company.

I think that it is concerning that a core competency of a large (and growing) private school is that it focuses on lobbying governments for money. If the main issue that we have with traditional public education is rent-seeking by teachers, how much worse is rent-seeking by a corporation? After all, if teachers gain a small surplus per teacher, that at least has a broad social impact. Clearly K12 has managed to avoid expensive teachers:

But online schools have negligible building costs and cheaper labor costs, partly because they pay teachers low wages, records and interviews show. Parents, called “learning coaches,” do much of the teaching, prompting critics to argue that states are essentially subsidizing home schooling.

At what point is the school simply letting the parent home school their children and accepting educational grant money for the purpose? This is a model that, I suspect, has a chance if and only if you have a stay-at-home parent who focuses on working with the child on education (or if sleep is an activity that you engage in only on weekends).

Now, I do not want to be a Luddite. There may be a role for online education and this particular NY Times piece may not capture all of the nuances of K12 (the articles about traditional schools often have this issue as well). But this sort of business model has long been one of my major concerns about the push towards privatization of schools.

Smart comments from Matt Yglesias and Dana Goldstein are also worth reading.

Sunday, December 11, 2011

Question about Airlines

Mark Thoma tweets:

MarkThoma Mark Thoma
Just once, I'd like to be able to get on the plane at the scheduled time, and make all my connections. Once doesn't seem to much to ask. Grr

I cannot agree more. Why is it so difficult for modern airlines to provide basic services? Why is the city bus more likely to be on schedule than Delta Airlines?

Saturday, December 10, 2011

This week's appalling story of intellectual property abuse

Brought to us by Alex Tabarrok:

Prometheus gave man fire, thankfully he didn’t charge every time man lit a match. Prometheus Labs in contrast wants to charge patients for a rule that says when to increase or decrease a drug in response to a blood test. Quoting Tim Lee:
The patent does not cover the drug itself—that patent expired years ago—nor does it cover any specific machine or procedure for measuring the metabolite level. Rather, it covers the idea that particular levels of the chemical “indicate a need” to raise or lower the drug dosage.

Even this is not quite right for suppose a physician notes that the patient’s metabolites are within the range where a change in dosage is not necessary; although the physician takes no action she still has used the patent and thus must pay Prometheus Lab a fee or infringe.

Friday, December 9, 2011

Maybe the Republican primary is going just as we should expect

I don't mean that in a snarky way. This is a completely non-snide post. I was just thinking about how even a quick little model with a few fairly intuitive assumptions can fit seemingly chaotic data surprisingly well. This probably won't look much like the models political scientists use (they have expertise and real data and reputations to protect). I'm just playing around.

But it can be a useful thought experiment, trying to explain all of the major data points with one fairly simple theory. Compare that to this bit of analysis from Amity Shlaes:

The answer is that this election cycle is different. Voters want someone for president who is ready to sit down and rewrite Social Security in January 2013. And move on to Medicare repair the next month. A policy technician already familiar with the difference between defined benefits and premium supports before he gets to Washington. What voters remember about Newt was that some of his work laid the ground for balancing the budget. He was leaving the speaker's job by the time that happened, but that experience was key.

This theory might explain Gingrich's recent rise but it does a poor job with Bachmann and Perry and an absolutely terrible job with Cain. It's an explanation that covers a fraction of the data. Unfortunately, it's no worse than much of the analysis we've been seeing from professional political reporters and commentators.

Surely we can do better than that.

Let's say that voters assign their support based on which candidate gets the highest score on a formula that looks something like this (assume each term has a coefficient and that those coefficients vary from voter to voter):

Score = Desirability + Electability × Desirability

Where desirability is how much you would like to see that candidate as president and electability is roughly analogous to the candidate's perceived likelihood of making it through the primary and the general election.
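
To make the formula concrete, here is a minimal sketch. It assumes the second term is electability multiplied by desirability (consistent with the "D + ED" shorthand used later in the post). The coefficients `a` and `b`, the candidate labels, and all of the numbers are hypothetical, chosen only for illustration.

```python
def score(desirability, electability, a=1.0, b=1.0):
    """Score = a*D + b*E*D; a and b are hypothetical per-voter coefficients."""
    return a * desirability + b * electability * desirability


def first_choice(candidates, a=1.0, b=1.0):
    """Return the name of the candidate with the highest score for one voter."""
    return max(candidates, key=lambda name: score(*candidates[name], a=a, b=b))


# Made-up (desirability, electability) pairs, each on a 0-to-1 scale.
candidates = {
    "liked_long_shot": (0.9, 0.1),
    "tolerable_front_runner": (0.6, 0.8),
}

# A voter who gives electability some weight backs the front runner...
print(first_choice(candidates, a=1.0, b=1.0))
# ...while a voter who ignores electability backs the candidate they like most.
print(first_choice(candidates, a=1.0, b=0.0))
```

Letting the coefficients vary from voter to voter is what allows the same field to produce both a Romney-style supporter and a Paul-style supporter.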

Now let's make a few relatively defensible assumptions about electability:

electability is more or less a zero sum game;

it is also something like Keynes' beauty contest, an iterative process with everyone trying to figure out who everyone else is going to pick and throwing their support to the leading acceptable candidate;

desirability tends to be more stable than electability.

I almost added a fourth assumption that electability has momentum, but I think that follows from the iterative aspect.

What can we expect given these assumptions?

For starters, there are two candidates who should post very stable poll numbers though for very different reasons: Romney and Paul. Romney has consistently been seen as number one in general electability so GOP voters who find him acceptable will tend strongly to list him as their first choice even if they may not consider him the most desirable. While Romney's support comes mostly from the second term, Paul's comes almost entirely from the first. Virtually no one sees Paul as the most electable candidate in the field, but his supporters really, really like him.

It's with the rest, though, that the properties of the model start to do some interesting things. Since the most electable candidate is not acceptable to a large segment of the party faithful, perhaps even a majority, a great deal of support is going to go to the number two slot. If there were a clear ranking with a strong second place, this would not be a big deal, but this is a weak field with a relatively small spread in general electability. The result is a primary that's unstable and susceptible to noise.

Think about it this way: let's say the top non-Romney has a twelve percent perceived chance of getting to the White House, the second has eleven and the third has ten. Any number of trivial things can cause a three point shift which can easily cause first and third to exchange places. Suddenly the candidate who was polling at seven is breaking thirty and the pundits are scrambling to come up with an explanation that doesn't sound quite so much like guessing.
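
Here is a rough sketch of that flip. The twelve, eleven and ten percent figures come from the example above; everything else is an assumption made for illustration: a bloc of voters who like all three non-Romney candidates equally (desirability 0.7), and a zero sum reading of the "three point shift" in which 1.5 points move from the first candidate to the third.

```python
def ranking(desirability, electability):
    """Order candidates from highest to lowest D + E*D score."""
    scores = {
        name: desirability[name] * (1 + electability[name])
        for name in desirability
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical bloc that likes all three non-Romney candidates equally.
desirability = {"first": 0.7, "second": 0.7, "third": 0.7}
# Perceived chances of reaching the White House, from the example above.
electability = {"first": 0.12, "second": 0.11, "third": 0.10}

before = ranking(desirability, electability)

# Zero sum shift (an assumption): 1.5 points move from first to third,
# so the gap between them changes by three points.
shifted = dict(
    electability,
    first=electability["first"] - 0.015,
    third=electability["third"] + 0.015,
)
after = ranking(desirability, shifted)

print(before, "->", after)
```

Because each voter in the bloc backs whoever tops the ranking, a small change in perceived electability moves the whole bloc at once, which is how a candidate polling in single digits can suddenly break thirty.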

What the zero sum property and convergence can't explain, momentum does a pretty good job with. Take Perry. He came in at the last minute, seemingly had the election sewn up, then dropped like a stone. Conventional wisdom usually ascribes this to bad debate performances and an unpopular stand on immigration, but primary voters are traditionally pretty forgiving toward bad debates (remember Bush's Dean Acheson moment?) and most of the people who strongly disagreed with Perry's immigration stand already knew about it.

How about this for another explanation? Like most late entries, Perry was a Rorschach candidate and like most late entries, as the blanks were filled in Perry's standing dropped. The result was a downward momentum which Perry accelerated with a series of small but badly timed missteps. Viewed in this context, the immigration statement takes on an entirely different significance. It didn't have to lower Perry's desirability in order to hurt him in the polls; instead, it could have hurt his perceived electability by reminding people who weren't following immigration that closely that Perry had taken positions that other Republicans would object to.

Of course, showing how a model might possibly explain something doesn't prove anything, but it can make for an interesting thought experiment and it does, I hope, at least make a few points, like:

1. Sometimes a simple model can account for some complex and chaotic behavior;

2. Model structure matters. D + ED gives completely different results than D + E;

3. Things like momentum, zero sum constraints, convergence, and shifting to and from ordinal data can have some surprising implications, particularly when;

4. Your data hits some new extreme.
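
Point 2 is easy to check with a toy example. The numbers are made up: a hypothetical voter who finds the front runner unacceptable (desirability 0) but still sees that candidate as by far the most electable. The function and candidate names are illustrative only.

```python
def additive(d, e):
    """D + E: electability counts even for a candidate the voter rejects."""
    return d + e


def interactive(d, e):
    """D + E*D: electability only amplifies existing desirability."""
    return d + e * d


# Made-up (desirability, electability) values for one hypothetical voter.
field = {
    "unacceptable_front_runner": (0.0, 0.8),
    "acceptable_long_shot": (0.5, 0.1),
}


def pick(model):
    """Return the candidate with the highest score under the given model."""
    return max(field, key=lambda name: model(*field[name]))


print("D + E  ->", pick(additive))
print("D + ED ->", pick(interactive))
```

Under D + E the voter ends up backing a candidate they reject, while under D + ED an unacceptable candidate scores zero no matter how electable, which is the behavior the "leading acceptable candidate" assumption needs.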

[For a look at a real analysis of what's driving the poll numbers, you know where to go.]

Thursday, December 8, 2011

Model assumptions

Felix Salmon and Matt Yglesias:

The entire debate in congress over taxes is that President Obama wants to restore the top marginal rate to the level that Dimon thinks it already is. Meanwhile, Dimon doesn’t even know what tax rate he pays.

I think that this quote is really, really important. Classical economic models presume that individuals act to maximize their utility. But real people often have limitations, including lack of perfect information about what costs really are. I would be surprised if Mark did not have follow-up thoughts.

But the key point is that if these assumptions about informed persons can't hold for the CEO of JP Morgan Chase (who you would assume is numerate), then how likely is it that these models are going to be good at prediction? After all, we presume Jamie Dimon is maximizing his utility for a 39.6% marginal tax rate; so a change in taxes to what he currently thinks that they already are would alter his incentives how?

Wednesday, December 7, 2011

Quote of the day

However, one view of pure research is that it is research that has not yet found application; pure research is a long-term investment just as applied research is a short-term investment.

M.F. Goodchild
"Geographical Information Science"