Sunday, March 7, 2010

"Algebra in Wonderland" -- recommended with reservations

In today's New York Times, Melanie Bayley, a doctoral candidate in English literature at Oxford, argues that Lewis Carroll's Alice in Wonderland can be interpreted as a satire of mathematics in the mid-Nineteenth Century, particularly the work of Hamilton and De Morgan.

The essay has its share of flaws: none of the analogies are slam-dunk convincing (the claim that the Queen of Hearts represents an irrational number is especially weak); the omission of pertinent works like "A Tangled Tale" and "What the Tortoise Said to Achilles" is a bit strange; and the conclusion that without math, Alice might have been more like Sylvie and Bruno would be easier to take seriously if the latter book hadn't contained significant amounts of mathematics* and intellectual satire.

Those problems aside, it's an interesting piece and a great starting point for discussing mathematics and literature, and it will give you an excuse to dig out your Martin Gardner books. Besides, how often do you get to see the word 'quaternion' on the op-ed page?


* including Carroll's ingenious gravity powered train.

Friday, March 5, 2010

When is zero a good approximation?

I was commenting on Andrew Gelman's blog when a nice commenter pointed out something that I usually don't think much about: pharmacoepidemiology outcomes include both cost and efficacy.

Now, a lot of my work has been on older drugs (aspirin, warfarin, beta blockers are my three most commonly studied drugs) so I have tended to assume that cost was essentially zero. A year's supply of aspirin for $10.00 is an attainable goal, so I have assumed that we can neglect the cost of therapy.

But does that make sense if we are talking about a targeted chemotherapy? In such a case, we might have to weigh not just the burden of additional adverse events but the cost of the medication itself.

It's becoming appallingly clear to me that I don't have a good intuition of how to model this well. Making everything a cost and assuming a price on years of life lost is one approach, but the complexity of the pricing involved (and the tendency for relative costs to change over time) worries me about external validity.
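The "make everything a cost" approach can be sketched as a net-benefit calculation: price the health gain at some dollar value per life-year and subtract the cost of therapy. This is only a minimal sketch. The function name, the $50,000 value per life-year, and the drug figures are all made-up assumptions for illustration, not estimates from any study.

```python
def net_monetary_benefit(life_years_gained, cost, value_per_life_year):
    """Health gain priced in dollars, minus the cost of therapy."""
    return life_years_gained * value_per_life_year - cost


# Hypothetical inputs, chosen only to illustrate the arithmetic.
VALUE_PER_LIFE_YEAR = 50_000  # assumed price on a year of life

# A cheap older drug: small benefit, near-zero cost.
aspirin = net_monetary_benefit(0.10, 10, VALUE_PER_LIFE_YEAR)
aspirin_ignoring_cost = net_monetary_benefit(0.10, 0, VALUE_PER_LIFE_YEAR)

# An expensive targeted therapy: larger benefit, large cost.
chemo = net_monetary_benefit(0.50, 40_000, VALUE_PER_LIFE_YEAR)
chemo_ignoring_cost = net_monetary_benefit(0.50, 0, VALUE_PER_LIFE_YEAR)

print("aspirin:", aspirin, "vs", aspirin_ignoring_cost)
print("chemo:  ", chemo, "vs", chemo_ignoring_cost)
```

With these assumed inputs, treating cost as zero barely moves the aspirin figure but flips the sign of the chemotherapy one. Both answers also shift whenever the assumed value per life-year changes, which is exactly the external validity worry.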

I know what I will be thinking about this weekend!

Thursday, March 4, 2010

How are genetically engineered crops like AAA rated structured bonds?

Felix Salmon draws a clever analogy:

If you only grow one crop, the downside of losing it all to an outbreak is catastrophe. In rural Iowa it might mean financial ruin; in Niger, it could mean starvation.

Big agriculture companies like DuPont and Archer Daniels Midland (ADM), of course, have an answer to this problem: genetically engineered crops that are resistant to disease. But that answer is the agricultural equivalent of creating triple-A-rated mortgage bonds, fabricated precisely to prevent the problem of credit risk. It doesn’t make the problem go away: It just makes the problem rarer and much more dangerous when it does occur because no one is — or even can be — prepared for such a high-impact, low-probability event.

Valuing Pain

Readers of this blog will know that I have some concerns about the regulation of pain medications. The FDA continues to warn about the issue of liver injury when taking acetaminophen.

For a moment, let's ignore the case of people taking the drug inappropriately or for whom another medication would provide better symptom control. They exist and are relevant to policy discussions, but they distract from today's main thought.

We can measure liver damage and death (hard outcomes). We cannot easily measure pain -- what level of pain relief is worth a 1% chance of death?

So do we leave it up to individual judgment? Drugs can be confusing and acetaminophen (due to efficacy) is included in a lot of preparations (for important reasons). So what is the ideal balance between these two goals (preventing adverse events and relieving pain)?

It would be so much easier if pain were easy to measure . . .

Wednesday, March 3, 2010

p-values

Another nice critique of relying on p-values. There is also a fine example in the comments of why one should double check when they think things look odd. Often it is better to keep one's mouth shut and be thought a fool than to open it and remove all doubt.

Tuesday, March 2, 2010

Comparing Apples and Really Bad Toupees

DISCLAIMER: Though I have worked in some related areas like product launches, I have never done an analysis of brand value. What follows are a few thoughts about branding without any claim of special expertise or insight. If I've gotten something wrong here I would appreciate any notes or corrections.

Joseph's post reminded me of this article in the Wall Street Journal about the dispute between Donald Trump and Carl Icahn over the value of the Trump brand. Trump, not surprisingly, favors the high end:
In court Thursday, Mr. Trump boasted that his brand was recently valued by an outside appraiser at $3 billion.

In an interview Wednesday, Mr. Trump dismissed the idea that financial troubles had tarnished his casino brand. He also dismissed Mr. Icahn's claims that the Trump gaming brand was damaged, pointing to a recent filing in which Mr. Icahn made clear that he wants to assume the license to the brand. "Every building in Atlantic City is in trouble. OK? This isn't unique to Trump," he said. "Everybody wants the brand, including Carl. It's the hottest brand in the country."
While Icahn's estimate is a bit lower:
Mr. Icahn, however, believes his group also would have the right to use the Trump name under an existing licensing deal, but says the success of the casinos don't hinge on that. The main disadvantage to losing the name, he says, would be the $15 million to $20 million cost of changing the casinos' signs.
So we can probably put the value of the Trump brand somewhere in the following range:

-15,000,000 < TRUMP < 3,000,000,000

(the second inequality should be less than or equal to -- not sure how to do it on this text editor)

Neither party here is what you'd call trustworthy and both are clearly pulling the numbers they want out of appropriate places but they are able to make these claims with straight faces partly because of the nature of the problem.

Assigning a value to a brand can be a tricky thing. Let's reduce this to pretty much the simplest possible case and talk about the price differential between your product and a similar house brand. If you make Clorox, we're in pretty good shape. There may be some subtle difference in quality between your product and, say, the Target store brand but it's probably safe to ignore it and ascribe the extra dollar consumers pay for your product to the brand effect.

But what about a product like Apple Computers? There's clearly a brand effect at work but in order to measure the price differential we have to decide what products to compare them to. If we simply look at specs the brand effect is huge but Apple users would be quick to argue that they were also paying for high quality, stylish design and friendly interfaces. People certainly pay more for Macs, iPods, iPhones, and the rest, but how much of that extra money is for features and how much is for brand?

(full disclosure: I use a PC with a dual Vista/Ubuntu operating system. I do my programming [Python, Octave] and analysis [R] in Ubuntu and keep Vista for compatibility issues. I'm very happy with my system. If an Apple user would like equal time we'd be glad to oblige)

I suspect that more products are closer to the Apple end of this spectrum than the Clorox end but even with things like bleach, all we have is a snapshot of a single product. To be useful, we need to estimate the long term value of the brand. Is it a Zima (assuming Zima was briefly a valuable brand) or is it a Kellogg's Corn Flakes? And we would generally want a brand that could include multiple products. How do we measure the impact of a brand on products we haven't launched yet? (This last point is particularly relevant for Apple.)

The short answer is you take smart people, give them some precedents and some guidelines then let them make lots of educated guesses and hope they aren't gaming the system to tell you what you want to hear.

It is an extraordinarily easy system to game even with guidelines. In the case of Trump's casinos we have three resorts, each with its own brand that interacts in an unknown and unknowable way with the Trump brand. If you removed Trump's name from these buildings, how would it affect the number of people who visit or the amount they spend?

If we were talking about Holiday Inn or even Harrah's, we could do a pretty good job estimating the effect of changing the name over the door. We would still have to make some assumptions but we would have data to back them up. With Trump, all we would have is assumption-based assumptions. If you take these assumptions about the economy, trends in gambling and luxury spending, the role of Trump's brand and where it's headed, and you give each one of them a small, reasonable, completely defensible nudge in the right direction, it is easy to change your estimates by one or two orders of magnitude.
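The compounding of small nudges is easy to check with arithmetic. The sketch below assumes, purely for illustration, that the estimate is a product of independent factors and that each factor can be defensibly moved 25% in either direction. Neither assumption comes from an actual brand valuation, and the function name is hypothetical.

```python
import math


def optimistic_to_pessimistic_ratio(n_assumptions, nudge):
    """Ratio between an estimate with every factor nudged up
    and the same estimate with every factor nudged down."""
    up = (1 + nudge) ** n_assumptions
    down = (1 - nudge) ** n_assumptions
    return up / down


# Assumed: each factor can be moved 25% either way without raising eyebrows.
for n in (3, 5, 9):
    ratio = optimistic_to_pessimistic_ratio(n, 0.25)
    orders_of_magnitude = math.log10(ratio)
    print(n, "assumptions:", round(ratio, 1), "x,",
          round(orders_of_magnitude, 2), "orders of magnitude")
```

Under these assumptions, five nudged factors already separate the friendly and unfriendly estimates by more than a factor of ten, and nine get close to a factor of a hundred.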

We also have an unusual, possibly even unique, range of data problem. Many companies have tried to build a brand on a public persona, sometimes quite successfully. Normally a sharp business analyst would be in a good position to estimate the value of one of these brands and answer questions like "if Wayne Gretzky were to remove his name from this winter resort, what impact would it have?"

The trouble with Trump is that almost no one likes him, at least according to his Q score. Most persona-based brands are built upon people who were at some point well-liked and Q score is one of the standard metrics analysts use when looking at those brands. Until we get some start-ups involving John Edwards and Tiger Woods, Mr. Trump may well be outside of the range of our data.

Comparing apples and oranges

Comparing salaries across national borders is a tricky thing to do. I was reminded of this problem while reading a post from Female Science Professor. My experience has been limited to the US and Canada but, even there, it's hard to really contrast these places. When I worked in Montreal, I had easy access to fast public transit, most things in walking distance, inexpensive housing but a much lower salary. In Seattle I have reluctantly concluded that, given my work location, a car was essential.

So how do you compare salaries?

This is actually a general problem in epidemiology. Socio-economic status is known to be an important predictor of health. But it is tricky to measure. Salary needs to be adjusted for cost of living, which is hard even when you have good location information (which, in de-identified data, you may very well not have). Even in large urban areas, costs can be variable depending on location.

Alternatively, there are non-financial rewards (that are status boosting) in many jobs; how do you weight these? Adam Smith noted back in The Wealth of Nations that a prestigious position was related to lower wages. How do you compare equal salaries between a store clerk and a journalist?

It is a hard problem and I really lack a great solution. But it's worth putting some real thought into!!

Monday, March 1, 2010

"What bankers can learn from arc-welder manufacturers"

Felix Salmon points out the following from a book review from the Wall Street Journal:

Mr. Koller contends that layoffs deprive companies of profit-generating talent and leave the remaining employees distrustful of management—and often eager to find jobs elsewhere ahead of the next layoff round. He cites research showing that, on average, for every employee laid off from a company, five additional ones leave voluntarily within a year. He concludes that the cost of recruiting, hiring and training replacements, in most cases, far outweighs the savings that chief executives assume they're getting when they initiate wholesale firings and plant closings.

Having actually built some of the models that directly or indirectly determined hiring and layoffs, and more importantly having been the one who explained those models to the higher-ups, I very much doubt that most companies spend enough time looking at the hidden and long term costs of layoffs.

The book is Spark, by Frank Koller. Sounds interesting.

Selection Bias with Hazard Ratios

Miguel Hernan has a recent article on the Hazards of Hazard Ratios. The thing that jumped to my attention was his discussion of "depletion of susceptibles". Any intervention can look protective, eventually, if it speeds up disease in the susceptible such that the rate of events in that population eventually drops (as all of the members of the population able to have an event have had it).

I think that this element of hazard ratios illustrates two principles:

1) it always makes sense to begin the analysis of a medication at first use or else you can miss a lot

2) In the long run, we are all dead

So the real trick seems to be more focus on good study design and being careful to formulate problems with precision. Quality study design never goes out of style!
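Depletion of susceptibles can be seen in a toy cohort. The sketch below is an illustration under assumed numbers, not anything taken from Hernan's article: half of each arm is susceptible, the rest can never have an event, and the treatment does nothing except triple the per-period event probability among the susceptible (0.30 versus 0.10). The function name and every parameter are hypothetical.

```python
def period_hazards(p_event, frac_susceptible=0.5, n_periods=12):
    """Period-specific hazard (events / people still at risk) in a cohort
    where only the susceptible fraction can ever have an event."""
    hazards = []
    susceptible = frac_susceptible
    never_at_risk = 1.0 - frac_susceptible
    for _ in range(n_periods):
        at_risk = susceptible + never_at_risk
        events = susceptible * p_event
        hazards.append(events / at_risk)
        susceptible -= events  # those who had the event leave the risk set
    return hazards


# Assumed per-period event probabilities among the susceptible.
control = period_hazards(0.10)
treated = period_hazards(0.30)  # the treatment only speeds events up

hazard_ratios = [t / c for t, c in zip(treated, control)]
for period, hr in enumerate(hazard_ratios):
    print("period", period, "hazard ratio", round(hr, 2))
```

The treatment is harmful throughout in this setup, yet the period-specific hazard ratio starts near 3 and falls below 1 in the later periods, simply because the treated arm runs out of people who can still have the event. An analysis that starts follow-up late would see only the "protective" part.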

Nate Silver debunks another polling myth

Here's the old chestnut (from Robert Moran):


In a two way race, political professionals don't even bother to look at the spread between the incumbent and the challenger, they only focus on the incumbent's support relative to 50%. Incumbents tend to get trace elements of the undecideds at the end of a campaign. Sure, there is the occasional exception, but this rule is fairly ironclad in my experience.


Here's Silver's takedown:


There are several noteworthy features of this graph:


1) It is quite common for an incumbent to be polling at under 50 percent in the early polling average; this was true, in fact, of almost half of the races (30 of the 63). An outright majority of incumbents, meanwhile, had at least one early poll in which they were at under 50 percent of the vote.


2) There are lots of races in the top left-hand quadrant of the graph: these are cases in which the incumbent polled at under 50 percent in the early polling average, but wound up with more than 50 percent of the vote in November. In fact, of the 30 races in which the incumbent had less than 50 percent of the vote in the early polls, he wound up with more than 50 percent of the vote 18 times -- a clear majority. In addition, there was one case in which an incumbent polling at under 50 percent wound up with less than 50 percent of the November vote, but won anyway after a small third-party vote was factored in. Overall, 19 of the 30 incumbents to have less than 50 percent of the vote in the early polling average in fact won their election.


3) 5 of the 15 incumbents to have under 45 percent of the vote in early polls also won their elections. These were Bob Menendez (38.9 percent), Tim Pawlenty (42.0 percent), Don Carcieri (42.3 percent), Jennifer Granholm (43.4 percent) and Arnold Schwarzenegger (44.3 percent), all in 2006.


3b) If we instead look at those cases within three points of Ted Strickland's 44 percent, when the incumbent had between 41 and 47 percent of the vote in early polls, he won on 11 of 17 occasions (65 percent of the time).


4) Almost all of the data points are above the red diagonal line, meaning that the incumbent finished with a larger share of the vote than he had in early polls. This was true on 58 of 63 occasions.


4b) On average, the incumbent added 6.4 percent to his voting total between the early polling average and the election, whereas the challenger added 4.5 percent. Looked at differently, the incumbent actually picked up the majority -- 59 percent -- of the undecided vote vis-a-vis early polls.


4c) The above trend seems quite linear; regardless of the incumbent's initial standing in the early polls, he picked up an average of 6-7 points by the election, although with a significant amount of variance.


5) The following corollary of Moran's hypothesis is almost always true: if an incumbent has 50 percent or more of the vote in early polls, he will win re-election. This was true on 32 of 33 occasions; the lone exception was George Allen in Virginia, who had 51.5 percent of the vote in early polls in 2006 but lost re-election by less than a full point (after running a terrible campaign). It appears that once a voter is willing to express a preference for an incumbent candidate to a pollster, they rarely (although not never) change their minds and vote for the challenger instead.
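The percentages in the excerpt above can be rechecked from the counts it reports. This is a small sketch rather than Silver's own calculation. In particular, treating the incumbent's share of the undecided vote as his gain divided by the combined gain of both candidates is an assumption about how the 59 percent figure was derived.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to the nearest point."""
    return round(100 * part / whole)


# Counts as reported in the excerpt.
under_50_share = pct(30, 63)   # incumbents under 50 percent in early polls
under_50_won = pct(19, 30)     # of those, how many won
near_44_won = pct(11, 17)      # incumbents between 41 and 47 percent who won
gained_share = pct(58, 63)     # incumbents who finished above their early polls
over_50_won = pct(32, 33)      # incumbents at 50 percent or more who won

# Average points added between early polls and the election.
incumbent_gain, challenger_gain = 6.4, 4.5
undecided_to_incumbent = pct(incumbent_gain, incumbent_gain + challenger_gain)

print(under_50_share, under_50_won, near_44_won,
      gained_share, over_50_won, undecided_to_incumbent)
```

The recomputed values agree with the two percentages the excerpt states outright (65 percent for the 11-of-17 cases and 59 percent of the undecided vote).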

Saturday, February 27, 2010

Meta-Freakonomics

Joseph recently wrote a post referring to this post by Andrew Gelman (which was based on a series of posts by Kaiser Fung which check the veracity of various claims in Superfreakonomics -- welcome to the convoluted world of the blogosphere). Joseph uses Dr. Gelman's comments about the poor editing and fact-checking of the book to make a point about the disparity between the contribution editing makes and how little we reward it. He ought to know; I have frequently taken advantage of his good nature in this area, but at the risk of being ungrateful, I don't think the point applies here. Rather than being helpful, the kind of criticism Joseph and Gelman describe could only hurt Superfreakonomics.

Or put another way, if we approach this using the techniques and assumptions of the Freakonomics books, we can show that by foregoing a rigorous internal review process the authors were simply acting rationally.

Before we get to the actual argument, we need to address one more point in Joseph's post. Joseph says that providing a critical read "is one of the most helpful things a colleague can do for you, yet one of the least rewarded." This statement is absolutely true for easily 99.9% of the books and manuscripts out there. It is not, however, true for the Freakonomics books. Between their prestige and the deep pockets of William Morrow, Levitt and Dubner could have gotten as many highly-qualified internal reviewers as they wanted, reviewers who would have been compensated with both an acknowledgment and a nice check. (Hell, they might even get to be in the movie.)

But if the cost and difficulty of putting together an all-star team of reviewers for Superfreakonomics would have been negligible, how about the benefits? Consider the example of its highly successful predecessor. Freakonomics was so badly vetted that two sections (including the book's centerpiece on abortion) were debunked almost immediately. The source material for the KKK section was so flawed that even Levitt and Dubner disavowed it.

These flaws could have been caught and addressed in the editing process but how would making those corrections help the authors? Do we have any reason to believe that questionable facts and sloppy reasoning cost Levitt and Dubner significant book sales (the book sold over four million copies)? That they endangered the authors' spot with the New York Times? Reduced in any way the pervasive influence the book holds over the next generation of economists? Where would Levitt and Dubner have benefited from a series of tough internal reviews?

Against these elusive benefits we have a number of not-so-hard-to-find costs. While the time and money required to spot flaws is relatively minor, the effort required to address those flaws can be substantial.

Let's look at some specifics. Kaiser Fung raises a number of questions about the statistics in the "sex" chapter (the one about female longevity is particularly damning) and I'm sure he overlooked some -- not because there was anything wrong with his critique but because finding and interpreting reliable data on a century of sex and prostitution is extraordinarily difficult. It involves measuring covert behavior that can be affected by zoning, police procedures, city politics, shifts in organized crime, and countless other factors. Furthermore, these same factors can bias the collection of data in nasty and unpredictable ways.

Even if all of the sex chapter's underlying economics arguments were sound (which they are, as far as I know), there would still have been a very good chance that some reviewer might have pointed out flawed data, discredited studies, or turned up findings from more credible sources that undercut the main hypotheses. That doesn't mean that the chapter couldn't be saved -- a good team of researchers with enough time could probably find solid data to support the arguments (assuming, once again, that they were sound) but the final result would be a chapter that would look about the same to the vast majority of readers and external reviewers -- all cost, no benefit.

Worse yet, think about the section on the relative dangers of drunken driving vs. drunken walking. These cute little counter-intuitive analyses are the signature pieces of Levitt and Dubner (and were associated with Dr. Levitt before he formed the team). They are the foundation of the brand. Unfortunately, counter-intuitive analyses tend to be fragile creatures that don't fare that well under scrutiny (intuition has a pretty good track record).

The analysis of modes of drunken transportation would be one of the more fragile ones. Most competent internal reviewers would have had the same reaction that Ezra Klein had:
You can go on and on in this vein. It's terrifically shoddy statistical work. You'd get dinged for this in a college class. But it's in a book written by a celebrated economist and a leading journalist. Moreover, the topic isn't whether people prefer chocolate or vanilla, but whether people should drive drunk. It is shoddy statistical work, in other words, that allows people to conclude that respected authorities believe it is safer for them to drive home drunk than walk home drunk. It's shoddy statistical work that could literally kill somebody. That makes it more than bad statistics. It makes it irresponsible.
Let me be clear. I am not saying that Levitt and Dubner knew there were mistakes here. Quite the opposite. I'm saying they had a highly saleable manuscript ready to go which contained no errors that they knew of, and that any additional checking of the facts, the analyses or logic in the manuscript could only serve to make the book less saleable, to delay its publication or to put the authors in the ugly position of publishing something they knew to be wrong.

Gelman closes his post with this:
It's the nature of interesting-but-true facts that they're most interesting if true, and even more interesting if they're convincingly true.
Perhaps, but Levitt and Dubner have about four million reasons that say he's wrong.

When you really want to argue causality...

There's always a way.

John Quiggin does the dirty work:
I underestimated the speed and power of Zombie ideas. As early as Sep 2009, Casey Mulligan was willing to claim that the entire crisis could be explained in terms of labor market interventions. According to Mulligan, financial markets anticipated a variety of measures from the Obama Administration, observing ‘Arguably, the 2008 election was associated with an increase in the power of unions to shape public policy, and thereby the labor market. Congress has considered various legislation that would raise marginal income tax rates, and would present Americans with new health benefits that would be phased out as a function of income.’

This is truly impressive. So perspicacious are the financial markets, that even the possibility that Congress might raise taxes, or incorporate a means test in health care legislation that might be passed some time in the future (at the time of writing this in Feb 2010, the bill was still tied up) was sufficient to bring down the entire global financial market. And, even though the McCain-Palin ticket was widely seen as having a good chance (at least before the September 2008), the markets didn’t wait for the election returns to come in. Applying some superstrong version of market efficiency, market participants predicted the election outcome, applied Mulligan’s neoclassical model to the predicted policies of the Obama Administration and (perfectly rationally) panicked.

Friday, February 26, 2010

IPTW news

Peter C. Austin in his new article The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies compares a number of different propensity score approaches for modeling risk differences. Curiously, inverse probability of treatment weighting out-performed matching on propensity scores. My intuition is that they would have had similar levels of accuracy and bias.

It's going to be interesting to think about why this result holds.
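For readers who have not seen inverse probability of treatment weighting applied to a risk difference, here is a minimal sketch on a made-up cohort with one binary confounder. It does not reproduce Austin's simulations or his comparison with matching. The counts, the stratum-based propensity estimate, and the function names are all assumptions chosen so the arithmetic is easy to follow.

```python
from collections import defaultdict

# Hypothetical cohort: (confounder x, treated a, outcome y, number of people).
# Sicker people (x = 1) are more likely to be treated and to have the outcome.
COUNTS = [
    (0, 1, 1, 2), (0, 1, 0, 18),
    (0, 0, 1, 16), (0, 0, 0, 64),
    (1, 1, 1, 24), (1, 1, 0, 56),
    (1, 0, 1, 8), (1, 0, 0, 12),
]


def propensity_by_stratum(counts):
    """Estimated probability of treatment within each level of x."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, a, _y, n in counts:
        total[x] += n
        treated[x] += n * a
    return {x: treated[x] / total[x] for x in total}


def crude_risk_difference(counts):
    """Unadjusted risk in the treated minus risk in the untreated."""
    events = {0: 0, 1: 0}
    people = {0: 0, 1: 0}
    for _x, a, y, n in counts:
        events[a] += n * y
        people[a] += n
    return events[1] / people[1] - events[0] / people[0]


def iptw_risk_difference(counts):
    """Risk difference after weighting each person by the inverse of the
    probability of the treatment they actually received."""
    e = propensity_by_stratum(counts)
    w_events = {0: 0.0, 1: 0.0}
    w_people = {0: 0.0, 1: 0.0}
    for x, a, y, n in counts:
        weight = 1 / e[x] if a == 1 else 1 / (1 - e[x])
        w_events[a] += weight * n * y
        w_people[a] += weight * n
    return w_events[1] / w_people[1] - w_events[0] / w_people[0]


print("crude risk difference:", round(crude_risk_difference(COUNTS), 3))
print("IPTW risk difference: ", round(iptw_risk_difference(COUNTS), 3))
```

In this constructed example the treatment lowers risk by ten percentage points within each stratum. The crude comparison makes it look slightly harmful, and the weighted comparison recovers the within-stratum effect.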

Neat stuff -- go, read and enjoy!

Thursday, February 25, 2010

Editing

Andrew Gelman makes a great point about editing; there is nothing that helps more than having somebody do a critical read through a manuscript to point out where your reasoning is sloppy. This is one of the most helpful things a colleague can do for you, yet one of the least rewarded. It can be painful to hear these comments but it's worth every agonizing moment.

Wednesday, February 24, 2010

Stand and deliver

This article by the gifted Olivia Judson* explores the research about sitting and obesity that Joseph was talking about and makes some interesting suggestions:
Some people have advanced radical solutions to the sitting syndrome: replace your sit-down desk with a stand-up desk, and equip this with a slow treadmill so that you walk while you work. (Talk about pacing the office.) Make sure that your television can only operate if you are pedaling furiously on an exercise bike. Or, watch television in a rocking chair: rocking also takes energy and involves a continuous gentle flexing of the calf muscles. Get rid of your office chair and replace it with a therapy ball: this too uses more muscles, and hence more energy, than a normal chair, because you have to support your back and work to keep balanced. You also have the option of bouncing, if you like.
* and could someone explain to me why the New York Times' best science writer only shows up in the opinion section.