Saturday, March 20, 2010

New Proposed National Math Standards

These actually look pretty good.

Friday, March 19, 2010

Too late for an actual post, but...

There are another couple of entries in the TNR education debate. If you're an early riser you can read them before I do.

Thursday, March 18, 2010

Some more thoughts on p-value

One of the advantages of being a corporate statistician was that generally you not only ran the test; you also explained the statistics. I could tell the department head or VP that a p-value of 0.08 wasn't bad for a preliminary study with a small sample, or that a p-value of 0.04 wasn't that impressive with a controlled study of a thousand customers. I could factor in things like implementation costs and potential returns when looking at type-I and type-II errors. For low implementation/high returns, I might set significance at 0.1. If the situation were reversed, I might set it at 0.01.
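
Here's a rough sketch of the kind of arithmetic I mean. Every number in it (costs, effect size, sample size, the prior on the effect being real) is invented purely for illustration:

```python
# Hypothetical decision-theoretic comparison of significance thresholds.
import numpy as np
from scipy.stats import norm

def power(alpha, effect, n_per_arm, sd=1.0):
    """Approximate power of a two-sample z-test for a difference in means."""
    se = sd * np.sqrt(2.0 / n_per_arm)
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - effect / se)

def expected_cost(alpha, effect, n_per_arm, cost_fp, cost_fn, p_effect_real=0.5):
    beta = 1 - power(alpha, effect, n_per_arm)   # type-II error rate
    return (1 - p_effect_real) * alpha * cost_fp + p_effect_real * beta * cost_fn

# Cheap to implement, big potential returns: a looser alpha can come out ahead.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(expected_cost(alpha, effect=0.2, n_per_arm=100,
                                     cost_fp=10_000, cost_fn=100_000)))
```

Swap the two costs and the strictest threshold wins instead, which is the whole point.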

Obviously, we can't let everyone set their own rules, but (to coin a phrase) I wonder if in an effort to make things as simple as possible, we haven't actually made them simpler. Statistical significance is an arbitrary, context-sensitive cut-off that we assign before a test based on the relative costs of a false positive and a false negative. It is not a God-given value of 5%.
Letting everyone pick their own definition of significance is a bad idea, but so is completely ignoring context. Does it make any sense to demand the same p-value threshold from a study of a rare, slow-growing cancer (where five years is quick and a sample size of 20 is an achievement) and from a trial of a drug to reduce blood pressure in the moderately obese (where a course of treatment lasts two weeks and the streets are filled with potential test subjects)? Should we ignore a promising preliminary study because it comes in at 0.06?

For a real-life example, consider the public reaction to the recent statement that we didn't have statistically significant data that the earth had warmed over the past 15 years. This was a small sample and I'm under the impression that the results would have been significant at the 0.1 level, but these points were lost (or discarded) in most of the coverage.

We need to do a better job dealing with these gray areas. We might try replacing the phrase "statistically significant" with "statistically significant at 10/5/1/0.1%." Or we might look at some sort of two-tiered system, raising the threshold to 0.01 for most studies while making room for "provisionally significant" papers where research is badly needed, adequate samples are not available, or the costs of a type-II error are deemed unusually high.

I'm not sure how practical or effective these steps might be but I am sure we can do better. Statisticians know how to deal with gray areas; now we need to work on how we explain them.

For more on the subject, check out Joseph's posts here and here.

The winner's curse

I have heard about the article that Mark references in a previous post; it's hard to be in the epidemiology field and not have heard about it. But, for this post, I want to focus on a single aspect of the problem.

Let's say that you have a rare side effect that requires a large database to find and, even then, the power is limited. Let's say, for illustration, that the true effect of a drug on an outcome is an odds ratio (or relative risk; it's a rare disease) of 1.50. If, by chance alone, the estimate in database A is 1.45 (95% confidence interval: 0.99 to 1.98) and the estimate in database B is 1.55 (95% CI: 1.03 to 2.08), then what would be the result of two studies on this side effect?

Well, if database A is analyzed first, then maybe nobody ever looks at database B (these databases are often expensive to use and time-consuming to analyze). If database B is used first, the second estimate will come from database A (and thus be lower). In fact, there is some chance that the researchers working with database A will never publish at all (it has historically been the case that null results are hard to publish).

The result? Estimates of association between the drug and the outcome will tend to be biased upwards -- because the initial finding (due to the nature of null results being hard to publish) will tend to be an over-estimate of the true causal effect.
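
A quick simulation (all numbers invented) makes the mechanism concrete: if only the estimates that clear the significance bar get published first, the first published estimate is, on average, too high.

```python
import numpy as np

rng = np.random.default_rng(1)
true_log_or = np.log(1.5)
se = 0.18                                     # sampling error of each database's estimate
estimates = rng.normal(true_log_or, se, size=100_000)
published_first = estimates - 1.96 * se > 0   # lower 95% CI bound excludes OR = 1

print("true OR:                 ", round(np.exp(true_log_or), 2))
print("mean OR, all databases:  ", round(np.exp(estimates.mean()), 2))
print("mean OR, published first:", round(np.exp(estimates[published_first].mean()), 2))
```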

These factors make it hard to determine if a meta-analysis of observational evidence would give an asymptotically unbiased estimate of the "truth" (likely it would be biased upwards).

In that sense, on average, published results are biased to some extent.

A lot to discuss

When you get past the inflammatory opening, this article in Science News is something you should take a look at (via Felix Salmon).
“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

Wednesday, March 17, 2010

Evidence

I was reading Andrew Gelman (always a source of interesting statistical thoughts) and I started thinking about p-values in epidemiology.

Is there a measure in all of medical research more controversial than the p-value? Sometimes I really don't think so. In a lot of ways, it seems to dominate research just because it has become an informal standard. But it felt odd, the one time I did it, to say in a paper that there was no association (p=.0508) when adding a few more cases might have flipped the answer.
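
To make that fragility concrete, here is a toy 2x2 table (not the actual study data, just made-up counts) showing how far a handful of additional exposed cases can move a p-value that sits near the usual cutoff:

```python
from scipy.stats import fisher_exact

# a made-up 2x2 table: exposed vs. unexposed, cases vs. controls
before = [[29, 71],
          [18, 82]]
_, p_before = fisher_exact(before)

# the same table after a handful of additional exposed cases turn up
after = [[34, 71],
         [18, 82]]
_, p_after = fisher_exact(after)

print(f"p before: {p_before:.3f}   p after: {p_after:.3f}")
```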

I don't think confidence intervals, used in the sense of "does this interval include the null", really advance the issue either. But it's true that we do want a simple way to decide if we should be concerned about a possible adverse association, and the medical literature is not well constructed for a complex back-and-forth discussion about statistical models.

I'm also not convinced that any "standard of evidence" would not be similarly misapplied. Any approach that is primarily used by trained statisticians (who are sensitive to its limitations) will look good compared with a broad standard that is also applied by non-specialists.

So I guess I don't see an easy way to replace our reliance on p-values in the medical literature, but it is worth some thought.

"We could call them 'universities'"

This bit from Kevin Carey's entry in the New Republic debate caught my eye:

In the end, [Diane Ravitch's] Death and Life is painfully short on non-curricular ideas that might actually improve education for those who need it most. The last few pages contain nothing but generalities: ... "Teachers must be well educated and know their subjects." That's all on page 238. The complete lack of engagement with how to do these things is striking.

If only there were a system of institutions where teachers could go for instruction in their fields. If there were such a system then Dr. Ravitch could say "Teachers must be well educated and know their subjects" and all reasonable people would assume that she meant we should require teachers to take more advanced courses and provide additional compensation for those who exceeded those requirements.

Tuesday, March 16, 2010

Some context on schools and the magic of the markets

One reason emotions run so hot in the current debate is that the always heated controversies of education have somehow become intertwined with sensitive points of economic philosophy. The discussion over child welfare and opportunity has been rewritten as an epic struggle between big government and unions on one hand and markets and entrepreneurs on the other. (insert Lord of the Rings reference here)

When Ben Wildavsky said "Perhaps most striking to me as I read Death and Life was Ravitch’s odd aversion to, even contempt for, market economics and business as they relate to education" he wasn't wasting his time on a minor aspect of the book; he was focusing on the fundamental principle of the debate.

The success or even the applicability of business metrics and mission statements in education is a topic for another post, but the subject does remind me of a presentation the head of the education department gave when I was getting my certification in the late Eighties. He showed us a video of Tom Peters discussing In Search of Excellence, then spent about an hour extolling Peters's ideas.

(on a related note, I don't recall any of my education classes mentioning George Polya)

I can't say exactly when but by 1987 business-based approaches were the big thing in education and had been for quite a while, a movement that led to the introduction of charter schools at the end of the decade. And the movement has continued to this day.

In other words, American schools have been trying a free market/business school approach for between twenty-five and thirty years.

I'm not going to say anything here about the success or failure of those efforts, but it is worth putting in context.

Monday, March 15, 2010

And for today, at least, you are not the world's biggest math nerd

From Greg Mankiw:
Fun fact of the day: MIT releases its undergraduate admission decisions at 1:59 pm today. (That is, at 3.14159).

Who is this Thomas Jefferson you keep talking about?

I've got some posts coming up on the role curriculum plays in educational reform. In the meantime, check out what's happening in Texas* with the state board of education. Since the Lone Star State is such a big market, it has a history of setting textbook content for the nation.

Here's the change that really caught my eye:
Thomas Jefferson no longer included among writers influencing the nation’s intellectual origins. Jefferson, a deist who helped pioneer the legal theory of the separation of church and state, is not a model founder in the board’s judgment. Among the intellectual forerunners to be highlighted in Jefferson’s place: medieval Catholic philosopher St. Thomas Aquinas, Puritan theologian John Calvin and conservative British law scholar William Blackstone. Heavy emphasis is also to be placed on the founding fathers having been guided by strict Christian beliefs.
* I'm a Texan by birth. I'm allowed to mock.

Observational Research

An interesting critique of observational data by John Cook. I think he raises a fair point, but it is more true of cross-sectional studies than of longitudinal ones. If you measure a modifiable factor at baseline and then look at how well it predicts subsequent change, you have a pretty useful measure of consequence. It might be confounded or it might have issues with indication bias, but it's still an informative prediction.

With cross sectional studies, on the other hand, reverse causality is always a concern.
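
Here's a minimal simulation of the reverse-causality worry, with everything invented: the exposure does nothing to the disease, but the disease lowers the exposure, and a cross-sectional comparison dutifully finds a "protective" association.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
disease = rng.binomial(1, 0.2, n)                  # disease status
exposure = rng.normal(10, 2, n) - 3 * disease      # the sick cut back on the exposure

high = exposure > np.median(exposure)
rr = disease[high].mean() / disease[~high].mean()  # cross-sectional risk ratio
print("apparent risk ratio for high exposure:", round(rr, 2))
```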

Of course, the other trick is that the risk factor really has to be modifiable. Drugs (my own favorite example) often are. But even diet and exercise get tricky to modify when you look at them closely, as they are linked to other characteristics of the individual, and changing them means a drastic shift in lifestyle patterns.

It's a hard area, and this is why we use experiments as our gold standard!

"The Obesity-Hunger Paradox"

Interesting article from the New York Times:

WHEN most people think of hunger in America, the images that leap to mind are of ragged toddlers in Appalachia or rail-thin children in dingy apartments reaching for empty bottles of milk.

Once, maybe.

But a recent survey found that the most severe hunger-related problems in the nation are in the South Bronx, long one of the country’s capitals of obesity. Experts say these are not parallel problems persisting in side-by-side neighborhoods, but plagues often seen in the same households, even the same person: the hungriest people in America today, statistically speaking, may well be not sickly skinny, but excessively fat.

Call it the Bronx Paradox.

“Hunger and obesity are often flip sides to the same malnutrition coin,” said Joel Berg, executive director of the New York City Coalition Against Hunger. “Hunger is certainly almost an exclusive symptom of poverty. And extra obesity is one of the symptoms of poverty.”

The Bronx has the city’s highest rate of obesity, with residents facing an estimated 85 percent higher risk of being obese than people in Manhattan, according to Andrew G. Rundle, an epidemiologist at the Mailman School of Public Health at Columbia University.

But the Bronx also faces stubborn hunger problems. According to a survey released in January by the Food Research and Action Center, an antihunger group, nearly 37 percent of residents in the 16th Congressional District, which encompasses the South Bronx, said they lacked money to buy food at some point in the past 12 months. That is more than any other Congressional district in the country and twice the national average, 18.5 percent, in the fourth quarter of 2009.

Such studies present a different way to look at hunger: not starving, but “food insecure,” as the researchers call it (the Department of Agriculture in 2006 stopped using the word “hunger” in its reports). This might mean simply being unable to afford the basics, unable to get to the grocery or unable to find fresh produce among the pizza shops, doughnut stores and fried-everything restaurants of East Fordham Road.

"The economics profession is in crisis"

This may sound strange but all this soul searching by economists like Mark Thoma makes me think that the field might be on the verge of extensive reassessment and major advances.

From the Economist's View:
The fact that the evidence always seems to confirm ideological biases doesn't give much confidence. Even among the economists that I trust to be as fair as they can be -- who simply want the truth whatever it might be (which is most of them) -- there doesn't seem to be anything resembling convergence on this issue. In my most pessimistic moments, I wonder if we will ever make progress, particularly since there seems to be a tendency for the explanation given by those who are most powerful in the profession to stick just because they said it. So long as there is some supporting evidence for their positions, evidence pointing in other directions doesn't seem to matter.

The economics profession is in crisis, more so than the leaders in the profession seem to understand (since change might upset their powerful positions, positions that allow them to control the academic discourse by, say, promoting one area of research or class of models over another, they have little incentive to see this). If, as a profession, we can't come to an evidence based consensus on what caused the single most important economic event in recent memory, then what do we have to offer beyond useless "on the one, on the many other hands" explanations that allow people to pick and choose according to their ideological leanings? We need to do better.

(forgot to block-quote this. sorry about the error)

TNR on the education debate

The New Republic is starting a series on education reform. Given the extraordinary quality of commentary we've been seeing from TNR, this is definitely a good development.

Here are the first three entries:

By Diane Ravitch: The country's love affair with standardized testing and charter schools is ruining American education.

By Ben Wildavsky: Why Diane Ravitch's populist rage against business-minded school reform doesn't make sense.

By Richard Rothstein: Ravitch’s recent ‘conversion’ is actually a return to her core values.

Sunday, March 14, 2010

Harlem Children's Zero Sum Game

I used to work on the marketing side of a large corporation (I don't think they'd like me to use their name, so let's just say you've heard of it and leave the matter at that). We frequently discussed the dangers of adverse selection: the possibility that a marketing campaign might bring in customers we didn't want, particularly those we couldn't legally refuse. We also spent a lot of time talking about how to maximize the ratio of perceived value to real value.

On a completely unrelated note, here's an interesting article from the New York Times:
Pressed by Charters, Public Schools Try Marketing
By JENNIFER MEDINA

Rafaela Espinal held her first poolside chat last summer, offering cheese, crackers and apple cider to draw people to hear her pitch.

She keeps a handful of brochures in her purse, and also gives a few to her daughter before she leaves for school each morning. She painted signs on the windows of her Chrysler minivan, turning it into a mobile advertisement.

It is all an effort to build awareness for her product, which is not new, but is in need of an image makeover: a public school in Harlem.

As charter schools have grown around the country, both in number and in popularity, public school principals like Ms. Espinal are being forced to compete for bodies or risk having their schools closed. So among their many challenges, some of these principals, who had never given much thought to attracting students, have been spending considerable time toiling over ways to market their schools. They are revamping school logos, encouraging students and teachers to wear T-shirts emblazoned with the new designs. They emphasize their after-school programs as an alternative to the extended days at many charter schools. A few have worked with professional marketing firms to create sophisticated Web sites and blogs.
...

For most schools, the marketing amounts to less than $500, raised by parents and teachers to print up full color postcards or brochures. Typically, principals rely on staff members with a creative bent to draw up whatever they can.

Student recruitment has always been necessary for charter schools, which are privately run but receive public money based on their enrollment, supplemented by whatever private donations they can corral.

The Harlem Success Academy network, run by the former City Council member Eva Moskowitz, is widely regarded, with admiration by some and scorn by others, as having the biggest marketing effort. Their bright orange advertisements pepper the bus stops in the neighborhood, and prospective parents receive full color mailings almost monthly.

Ms. Moskowitz said the extensive outreach was necessary to make sure they were drawing from a broad spectrum of parents. Ms. Moskowitz said they spent roughly $90 per applicant for recruitment. With about 3,600 applicants last year for the four schools in the network, she said, the total amounted to $325,000.

Saturday, March 13, 2010

Social norms and happy employees

I came across the following from Jay Golz's New York Times blog:

About 10 years ago I was having my annual holiday party, and my niece had come with her newly minted M.B.A. boyfriend. As he looked around the room, he noted that my employees seemed happy. I told him that I thought they were.

Then, figuring I would take his new degree for a test drive, I asked him how he thought I did that. “I’m sure you treat them well,” he replied.

“That’s half of it,” I said. “Do you know what the other half is?”

He didn’t have the answer, and neither have the many other people that I have told this story. So what is the answer? I fired the unhappy people. People usually laugh at this point. I wish I were kidding.

In my experience, it is generally unhappy employees who say things like "But what happens to our business model if home prices go down?" or "Doesn't that look kinda like an iceberg?" Putting that aside, though, this is another example of the principle discussed in the last post -- it's easy to get the norms you want if you can decide who goes in the group.

Charter schools, social norming and zero-sum games

You've probably heard about the Harlem Children's Zone, an impressive, even inspiring initiative to improve the lives of poor inner-city children through charter schools and community programs. Having taught in Watts and the Mississippi Delta in my pre-statistician days, I have a long-standing interest in this area, and I like a lot of the things I'm hearing about HCZ. What I don't like nearly as much is the reaction I'm seeing to the research study by Will Dobbie and Roland G. Fryer, Jr. of Harvard. Here's Alex Tabarrok at Marginal Revolution with a representative sample: "I don't know why anyone interested in the welfare of children would want to discourage this kind of experimentation."

Maybe I can provide some reasons.

First off, this is an observational study, not a randomized experiment. I think we may be reaching the limits of what analysis of observational data can do in the education debate and, given the importance and complexity of the questions, I don't understand why we aren't employing randomized trials to answer some of these questions once and for all.

More significantly, I'm also troubled by the aliasing of data on the Promise Academies and by the fact that the authors draw a conclusion ("HCZ is enormously successful at boosting achievement in math and ELA in elementary school and math in middle school. The impact of being offered admission into the HCZ middle school on ELA achievement is positive, but less dramatic. High-quality schools or community investments coupled with high-quality schools drive these results, but community investments alone cannot.") that the data can't support.

In statistics, aliasing means combining treatments in such a way that you can't tell which treatment or combination of treatments caused the effect you observed. In this case the first treatment is the educational environment of the Promise Academies. The second is something called social norming.

When you isolate a group of students, they will quickly arrive at a consensus of what constitutes normal behavior. It is a complex and somewhat unpredictable process driven by personalities and random connections and any number of outside factors. You can, however, exercise a great deal of control over the outcome by restricting the make-up of the group.

If we restricted students via an application process, how would we expect that group to differ from the general population and how would that affect the norms the group would settle on? For starters, all the parents would have taken a direct interest in their children's schooling.

Compared to the general population, the applicants will be much more likely to see working hard, making good grades, and staying out of trouble as normal behaviors. The applicants (particularly older applicants) would be more likely to be interested in school and to see academic and professional success as a reasonable possibility because they would have made an active choice to move to a new and more demanding school. Having the older students committed to the program is particularly important because older children play a disproportionate role in the setting of social norms.

Dobbie and Fryer address the question of self-selection: "[R]esults from any lottery sample may lack external validity. The counterfactual we identify is for students who are already interested in charter schools. The effect of being offered admission to HCZ for these students may be different than for other types of students." In other words, they can't conclude from the data how well students would do at the Promise Academies if, for instance, their parents weren't engaged and supportive (a group effectively eliminated by the application process).

But there's another question, one with tremendous policy implications, that the paper does not address: how well would the students who were accepted to HCZ have done if they had been given the same amount of instruction* as they would have received from HCZ, but delivered by public school teachers, while still being isolated from the general population? (There was a control group of lottery losers, but there is no evidence that they were kept together as a group.)

Why is this question so important? Because we are thinking about spending an enormous amount of time, effort and money on a major overhaul of the education system when we don't have the data to tell us if what we spend will be wasted or, worse yet, if we are to some extent playing a zero-sum game.

Social norming can work both ways. If you remove all of the students whose parents are willing and able to go through the application process, the norms of acceptable behavior for those left behind will move in an ugly direction and the kids who started out with the greatest disadvantages would be left to bear the burden.
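
Here's a back-of-the-envelope simulation of the composition effect (every parameter invented); "engagement" stands in for whatever actually drives the norms a group settles on:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
engagement = rng.normal(0, 1, n)

# families with higher engagement are more likely to go through the application
p_apply = 1 / (1 + np.exp(-(engagement - 1)))
applied = rng.random(n) < p_apply

print("mean engagement, everyone:    ", round(engagement.mean(), 2))
print("mean engagement, applicants:  ", round(engagement[applied].mean(), 2))
print("mean engagement, left behind: ", round(engagement[~applied].mean(), 2))
```

The applicant pool shifts up, the pool left behind shifts down, and neither shift tells you anything about what the program itself did.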

But we can answer these questions and make decisions based on solid, statistically sound data. Educational reform is not like climate change where observational data is our only reasonable option. Randomized trials are an option in most cases; they are not that difficult or expensive.

Until we get good data, how can we expect to make good decisions?

* Correction: There should have been a link here to this post by Andrew Gelman.

Friday, March 12, 2010

Instrumental variables

I always have mixed feelings about instrumental variables (at least insofar as the instrument is not randomization). On one hand they show amazing promise as a way to handle unmeasured confounding. On the other hand, it is difficult to know if the assumptions required for a variable to be an instrument are being met or not.

This is an important dilemma. Alan Brookhart, who introduced them into pharmacoepidemiology in 2006, has done an amazing job of proving out one example. But you can't generalize from one example, and the general idea of using physician preference as an instrument, while really cool, still rests on those assumptions.

Unlike the case of unmeasured confounders, it's hard to know how to test these assumptions. With unmeasured confounders, you can ask critics to specify what they suspect the key confounding factors might be and then go forth and measure them. But instruments are used precisely when there is a lack of data.

I've done some work in the area with some amazing colleagues and I still think that the idea has some real promise. It's a novel idea that really came out of left field and has enormous potential. But I want to understand it in far more actual cases before I conclude much more . . .
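
For anyone who hasn't seen the mechanics, here is a minimal sketch of the physician-preference idea on simulated data (nothing here comes from a real study): two-stage least squares done by hand, with an unmeasured severity variable driving both treatment and outcome.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
severity = rng.normal(0, 1, n)            # unmeasured confounder
preference = rng.binomial(1, 0.5, n)      # physician prefers the drug

# treatment depends on the instrument AND on the unmeasured confounder
p_trt = 1 / (1 + np.exp(-(0.8 * preference + 1.0 * severity - 0.5)))
treatment = rng.binomial(1, p_trt)
true_effect = 0.5
outcome = true_effect * treatment + 1.0 * severity + rng.normal(0, 1, n)

naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

# stage 1: predict treatment from the instrument; stage 2: outcome on fitted treatment
X1 = np.column_stack([np.ones(n), preference])
fitted = X1 @ np.linalg.lstsq(X1, treatment, rcond=None)[0]
X2 = np.column_stack([np.ones(n), fitted])
iv_estimate = np.linalg.lstsq(X2, outcome, rcond=None)[0][1]

print("naive:", round(naive, 2), " IV:", round(iv_estimate, 2), " truth:", true_effect)
```

The catch, of course, is that the simulation gets to make preference independent of severity by fiat; that is exactly the assumption you can't verify in real claims data.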

Thursday, March 11, 2010

Propensity Score Calibration

I am on the road giving a guest lecture at UBC today. One of the topics I was going to cover in today's discussion was propensity score calibration (by the ever brilliant Til Sturmer). But I wonder -- if you have a true random subset of the overall population -- why not just use it? Or, if, as Til assumes, the sample is too small, why not use multiple imputation? Wouldn't that be an equivalent technique that is more flexible for things like subgroup analysis?

Or is it the complexity of the imputation in data sets of the size Til worked with that was the issue? It's certainly a point to ponder.

Worse than we thought -- credit card edition

For a while it looked like the one good thing about the economic downturn was that it was getting people to pay down their credit card debts. Now, according to Felix Salmon, we may have to find another silver lining:

Total credit-card debt outstanding dropped by $93 billion, or almost 10%, over the course of 2009. Is that cause for celebration, and evidence that U.S. households are finally getting their act together when it comes to deleveraging their personal finances? No. A fascinating spreadsheet from CardHub breaks that number down by looking at two variables: time, on the one hand, and charge-offs, on the other.

It turns out that while total debt outstanding dropped by $93 billion, charge-offs added up to $83 billion — which means that only 10% of the decrease in credit card debt — less than $10 billion — was due to people actually paying down their balances.

Tuesday, March 9, 2010

Perils of Convergence

This article ("Building the Better Teacher") in the New York Times Magazine is generating a lot of blog posts about education reform and talk of education reform always makes me deeply nervous. Part of the anxiety comes having spent a number of years behind the podium and having seen the disparity between the claims and the reality of previous reforms. The rest comes from being a statistician and knowing what things like convergence can do to data.

Convergent behavior violates the assumption of independent observations used in most simple analyses, but educational studies commonly, perhaps even routinely, ignore the complex ways that social norming can cause the nesting of student performance data.

In other words, educational research is often based on the idea that teenagers do not respond to peer pressure.
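
A tiny simulation (made-up numbers) of what that independence assumption costs you: scores are nested within classrooms, and the classroom-to-classroom variability dwarfs what you'd expect if 30 students were 30 independent observations.

```python
import numpy as np

rng = np.random.default_rng(5)
n_classes, n_students = 1000, 30
class_effect = rng.normal(0, 5, n_classes)                    # shared classroom/norm effect
scores = 70 + class_effect[:, None] + rng.normal(0, 10, (n_classes, n_students))

naive_se = scores.std(ddof=1) / np.sqrt(n_students)           # treats students as independent
print("naive SE for a class mean: ", round(naive_se, 2))
print("actual SD of class means:  ", round(scores.mean(axis=1).std(ddof=1), 2))
```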

Since most teenagers are looking for someone else to take the lead, social norming can be extremely sensitive to small changes in initial conditions, particularly in the make-up of the group. This makes it easy for administrators to play favorites -- when a disruptive or under-performing student is reassigned from a favored to an unfavored teacher, the student lowers the average of the second class and often resets the standards of normal behavior for his or her peers.

If we were to adopt the proposed Jack Welch model (big financial incentives at the top; pink slips at the bottom), an administrator could, just by moving three or four students, arrange for one teacher to be put in line for achievement bonuses while another teacher of equal ability would be in danger of dismissal.

Worse yet, social norming can greatly magnify the bias caused by self-selection and self-selection biases are rampant in educational research. Any kind of application process automatically removes almost all of the students that either don't want to go to school or aren't interested in academic achievement or know that their parents won't care what they do.

If you can get a class consisting entirely of ambitious, engaged students with supportive parents, social norming is your best friend. These classes are almost (but not quite) idiot proof and teachers lucky enough to have these classes will see their metrics go through the roof (and their stress levels plummet -- those are fun classes to teach). If you can get an entire school filled with these students, the effect will be even stronger.

This effect is often stated in terms of the difference in performance between the charter schools and the schools the charter students were drawn from, which adds another level of bias (not to mention insult to injury).

Ethically, this raises a number of tough questions about our obligations to all students (even the difficult and at-risk) and what kind of sacrifices we can reasonably ask most students to make for a few of their peers.

Statistically, though, the situation is remarkably clear: if this effect is present in a study and is not accounted for, the results are at best questionable and at worst meaningless.

(this is the first in a series of posts about education. Later this week, I'll take a look at the errors in the influential paper on Harlem's Promise Academy.)

Efficacy versus effectiveness

One of the better examples of this distinction that I have found is physical activity. Travis Saunders talks about the difference between a closely monitored exercise program and simply encouraging exercise-related behavior (a difference that shows up despite randomization).

This should be a warning for those of us in drug research as well; not even randomization will help if you have a lot of cross-overs over time or if users tend to alter other behaviors as a result of therapy. This isn't very plausible with some drugs with few side effects (statins) but could be really important for others where the effects can alter behavior (NSAIDs). In particular, it makes me wonder about our actual ability to use randomized experiments on pain medication for arthritis (except, possibly, in the context of comparative effectiveness).
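
The dilution is easy to see with a toy intention-to-treat calculation (numbers invented): randomization stays clean, but cross-over shrinks the contrast between the arms.

```python
effect_on_treated = 10.0       # benefit when the drug is actually taken
stop_in_treatment_arm = 0.30   # fraction of the treatment arm that quits
start_in_control_arm = 0.20    # fraction of the control arm that starts on its own

# with non-selective cross-over and a constant effect, the ITT contrast is the
# effect times the difference in the fraction actually treated in each arm
itt = effect_on_treated * ((1 - stop_in_treatment_arm) - start_in_control_arm)
print("effect of taking the drug:", effect_on_treated, "  ITT estimate:", itt)
```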

But it is worth thinking about when trying to interpret observational data. What else could you be missing?

Monday, March 8, 2010

Undead papers

Okay, so what do y'all do when a paper becomes undead? We all have work that stopped, for one reason or another, but really needs to be brought to a conclusion. Not even necessarily a happy conclusion (sometimes putting a project out of its misery is the kindest decision for all involved -- especially the junior scientist leading the charge). But sometimes it is the case that the results are just not that compelling (but it still deserves to be published in the journal of minor findings).

But I wonder what is the secret to motivation under these conditions?

Sunday, March 7, 2010

"Algebra in Wonderland" -- recommended with reservations

In today's New York Times, Melanie Bayley, a doctoral candidate in English literature at Oxford, argues that Lewis Carroll's Alice in Wonderland can be interpreted as a satire of mathematics in the mid-Nineteenth Century, particularly the work of Hamilton and De Morgan.

The essay has its share of flaws: none of the analogies are slam-dunk convincing (the claim that the Queen of Hearts represents an irrational number is especially weak); the omission of pertinent works like "A Tangled Tale" and "What the Tortoise Said to Achilles" is a bit strange; and the conclusion that without math, Alice might have been more like Sylvie and Bruno would be easier to take seriously if the latter book hadn't contained significant amounts of mathematics* and intellectual satire.

Those problems aside, it's an interesting piece, a great starting point for discussing mathematics and literature and it will give you an excuse to dig out your Martin Gardner books. Besides, how often do you get to see the word 'quaternion' on the op-ed page?


* including Carroll's ingenious gravity powered train.

Friday, March 5, 2010

When is zero a good approximation

I was commenting on Andrew Gelman's blog when a nice commentator pointed something out that I usually don't think much about: pharmacoepidemiology outcomes include both cost and efficacy.

Now, a lot of my work has been on older drugs (aspirin, warfarin, and beta blockers are my three most commonly studied), so I have tended to assume that cost was essentially zero. A year's supply of aspirin for $10.00 is an attainable goal, so I have assumed that we can neglect the cost of therapy.

But does that make sense if we are talking about a targeted chemotherapy? In such a case, we might have to weigh not just the burden of additional adverse events but also the cost of the medication itself.

It's becoming appallingly clear to me that I don't have a good intuition for how to model this well. Making everything a cost and assuming a price on years of life lost is one approach, but the complexity of the pricing involved (and the tendency for relative costs to change over time) worries me about external validity.
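
Even the crude version of the approach makes the problem visible. A sketch with invented numbers:

```python
def net_benefit(annual_drug_cost, life_years_gained, value_per_life_year):
    return life_years_gained * value_per_life_year - annual_drug_cost

# cheap generic: the answer barely depends on the assumptions
print(net_benefit(annual_drug_cost=10, life_years_gained=0.02,
                  value_per_life_year=100_000))
# targeted chemotherapy: the answer swings with every pricing assumption
print(net_benefit(annual_drug_cost=80_000, life_years_gained=0.9,
                  value_per_life_year=100_000))
```

Nudge the price of the targeted drug or the value assigned to a life-year by twenty percent and the sign of the second number can flip; nothing reasonable does that to the first.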

I know what I will be thinking about this weekend!

Thursday, March 4, 2010

How are genetically engineered crops like AAA rated structured bonds?

Felix Salmon draws a clever analogy:

If you only grow one crop, the downside of losing it all to an outbreak is catastrophe. In rural Iowa it might mean financial ruin; in Niger, it could mean starvation.

Big agriculture companies like DuPont and Archer Daniels Midland (ADM), of course, have an answer to this problem: genetically engineered crops that are resistant to disease. But that answer is the agricultural equivalent of creating triple-A-rated mortgage bonds, fabricated precisely to prevent the problem of credit risk. It doesn’t make the problem go away: It just makes the problem rarer and much more dangerous when it does occur because no one is — or even can be — prepared for such a high-impact, low-probability event.

Valuing Pain

Readers of this blog will know that I have some concerns about the regulation of pain medications. The FDA continues to warn about the issue of liver injury when taking acetaminophen.

For a moment, let's ignore the case of people taking the drug inappropriately or for whom another medication would provide better symptom control. They exist and are relevant to policy discussions, but they distract from today's main thought.

We can measure liver damage and death (hard outcomes). We cannot easily measure pain -- what level of pain relief is worth a 1% chance of death?

So do we leave it up to individual judgment? Drugs can be confusing, and acetaminophen, because it works well, is included in a lot of combination preparations. So what is the ideal balance between the two goals of preventing adverse events and relieving pain?

It would be so much easier if pain were easy to measure . . .

Wednesday, March 3, 2010

p-values

Another nice critique of relying on p-values. There is also a fine example in the comments of why you should double-check when you think something looks odd. Often it is better to keep one's mouth shut and be thought a fool than to open it and remove all doubt.

Tuesday, March 2, 2010

Comparing Apples and Really Bad Toupees

DISCLAIMER: Though I have worked in some related areas like product launches, I have never done an analysis of brand value. What follows are a few thoughts about branding without any claim of special expertise or insight. If I've gotten something wrong here I would appreciate any notes or corrections.

Joseph's post reminded me of this article in the Wall Street Journal about the dispute between Donald Trump and Carl Icahn over the value of the Trump brand. Trump, not surprisingly, favors the high end:
In court Thursday, Mr. Trump boasted that his brand was recently valued by an outside appraiser at $3 billion.

In an interview Wednesday, Mr. Trump dismissed the idea that financial troubles had tarnished his casino brand. He also dismissed Mr. Icahn's claims that the Trump gaming brand was damaged, pointing to a recent filing in which Mr. Icahn made clear that he wants to assume the license to the brand. "Every building in Atlantic City is in trouble. OK? This isn't unique to Trump," he said. "Everybody wants the brand, including Carl. It's the hottest brand in the country."
While Icahn's estimate is a bit lower:
Mr. Icahn, however, believes his group also would have the right to use the Trump name under an existing licensing deal, but says the success of the casinos don't hinge on that. The main disadvantage to losing the name, he says, would be the $15 million to $20 million cost of changing the casinos' signs.
So we can probably put the value of the Trump brand somewhere in the following range:

-15,000,000 < TRUMP ≤ 3,000,000,000

Neither party here is what you'd call trustworthy and both are clearly pulling the numbers they want out of appropriate places but they are able to make these claims with straight faces partly because of the nature of the problem.

Assigning a value to a brand can be a tricky thing. Let's reduce this to pretty much the simplest possible case and talk about the price differential between your product and a similar house brand. If you make Clorox, we're in pretty good shape. There may be some subtle difference in quality between your product and, say, the Target store brand, but it's probably safe to ignore it and ascribe the extra dollar consumers pay for your product to the brand effect.

But what about a product like Apple Computers? There's clearly a brand effect at work but in order to measure the price differential we have to decide what products to compare them to. If we simply look at specs the brand effect is huge but Apple users would be quick to argue that they were also paying for high quality, stylish design and friendly interfaces. People certainly pay more for Macs, Ipods, Iphones, and the rest, but how much of that extra money is for features and how much is for brand?

(full disclosure: I use a PC with a dual Vista/Ubuntu operating system. I do my programming [Python, Octave] and analysis [R] in Ubuntu and keep Vista for compatibility issues. I'm very happy with my system. If an Apple user would like equal time we'd be glad to oblige)

I suspect that more products are closer to the Apple end of this spectrum than to the Clorox end, but even with things like bleach, all we have is a snapshot of a single product. To be useful, we need to estimate the long-term value of the brand. Is it a Zima (assuming Zima was briefly a valuable brand) or is it a Kellogg's Corn Flakes? And we would generally want a valuation that covers multiple products under the same brand. How do we measure the impact of a brand on products we haven't launched yet? (This last point is particularly relevant for Apple.)

The short answer is you take smart people, give them some precedents and some guidelines then let them make lots of educated guesses and hope they aren't gaming the system to tell you what you want to hear.

It is an extraordinarily easy system to game even with guidelines. In the case of Trump's casinos we have three resorts, each with its own brand that interacts in an unknown and unknowable way with the Trump brand. If you removed Trump's name from these buildings, how would it affect the number of people who visit or the amount they spend?

If we were talking about Holiday Inn or even Harrah's, we could do a pretty good job estimating the effect of changing the name over the door. We would still have to make some assumptions but we would have data to back them up. With Trump, all we would have is assumption-based assumptions. If you take these assumptions about the economy, trends in gambling and luxury spending, the role of Trump's brand and where it's headed, and you give each one of them a small, reasonable, completely defensible nudge in the right direction, it is easy to change your estimates by one or two orders of magnitude.

We also have an unusual, possibly even unique, range-of-data problem. Many companies have tried to build a brand on a public persona, sometimes quite successfully. Normally a sharp business analyst would be in a good position to estimate the value of one of these brands and answer questions like "if Wayne Gretzky were to remove his name from this winter resort, what impact would it have?"

The trouble with Trump is that almost no one likes him, at least according to his Q score. Most persona-based brands are built upon people who were at some point well-liked and Q score is one of the standard metrics analysts use when looking at those brands. Until we get some start-ups involving John Edwards and Tiger Woods, Mr. Trump may well be outside of the range of our data.

Comparing apples and oranges

Comparing salaries across national borders is a tricky thing to do. I was reminded of this problem while reading a post from Female Science Professor. My experience has been limited to the US and Canada but, even there, it's hard to really contrast these places. When I worked in Montreal, I had easy access to fast public transit, most things in walking distance, inexpensive housing but a much lower salary. In Seattle I have reluctantly concluded that, given my work location, a car was essential.

So how do you compare salaries?

This is actually a general problem in epidemiology. Socio-economic status is known to be an important predictor of health, but it is tricky to measure. Salary needs to be adjusted for cost of living, which is hard even when you have good location information (and in de-identified data you may very well not). Even within large urban areas, costs can vary a great deal by location.

Alternatively, there are non-financial, status-boosting rewards in many jobs; how do you weight these? Adam Smith noted back in the Wealth of Nations that a prestigious position tended to come with lower wages. How do you compare equal salaries between a store clerk and a journalist?
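
If I had to put numbers on it, the comparison would look something like this (a sketch with invented salaries, price indices, and a frankly arbitrary value on prestige and other non-wage perks):

```python
def comparable_salary(salary, cost_of_living_index, non_wage_value=0.0):
    """Deflate a salary to a common price level, then add a guess at non-wage rewards."""
    return salary / cost_of_living_index + non_wage_value

# Montreal: lower salary, cheaper city, walkable with good transit (made-up numbers)
print(round(comparable_salary(60_000, cost_of_living_index=0.85, non_wage_value=4_000)))
# Seattle: higher salary, pricier city, needs a car (made-up numbers)
print(round(comparable_salary(80_000, cost_of_living_index=1.10, non_wage_value=-3_000)))
```

The adjustment itself is trivial; the problem is that both the index and the non-wage term are exactly the quantities we don't know how to measure.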

It's a hard problem and I really lack a great solution. But it's worth putting some real thought into!!

Monday, March 1, 2010

"What bankers can learn from arc-welder manufacturers"

Felix Salmon points out the following from a book review from the Wall Street Journal:

Mr. Koller contends that layoffs deprive companies of profit-generating talent and leave the remaining employees distrustful of management—and often eager to find jobs elsewhere ahead of the next layoff round. He cites research showing that, on average, for every employee laid off from a company, five additional ones leave voluntarily within a year. He concludes that the cost of recruiting, hiring and training replacements, in most cases, far outweighs the savings that chief executives assume they're getting when they initiate wholesale firings and plant closings.

Having actually built some of the models that directly or indirectly determined hiring and layoffs, and more importantly having been the one who explained those models to the higher-ups, I very much doubt that most companies spend enough time looking at the hidden and long term costs of layoffs.

The book is Spark, by Frank Koller. Sounds interesting.

Selection Bias with Hazard Ratios

Miguel Hernan has a recent article on the Hazards of Hazard Ratios. The thing that jumped out at me was his discussion of "depletion of susceptibles": any intervention can eventually look protective if it speeds up disease in the susceptible subgroup so that the rate of events in that group eventually drops (once nearly everyone able to have an event has had it).
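
A discrete-time simulation (parameters invented) shows how this plays out: the drug only harms a susceptible subgroup, so the period-specific hazard ratio starts well above 1 and drifts down as the treated arm runs out of susceptibles.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
susceptible = rng.random(n) < 0.2     # frail subgroup with a higher baseline risk
treated = rng.random(n) < 0.5

for period in range(1, 13):
    hazard = np.where(susceptible, 0.05, 0.005)               # baseline hazards
    hazard = np.where(treated & susceptible, 0.20, hazard)    # drug harms only the susceptible
    event = rng.random(len(hazard)) < hazard
    hr = event[treated].mean() / event[~treated].mean()       # period-specific hazard ratio
    print(f"period {period:2d}: HR = {hr:.2f}")
    keep = ~event                                             # events leave the risk set
    susceptible, treated = susceptible[keep], treated[keep]
```

Nothing about the drug changes over time; the apparent improvement in later periods is pure selection, which is one more reason to start the clock at first use.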

I think that this element of hazards ratios illustrates two principles:

1) it always makes sense to begin the analysis of a medication at first use or else you can miss a lot

2) In the long run, we are all dead

So the real trick seems to be more focus on good study design and being careful to formulate problems with precision. Quality study design never goes out of style!

Nate Silver debunks another polling myth

Here's the old chestnut (from Robert Moran):


In a two way race, political professionals don't even bother to look at the spread between the incumbent and the challenger, they only focus on the incumbent's support relative to 50%. Incumbents tend to get trace elements of the undecideds at the end of a campaign. Sure, there is the occasional exception, but this rule is fairly ironclad in my experience.


Here's Silver's takedown:


There are several noteworthy features of this graph:


1) It is quite common for an incumbent to be polling at under 50 percent in the early polling average; this was true, in fact, of almost half of the races (30 of the 63). An outright majority of incumbents, meanwhile, had at least one early poll in which they were at under 50 percent of the vote.


2) There are lots of races in the top left-hand quadrant of the graph: these are cases in which the incumbent polled at under 50 percent in the early polling average, but wound up with more than 50 percent of the vote in November. In fact, of the 30 races in which the incumbent had less than 50 percent of the vote in the early polls, he wound up with more than 50 percent of the vote 18 times -- a clear majority. In addition, there was one case in which an incumbent polling at under 50 percent wound up with less than 50 percent of the November vote, but won anyway after a small third-party vote was factored in. Overall, 19 of the 30 incumbents to have less than 50 percent of the vote in the early polling average in fact won their election.


3) 5 of the 15 incumbents to have under 45 percent of the vote in early polls also won their elections. These were Bob Menendez (38.9 percent), Tim Pawlenty (42.0 percent), Don Carcieri (42.3 percent), Jennifer Granholm (43.4 percent) and Arnold Schwarzenegger (44.3 percent), all in 2006.

3b) If we instead look at those cases within three points of Ted Strickland's 44 percent, when the incumbent had between 41 and 47 percent of the vote in early polls, he won on 11 of 17 occasions (65 percent of the time).


4) Almost all of the data points are above the red diagonal line, meaning that the incumbent finished with a larger share of the vote than he had in early polls. This was true on 58 of 63 occasions.


4b) On average, the incumbent added 6.4 percent to his voting total between the early polling average and the election, whereas the challenger added 4.5 percent. Looked at differently, the incumbent actually picked up the majority -- 59 percent -- of the undecided vote vis-a-vis early polls.


4c) The above trend seems quite linear; regardless of the incumbent's initial standing in the early polls, he picked up an average of 6-7 points by the election, although with a significant amount of variance.


5) The following corollary of Moran's hypothesis is almost always true: if an incumbent has 50 percent or more of the vote in early polls, he will win re-election. This was true on 32 of 33 occasions; the lone exception was George Allen in Virginia, who had 51.5 percent of the vote in early polls in 2006 but lost re-election by less than a full point (after running a terrible campaign). It appears that once a voter is willing to express a preference for an incumbent candidate to a pollster, they rarely (although not never) change their minds and vote for the challenger instead.

Saturday, February 27, 2010

Meta-Freakonomics

Joseph recently wrote a post referring to this post by Andrew Gelman (which was based on a series of posts by Kaiser Fung which check the veracity of various claims in Superfreakonomics -- welcome to the convoluted world of the blogosphere). Joseph uses Dr. Gelman's comments about the poor editing and fact-checking of the book to make a point about the disparity between the contribution editing makes and how little we reward it. He ought to know; I have frequently taken advantage of his good nature in this area, but at the risk of being ungrateful, I don't think the point applies here. Rather than being helpful, the kind of criticism Joseph and Gelman describe could only hurt Superfreakonomics.

Or put another way, if we approach this using the techniques and assumptions of the Freakonomics books, we can show that by foregoing a rigorous internal review process the authors were simply acting rationally.

Before we get to the actual argument, we need to address one more point in Joseph's post. Joseph says that providing a critical read "is one of the most helpful things a colleague can do for you, yet one of the least rewarded." This statement is absolutely true for easily 99.9% of the books and manuscripts out there. It is not, however, true for the Freakonomics books. Between their prestige and the deep pockets of William Morrow, Levitt and Dubner could have gotten as many highly-qualified internal reviewers as they wanted, reviewers who would have been compensated with both an acknowledgment and a nice check. (Hell, they might even get to be in the movie.)

But if the cost and difficulty of putting together an all-star team of reviewers for Superfreakonomics would have been negligible, how about the benefits? Consider the example of its highly successful predecessor. Freakonomics was so badly vetted that two sections (including the book's centerpiece on abortion) were debunked almost immediately. The source material for the KKK section was so flawed that even Levitt and Dubner disavowed it.

These flaws could have been caught and addressed in the editing process but how would making those corrections help the authors? Do we have any reason to believe that questionable facts and sloppy reasoning cost Levitt and Dubner significant book sales (the book sold over four million copies)? That they endangered the authors' spot with the New York Times? Reduced in any way the pervasive influence the book holds over the next generation of economists? Where would Levitt and Dubner have benefited from a series of tough internal reviews?

Against these elusive benefits we have a number of not-so-hard-to-find costs. While the time and money required to spot flaws is relatively minor, the effort required to address those flaws can be substantial.

Let's look at some specifics. Kaiser Fung raises a number of questions about the statistics in the "sex" chapter (the one about female longevity is particularly damning) and I'm sure he overlooked some -- not because there was anything wrong with his critique but because finding and interpreting reliable data on a century of sex and prostitution is extraordinarily difficult. It involves measuring covert behavior that can be affected by zoning, police procedures, city politics, shifts in organized crime, and countless other factors. Furthermore, these same factors can bias the collection of data in nasty and unpredictable ways.

Even if all of the sex chapter's underlying economics arguments were sound (which they are, as far as I know), there would still have been a very good chance that some reviewer might have pointed out flawed data, discredited studies, or turned up findings from more credible sources that undercut the main hypotheses. That doesn't mean that the chapter couldn't be saved -- a good team of researchers with enough time could probably find solid data to support the arguments (assuming, once again, that they were sound) but the final result would be a chapter that would look about the same to the vast majority of readers and external reviewers -- all cost, no benefit.

Worse yet, think about the section on the relative dangers of drunken driving vs. drunken walking. These cute little counter-intuitive analyses are the signature pieces of Levitt and Dubner (and were associated with Dr. Levitt before he formed the team). They are the foundation of the brand. Unfortunately, counter-intuitive analyses tend to be fragile creatures that don't fare that well under scrutiny (intuition has a pretty good track record).

The analysis of modes of drunken transportation would be one of the more fragile ones. Most competent internal reviewers would have had the same reaction that Ezra Klein had:
You can go on and on in this vein. It's terrifically shoddy statistical work. You'd get dinged for this in a college class. But it's in a book written by a celebrated economist and a leading journalist. Moreover, the topic isn't whether people prefer chocolate or vanilla, but whether people should drive drunk. It is shoddy statistical work, in other words, that allows people to conclude that respected authorities believe it is safer for them to drive home drunk than walk home drunk. It's shoddy statistical work that could literally kill somebody. That makes it more than bad statistics. It makes it irresponsible.
Let me be clear. I am not saying that Levitt and Dubner knew there were mistakes here. Quite the opposite. I'm saying they had a highly saleable manuscript ready to go which contained no errors that they knew of, and that any additional checking of the facts, the analyses or logic in the manuscript could only serve to make the book less saleable, to delay its publication or to put the authors in the ugly position of publishing something they knew to be wrong.

Gelman closes his post with this:
It's the nature of interesting-but-true facts that they're most interesting if true, and even more interesting if they're convincingly true.
Perhaps, but Levitt and Dubner have about four million reasons that say he's wrong.

When you really want to argue causality...

There's always a way.

John Quiggin does the dirty work:
I underestimated the speed and power of Zombie ideas. As early as Sep 2009, Casey Mulligan was willing to claim that the entire crisis could be explained in terms of labor market interventions. According to Mulligan, financial markets anticipated a variety of measures from the Obama Administration, observing ‘Arguably, the 2008 election was associated with an increase in the power of unions to shape public policy, and thereby the labor market. Congress has considered various legislation that would raise marginal income tax rates, and would present Americans with new health benefits that would be phased out as a function of income.’

This is truly impressive. So perspicacious are the financial markets, that even the possibility that Congress might raise taxes, or incorporate a means test in health care legislation that might be passed some time in the future (at the time of writing this in Feb 2010, the bill was still tied up) was sufficient to bring down the entire global financial market. And, even though the McCain-Palin ticket was widely seen as having a good chance (at least before the September 2008), the markets didn’t wait for the election returns to come in. Applying some superstrong version of market efficiency, market participants predicted the election outcome, applied Mulligan’s neoclassical model to the predicted policies of the Obama Administration and (perfectly rationally) panicked.

Friday, February 26, 2010

IPTW news

In his new article, "The performance of different propensity-score methods for estimating differences in proportions (risk differences or absolute risk reductions) in observational studies," Peter C. Austin compares a number of different propensity score approaches for modeling risk differences. Curiously, inverse probability of treatment weighting outperformed matching on propensity scores. My intuition was that the two would show similar levels of accuracy and bias.
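For readers who want to see the two estimators side by side, here is a minimal simulation sketch -- my own toy setup with a single confounder and invented parameter values, not Austin's design -- comparing an IPTW-weighted risk difference with a crude greedy 1:1 propensity-score match:

```python
# Toy comparison of IPTW and 1:1 propensity-score matching for a risk
# difference. All parameter values are invented; the matching routine is a
# crude greedy sketch, not the algorithm used in Austin's paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20_000

x = rng.normal(size=n)                                    # a single confounder
p_treat = 1 / (1 + np.exp(-(-0.5 + 0.8 * x)))
z = rng.binomial(1, p_treat)                              # treatment indicator
p_outcome = 1 / (1 + np.exp(-(-2.0 + 0.7 * x + 0.4 * z)))
y = rng.binomial(1, p_outcome)                            # binary outcome

# Propensity score model.
ps = LogisticRegression().fit(x.reshape(-1, 1), z).predict_proba(x.reshape(-1, 1))[:, 1]

# IPTW estimate of the risk difference (ATE weights).
w = z / ps + (1 - z) / (1 - ps)
rd_iptw = (np.sum(w * z * y) / np.sum(w * z)
           - np.sum(w * (1 - z) * y) / np.sum(w * (1 - z)))

# Greedy 1:1 nearest-neighbor matching on the propensity score, no caliper.
treated = np.where(z == 1)[0]
controls = np.where(z == 0)[0]
sorted_controls = controls[np.argsort(ps[controls])]
used = np.zeros(len(sorted_controls), dtype=bool)
pairs = []
for i in treated:
    j = np.searchsorted(ps[sorted_controls], ps[i])
    window = [k for k in range(max(0, j - 10), min(len(sorted_controls), j + 10))
              if not used[k]]
    if not window:
        continue                                          # no nearby unused control
    best = min(window, key=lambda k: abs(ps[sorted_controls[k]] - ps[i]))
    used[best] = True
    pairs.append((i, sorted_controls[best]))

t_idx = [t for t, _ in pairs]
c_idx = [c for _, c in pairs]
rd_match = y[t_idx].mean() - y[c_idx].mean()

print(f"IPTW risk difference:    {rd_iptw:.4f}")
print(f"Matched risk difference: {rd_match:.4f}")
```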

It's going to be interesting to think about why this result holds.

Neat stuff -- go, read and enjoy!

Thursday, February 25, 2010

Editing

Andrew Gelman makes a great point about editing; there is nothing that helps more than having somebody do a critical read through a manuscript to point out where your reasoning is sloppy. This is one of the most helpful things a colleague can do for you, yet one of the least rewarded. It can be painful to hear these comments but it's worth every agonizing moment.

Wednesday, February 24, 2010

Stand and deliver

This article by the gifted Olivia Judson* explores the research about sitting and obesity that Joseph was talking about and makes some interesting suggestions:
Some people have advanced radical solutions to the sitting syndrome: replace your sit-down desk with a stand-up desk, and equip this with a slow treadmill so that you walk while you work. (Talk about pacing the office.) Make sure that your television can only operate if you are pedaling furiously on an exercise bike. Or, watch television in a rocking chair: rocking also takes energy and involves a continuous gentle flexing of the calf muscles. Get rid of your office chair and replace it with a therapy ball: this too uses more muscles, and hence more energy, than a normal chair, because you have to support your back and work to keep balanced. You also have the option of bouncing, if you like.
* And could someone explain to me why the New York Times' best science writer only shows up in the opinion section?

“We've made enormous advances in what they're called” -- more on corporate data cooking

Yesterday, I mentioned how bundled offers and the ability to pick the most advantageous data could allow a company to produce any number of grossly dishonest statistics. Today over at Baseline Scenario, James Kwak explains how J.P. Morgan can use acquisitions and flexible definitions to perform similar magic with its promise to loan $10 billion to small businesses:
Still, $10 billion is still an increase over the previous high of $6.9 billion in 2007, right? Well, not quite. Because in the meantime, JPMorgan Chase went and bought Washington Mutual. At the end of 2007, Washington Mutual held over $47 billion in commercial loans of one sort or another (from a custom FDIC SDI report that you can build here). Most of those are not small business by JPMorgan’s definition, since commercial real estate and multifamily real estate got put into the Commercial Banking business after the acquisition. But that still leaves $7.5 billion in potential small business loans, up from $5.1 billion at the end of 2006, which means WaMu did at least $2.4 billion of new lending in 2007.

I don’t know how much of this is small business lending, but this is part of the problem — banks can choose what they call small business lending, and they can choose to change the definitions from quarter to quarter. It’s not also clear (from the outside, at least) what counts as an origination. If I have a line of credit that expires and I want to roll it over, does that count as an origination? My guess is yes. Should it count as helping small businesses and the economy grow? No.

Sitting and obesity

It's one of the more difficult epidemiology questions to answer: why is obesity rising so quickly?

This is a very hard question to answer decisively. There is some reason that Americans have gotten overweight in the past 30-40 years. It's not pure food abundance, as we have had that for a long time. It's not genetic in the sense of the population's genetics changing, as there has not been enough time (genetic susceptibility is another matter).

So the idea that more time spent sitting leads to obesity is a very interesting hypothesis. I wonder how feasible it would be to design a cluster randomized trial for workplace interventions (like standing to use the computer).
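As a rough feasibility check, here is a back-of-the-envelope sketch of how the design effect from randomizing whole workplaces inflates the required sample size. The effect size, standard deviation, intra-class correlation, and cluster size below are all assumptions I made up for illustration:

```python
# Back-of-the-envelope sample size for a cluster randomized workplace trial.
# Every numeric input below is a made-up illustration, not a real estimate.
import math
from scipy import stats

def per_arm_n(delta, sd, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sample comparison of means."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * ((z_a + z_b) * sd / delta) ** 2

delta = 2.0    # hypothetical between-arm difference in weight change (kg)
sd = 8.0       # hypothetical standard deviation of weight change
icc = 0.02     # hypothetical intra-class correlation within a workplace
m = 50         # hypothetical number of employees per workplace

n_ind = per_arm_n(delta, sd)
deff = 1 + (m - 1) * icc            # design effect for equal cluster sizes
n_clust = n_ind * deff
workplaces_per_arm = math.ceil(n_clust / m)

print(f"Per-arm n if individually randomized: {n_ind:.0f}")
print(f"Design effect:                        {deff:.2f}")
print(f"Per-arm n with clustering:            {n_clust:.0f}")
print(f"Workplaces needed per arm:            {workplaces_per_arm}")
```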

Tuesday, February 23, 2010

Avandia to be withdrawn?

From Derek at In the Pipeline, it looks like leaks from a Senate report indicate that Avandia is about to be removed from the market. Thus ends a long run of pharmacoepidemiology papers on the subject. It's not an area that I worked in personally, but some of my friends have. Studying the heart risks of Avandia is tricky with observational data -- the disease being treated (diabetes) is a risk factor for the major side effect. This makes it very hard to separate disease and drug effects (especially since it is hard to control for the severity and duration of a silent disease like diabetes).

But the existence of a comparator drug that showed a better risk profile for cardiovascular events was probably the decisive factor. Pharmacovigilance really can save lives!
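To make the confounding-by-indication problem concrete, here is a toy simulation -- all numbers invented, and no claim about Avandia's actual risks -- in which disease severity drives both who gets the drug and who has a cardiac event. The crude odds ratio looks alarming even though the simulated drug effect is null; adjusting for severity fixes it, but only because severity is measured, which is exactly what we rarely have for a silent disease:

```python
# Toy illustration of confounding by indication; all numbers are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000

severity = rng.normal(size=n)                          # standardized disease severity
p_drug = 1 / (1 + np.exp(-(-1.0 + 1.0 * severity)))    # sicker patients more likely treated
drug = rng.binomial(1, p_drug)
# True drug effect on the cardiac outcome is null; severity alone drives risk.
p_event = 1 / (1 + np.exp(-(-3.0 + 0.8 * severity)))
event = rng.binomial(1, p_event)

# Crude odds ratio from the 2x2 table -- biased upward by severity.
a = np.sum((drug == 1) & (event == 1)); b = np.sum((drug == 1) & (event == 0))
c = np.sum((drug == 0) & (event == 1)); d = np.sum((drug == 0) & (event == 0))
crude_or = (a * d) / (b * c)

# Severity-adjusted odds ratio -- close to the true null value of 1.0.
X = sm.add_constant(np.column_stack([drug, severity]))
adj_or = np.exp(sm.Logit(event, X).fit(disp=0).params[1])

print(f"Crude OR:    {crude_or:.2f}")
print(f"Adjusted OR: {adj_or:.2f}")
```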

How to Lie with Statistics -- Allstate Edition

For our latest statistical lie of the week, check out the following commercial.




At the risk of putting too fine a point on it, here's a full breakdown.

Customers of the two companies fall into one of four categories:

GEICO customers who would get a better deal with Allstate;

GEICO customers who would get a better deal with GEICO;

Allstate customers who would get a better deal with Allstate;

Allstate customers who would get a better deal with GEICO.

If we knew the relative sizes of those four groups and the average savings of the first and last groups, we'd have a fairly comprehensive picture. Not surprisingly, neither Allstate nor GEICO went that far. Both companies talk about the savings of people who switched.

Most people presumably switch providers to get a better deal (putting them in the first or last groups). Furthermore, switching is a hassle, so the savings have to be big enough to make up for the trouble. The result is a pair of highly biased, self-selected samples drawn from the first and last groups.
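A quick simulation shows how strong that selection effect can be. In the toy world below (all numbers invented), the two companies quote from exactly the same price distribution, so neither is cheaper on average, yet the people who actually switch report impressive average savings simply because only drivers with big quoted savings bother to switch:

```python
# Toy illustration of selection bias in "average savings of switchers."
# Both insurers draw quotes from the same distribution; every number is invented.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

price_current = rng.normal(1500, 400, n)   # hypothetical current annual premium
price_rival = rng.normal(1500, 400, n)     # hypothetical quote from the other company

savings_if_switch = price_current - price_rival

# Switching is a hassle, so assume only drivers who would save at least $250 switch.
switchers = savings_if_switch > 250

print(f"Average savings across everyone:  ${savings_if_switch.mean():7.0f}")
print(f"Share who switch:                  {switchers.mean():.1%}")
print(f"Average savings among switchers:  ${savings_if_switch[switchers].mean():7.0f}")
```

Neither company is "cheaper" here, but the switchers' average comes out to several hundred dollars -- roughly the kind of number both ads are built on.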

When GEICO simply mentions a potential savings of 15%, they are being a bit less than forthcoming but the claim that you might be able to save a substantial amount of money by switching is reasonable. For honest-to-goodness lying you need to wait for the Allstate commercial.

Allstate also bases their claims on the savings of those who switched to their company, but unlike GEICO they use those claims as part of a classic lie-by-hypothesis -- making a statement then supporting it with an incomplete or unrelated statistic. The ad starts with a trustworthy-sounding Dennis Haysbert saying "If you think GEICO's the cheap insurance company, then you're going to really be confused when you hear this" then touting an average savings of $518.

Yes, you might be confused, particularly if you don't realize that the sample is ridiculously biased, that we aren't told the size of the policies, or over how long a period the $518 average was calculated. (The small print at the bottom refers to 2007 data, which seems a bit suspicious, particularly given the following disclaimer at the bottom of Allstate's website: "*$396 Average annual savings based on information reported nationally by new Allstate auto customers for policies written in 2008." No competitor is mentioned, so the second number is presumably a general average. That could explain the difference between the numbers but not the decision to shift periods.)

I would also be suspicious of the data-cooking potential of Allstate's bundled products. Here's how the old but effective scam works: you single out one product as a loss leader. The company may sell this as a feature -- save big on car insurance when you get all of your coverage from Allstate -- or the numbers may be buried so deeply in the fine print that you have no idea how your monthly check is being divided. Either way, this gives the people massaging the data tremendous freedom. They can shift profits to areas that Wall Street is excited about (this happens more often than you might think) or they can create the illusion of bargains if they want to counter the impression of being overpriced. I don't know if any of this is going on here, but I'm always cautious around numbers that are this easy to cook.
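To see how much freedom the bundle provides, consider a hypothetical example (every figure below is made up): the same $3,000 bundle can be booked so that auto looks like a bargain or a full-price product, without changing the customer's check by a penny:

```python
# Hypothetical illustration of how internal allocation within a bundle can
# manufacture an apparent discount on one product. All figures are invented.
bundle_price = 3000                      # customer pays this for auto + home + umbrella
standalone_auto_quote = 1500             # hypothetical competitor quote for auto alone

allocations = {
    "auto as loss leader":   {"auto": 900,  "home": 1700, "umbrella": 400},
    "margin loaded on auto": {"auto": 1700, "home": 1100, "umbrella": 200},
}

for name, alloc in allocations.items():
    assert sum(alloc.values()) == bundle_price    # the customer's total never changes
    apparent_savings = standalone_auto_quote - alloc["auto"]
    print(f"{name}: reported auto premium ${alloc['auto']}, "
          f"apparent savings vs. standalone quote ${apparent_savings}")
```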

I would also take into account Allstate's less than shining reputation in the insurance industry, particularly regarding the company's strategies since the mid-Nineties. The story has been covered by Business Week, PBS and Bloomberg, which supplied the following:

One McKinsey slide displayed at the Kentucky hearing featured an alligator with the caption ``Sit and Wait.'' The slide says Allstate can discourage claimants by delaying settlements and stalling court proceedings.

By postponing payments, insurance companies can hold money longer and make more on their investments -- and often wear down clients to the point of dropping a challenge. ``An alligator sits and waits,'' Golden told the judge, as they looked at the slide describing a reptile.

McKinsey's advice helped spark a turnaround in Allstate's finances. The company's profit rose 140 percent to $4.99 billion in 2006, up from $2.08 billion in 1996. Allstate lifted its income partly by paying less to its policyholders.
...
Allstate spent 58 percent of its premium income in 2006 for claim payouts and the costs of the process compared with 79 percent in 1996, according to filings with the U.S. Securities and Exchange Commission.
So, even if we put aside the possibility of data cooking, we still have an ethically tarnished company dishonestly presenting a meaningless statistic and that's good enough for our statistical lie of the week.

Monday, February 22, 2010

The Tuition Paradox

This post and Joseph's follow-up have gotten me thinking about a strange aspect of the economics of higher education in recent decades.

At the risk of oversimplifying, undergraduates are primarily paying for instruction and evaluation. The school will teach the student a body of knowledge and a set of skills and will provide the student with a quantitative measure (backed by the reputation of the school) of how well he or she mastered that knowledge and those skills.

The costs associated with providing those services are almost entirely labor driven. While there are exceptions (particularly involving distance learning), most instructors use minimal technology and many just rely on the whiteboard. This is not a criticism (a good teacher with a marker always beats a bad teacher with a PowerPoint), but the costs of a service that can be provided with simple facilities and little or no specialized equipment will always be labor driven.

Twenty or thirty years ago, when you took an undergraduate class you were likely to be taught by a full-time faculty member -- not someone with a high salary, but a reasonably well-paid professional with good benefits and excellent job security. These days you are far more likely to be taught by a badly paid adjunct with no benefits or job security.

In other words, when you take into account inflation, the cost to universities of providing instruction and evaluation has dropped sharply while the amount universities charge for these services has continued to shoot up.

I'm not saying that this is all a scam or that administrators are out there stuffing their pockets, but I do think there's something wrong with this picture.

Are humanities and science careers different?

Mark pointed me to an article by Thomas H. Benton about graduate school in the humanities. These issues have been persistent concerns in the field; I recall arguing about the job prospects of humanities graduates back when I was an undergraduate philosophy major. I think that there really is an argument that the costs (in tuition, living expenses and so forth) required for an advanced degree in the humanities can't possibly be compensated for by post-degree job prospects.

Which is okay, if the goal of the degree is edification. But these degrees are not often marketed as expensive luxury goods . . .

In science, I think we are better off. We train people with marketable skills that can lead to careers. Post-degree placement is considered an important metric of success. But I think tales like this are a call to action to make sure that we continue to provide relevant training and to be cautious about blurring the distinction between data and anecdote in terms of outcomes.

If nothing else, it seems to be a good case for outcomes tracking . . .

Sunday, February 21, 2010

Academic work hours

It is true that academia is not a Monday-to-Friday job. However, there is a nice compensation that often comes with that. When I was at McGill I made some very good friends just by being in the lab at odd hours (especially late at night). There can be a sense of shared struggle that is an overlooked bonus. Of course, it would have been even nicer if there had been a late-night coffee shop to take breaks in, but you cannot have everything!

Friday, February 19, 2010

Multiple Testing

Interesting. False positives in popular fields appear to be driven much more strongly by the number of groups testing the same hypotheses than by fiddling with data. A very comforting result, insofar as it is true.

More troublesome is that it is unclear what we can do about it. Being better about publishing negative results helps, but it is never going to be a perfect solution, especially when reviewers may be more skeptical of results that do not match their intuition.
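The arithmetic behind that result is worth spelling out. Even if no individual group fiddles with anything, the chance that at least one of k independent groups testing a truly null hypothesis gets p < 0.05 grows quickly with k, and the significant result is the one that gets written up:

```python
# Probability that at least one of k independent groups testing a true null
# hypothesis obtains p < 0.05, with no data fiddling by anyone.
alpha = 0.05
for k in (1, 3, 5, 10, 20, 50):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:3d} groups: P(at least one 'significant' result) = {p_any:.2f}")
```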

The difficulty of Soft Outcomes

There is currently a movement to ban combination medications that include acetaminophen as an ingredient. The reasoning behind this appears to be the potential for liver damage caused by excessive doses of the medication. The estimate of 458 deaths per year seems like a lot, until you realize the denominator is not specified (it won't be the entire US population, but it might be tens of millions).
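To see why the unspecified denominator matters so much, here is the arithmetic under a few assumed user counts (the 458 figure comes from the report; the denominators are my guesses, purely for illustration):

```python
# How 458 deaths per year looks against different assumed user populations.
deaths_per_year = 458
for users in (10_000_000, 30_000_000, 100_000_000):
    rate = deaths_per_year / users * 100_000
    print(f"{users:>12,} users: {rate:.2f} deaths per 100,000 users per year")
```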

The other issue, and the one that is interesting to an epidemiologist, is the soft nature of the competing risk. The alternative to acetaminophen is either a narcotic or a non-steroidal anti-inflammatory drug like ibuprofen. Both of these drug classes have downsides (addiction and gastrointestinal bleeding, respectively) as well.

But the real alternative is less pain control. And that is hard to judge because it is a soft outcome. How much suffering is worth a life? Lives are easy to count but massively reduced quality of life is much, much trickier. But I think it is important to realize that a hard to measure outcome can still have a known and measurable effect on real people.

So I guess what I want to see is a clear articulation of the alternatives to the current approach to publishing in hot fields.

Wednesday, February 17, 2010

Post-Doctoral Fellowships

Am I really atypical in having had a decent post-doctoral fellowship? Is it a feature of the University of Washington or of my PI?

But when I read bitter stories about bad experiences, I wonder if this is a case of "there but for the grace of some powerful entity go I."

I think one issue is that people expect a lot at the end of the PhD (and not without justification -- the PhD is a long and difficult process). But the reality is that the PhD is the license to do research -- meaning you get to start at the beginning all over again. After 12 years of schooling (and an outside career in the financial services industry) that can be rough.

I'm lucky to have found a supportive place and I am pleased to be moving forward to a good position (although quite sad to be leaving the state of Washington). Here's hoping that academic work is as unexpectedly pleasant as post-doccing turned out to be!