Wednesday, March 31, 2010

Blockbusters, Franchises and Apostrophes

More on the economics of genre fiction

The story so far: last week Andrew Gelman had a post on a book that discussed the dominance of thrillers on best seller lists and suggested that it was due to their increased quality and respectability. I argued that the quality and respectability had, if anything, decreased (here), posted some background information (here and here), then discussed how the economics of publishing from the late Nineteenth Century through the Post-War era had influenced genre fiction. This post closes with a look at where we are now and how the current state of the market determines what we're seeing at the bookstore.

As the market shrank in the last part of the Twentieth Century, the pay scale shifted to the feast and (mostly) famine distribution of today. (The century also saw a similar shift for musicians, artists and actors.) Non-paying outlets sprang up. Fan fiction emerged (non-licensed use of characters had, of course, been around for years -- Tijuana bibles being a classic example -- but fan fiction was written for the author's own enjoyment without any real expectation of payment). These changes are generally blamed on the internet, but the conventional wisdom is at least a couple of decades off. All of these trends were well established by the Seventies.

With the loss of the short story market and the consolidation of publishing, the economics of writing on spec became brutal. Writing and trying to sell a novel represents a tremendous investment of time and energy with little hope of success. By comparison writing on spec in the Forties meant coming up with twelve to fifteen pages then sending them off to twenty or so potential markets. The best of these markets paid good money; the worst were hungry for anything publishable.

The shift from short story to novel also meant greater risk for the publisher (and, though we don't normally think of it in these terms, for the reader, who also invested money and time). A back-pages story that most readers skipped over might hurt the sales and reputation of a magazine slightly, but as long as the featured stories were strong, the effect would be negligible. Novels, though, are free-standing, and the novel that gets skipped over is the novel that goes unsold.

When Gold Medal signed John D. MacDonald, they knew they were getting a skilled, prolific writer with a track record of artistically and commercially successful short fiction. The same could be said about the signing of Donald Westlake, Lawrence Block, Joe Gores and many others. Publishing these first-time authors was a remarkably low-risk proposition.

Unfortunately for publishers today, there are no potential first time authors with those resumes. Publishers now have to roll the dice on inexperienced writers of unknown talent and productivity. In response to that change, they have taken various steps to mitigate the risk.

One response was the rise of the marketable blockbuster. The earliest example I can think of is the book Lace by Shirley Conran. If memory serves, Lace got a great deal of attention in the publishing world for Conran's huge advance, her lack of fiction-writing experience, and the role marketing played in the process. The general feeling was that the tagline ("Which one of you bitches is my mother?") came first while the book itself was merely an afterthought.

More recently we have Dexter, a marketer's dream ("He's a serial killer who kills serial killers... It's torture porn you can feel good about!"). The author had a few books on his resume but nothing distinguished. The most notable was probably a collaboration with Star Trek actor Michael Dorn. The first book in the series, Darkly Dreaming Dexter, was so poorly constructed that all of the principals had to act completely out of character to resolve the plot (tip for new authors: when a character casually overlooks her own attempted vivisection, it's time for a rewrite*).

The problems with the quality of the novel had no apparent effect on sales, nor did they prevent the character from appearing in a successful series of sequels and being picked up by Showtime. (The TV show was handled by far more experienced writers who managed to seal up almost all of the plot holes.)

The point here is not that Darkly Dreaming Dexter was a bad book or that publishing standards have declined. The point is that the economics have changed. Experienced fiction writers are more rare. Marketable concepts and franchises are more valuable, as is synergy with other media. The markets are smaller. There are fewer players. And much of the audience has a troublesome form of brand loyalty.

Normally, of course, brand loyalty is a plus, but books are an unusual case. If you convince a Coke drinker to also drink Sprite you probably won't increase his overall soda consumption; you'll just have cannibalization. But readers who stick exclusively with one writer are severely underconsuming. Convince James Patterson readers to start reading Dean Koontz and you could double overall sales.

When most readers got their fiction either through magazines or by leafing through paperback racks, it was easy to introduce them to new writers. Now the situation is more difficult. One creative solution has been apostrophe series such as Tom Clancy's Op Center. Other people are credited with actually writing the books but the name above the title is there for branding purposes.

Which all leads us back to the original question: Why did thrillers become so dominant?

They tend to be easily marketable.

They are compatible with franchises.

They lend themselves to adaptation as big budget action movies.

Their somewhat impersonal style makes them suitable for ghosting or apostrophe branding.

They are, in short, what the market is looking for. As for me, I'm looking for the next reprint from Hard Case, but I might borrow the latest Turow after you're done with it.


* "Is that a spoiler?"
"No, sir. It was spoiled when I got here."

p.s. I was going to tie in with a branding situation Slim Jim snacks faced a few years ago but this post is running a bit long. Maybe I'll get back to it later.

Vanishing media

Brad Plummer shares the following examples of how ephemeral some of our art and images are:
Actually, though, we don't even need to consider the apocalypse. The fragile state of digital storage is already causing trouble. NASA has a few people racing to recover old images from its Lunar Orbiter missions in the 1960s, which are currently stored on magnetic tapes and may not be long for this world. And the National Archives is struggling to preserve its digital records, which tend to rot faster than paper records.

A related tale of disintegrating media comes from Larry Lessig's Free Culture—though this one has a twist. There are a lot of films that were made after 1923 that have no commercial value anymore. They never made it to video or DVD; the reels are just collecting dust in vaults somewhere. In theory, it shouldn’t be too hard to digitize these films and put them in an archive. But alas, thanks to the Sonny Bono Copyright Term Extension Act that was passed by Congress in 1998, any film made after 1923 won't enter the public domain until at least 2019.

That means these films are still under copyright, and anyone who wanted to restore them would have to track down the copyright-holders (not always easy to do) and probably hire a lawyer. And who's going to go through that much trouble just to restore some obscure movie that only a few people might ever watch? Yet a lot of these older movies were produced on nitrate-based stock, and they'll have dissolved by the time 2019 rolls around, leaving nothing behind but canisters of dust. It's sort of tragic.

It's also a perversion of the original intent of copyright laws. Copyrights, like patents, are government-imposed monopolies that dampen commerce and the development of new works. Intellectual property rights were seen, in the words of Jefferson, as a necessary evil: temporary monopolies granted to balance the interests of the creators with those of the general public.

The suggestion that extending these monopolies for almost a century is meant to protect the interests of creators is absurd. The vast majority of these rights are held by companies like Disney or Time-Warner, companies that frequently screwed over the original creators and are now spending more money lobbying to keep the rights than they did to actually acquire them. This is particularly egregious for Disney, a company founded on adaptations of public domain works.

Another outstanding (and tragic) economics story from This American Life

"A car plant in Fremont California that might have saved the U.S. car industry. In 1984, General Motors and Toyota opened NUMMI as a joint venture. Toyota showed GM the secrets of its production system: how it made cars of much higher quality and much lower cost than GM achieved. Frank Langfitt explains why GM didn't learn the lessons – until it was too late."

Currently available as a free download.

Let's talk about sex

More cool stuff from the New York Times' best science writer (not that the others have set the bar that high)

Tuesday, March 30, 2010

The real thing

Jaime Escalante dies at 79; math teacher who challenged East L.A. students to 'Stand and Deliver'

Jaime Escalante, the charismatic former East Los Angeles high school teacher who taught the nation that inner-city students could master subjects as demanding as calculus, died Tuesday. He was 79.

Today's pointer

I am on the road (transporting 3 pets solo -- don't ask) so blogging is very light.

But John D. Cook brought up an interesting point today that should not be missed. It's a grey area, but it is worth thinking carefully about just how much effort is involved in trying to improve medical care and how many barriers need to be crossed.

It's a difficult balance!

The Decline of the Middle (Creative) Class

I suggested in an earlier post that the rise to dominance of the thriller had not been accompanied by a rise in quality and reputation. In this and the next post, I'll try to put some foundations under this claim.

Popular art is driven by markets and shifts in popular art can always be traced back, at least partly, to economic, social and technological developments as well as changes in popular taste. The emergence of genre fiction followed the rise of the popular magazine (check here for more). Jazz hit its stride as the population started moving to cities. Talking pictures replaced silents when the technology made them possible.

Crime fiction, like science fiction, first appeared in response to demand from general interest magazines like the Strand, then moved into genre-specific magazines like Black Mask and, a few years later, cheap paperbacks. The demand for short stories was so great that even a successful author like Fitzgerald saw them as a lucrative alternative to novels. There was money to be made and that money brought in a lot of new writers.

It seems strange to say it now but for much of the Twentieth Century, it was possible to make a middle class living as a writer of short fiction. It wasn't easy; you had to write well and type fast enough to melt the keys but a surprisingly large number of people managed to do it.

Nor were writers the only example of the new creative middle class. According to Rosy McHargue (as reported by music historian Brad Kay), in 1925 there were two hundred thousand professional musicians in the United States. Some were just scraping by, but many were making a good living. (Keep in mind that many restaurants, most clubs and all theaters had at least one musician on the payroll.) Likewise, the large number of newspapers and independent publishers meant lots of work for graphic artists.

I don't want to wax too nostalgic for this era. Sturgeon's Law was firmly in place: 95% of what was published was crap. But it was the market for crap that made the system work. It provided the freelance equivalent of paid training -- writers could start at least partially supporting themselves while learning their craft -- and it mitigated some of the risk of going into the profession: even if you turned out not to be good enough, you could still manage food and shelter while you were failing.

It was also a remarkably graduated system, one that rewarded quality while making room for the aforementioned crap. The better the stories the better the market and the higher the acceptance rate. In 1935, Robert E. Howard made over $2,000 strictly through magazine sales. Later, as the paperback market grew, writers at the very top like Ray Bradbury or John O'Hara would also see their stories collected in book form.

Starting with Gold Medal Books, paperback originals became a force in 1950. This did cut into the magazine market and hastened the demise of the pulps but it made it easier than ever before to become a novelist. It was more difficult (though still possible) to make a living simply by selling short stories, but easier to make the transition to longer and more lucrative works.

It was, in short, a beautifully functioning market with an almost ideal compensation system for a freelance-based industry. It produced some exceptionally high quality products that have generated billions of dollars and continue to generate them in resales and adaptations (not to mention imitations and unlicensed remakes). This includes pretty much every piece of genre fiction you can think of that was written before 1970.

The foundation of that system, the short story submarket, is essentially dead, and the economics and business models of the rest of the publishing industry have changed radically, leading to the rise of marketing, the blockbuster mentality and what I like to call the Slim Jim conundrum.

Tune in next time.

Monday, March 29, 2010

Now I'm going to spend the rest of the day wondering what a giant deodorant gun looks like

From the Guardian via TNR:
Beijing is to install 100 deodorant guns at a stinking landfill site on the edge of the city in a bid to dampen complaints about the capital's rubbish crisis. ...

Thrillers on Economics -- a quick digression

I've been working on a series of posts about the economics of crime novels (see here and here) and it got me thinking about economics in crime novels. I'm no expert but here's my incomplete survey.

George Goodman (a.k.a. "Adam Smith") once bemoaned the absence of business in American literature with the notable exception of John P. Marquand. With all due respect to the estimable Marquand (himself no stranger to the pulps), Goodman might have found what he was looking for if he had spent less time in high-end bookstores and more time in his corner drugstore looking at the books with the lurid covers.

Of the many crime novels built around businesses, the best might be Murder Must Advertise, a Lord Peter Wimsey novel by Dorothy L. Sayers. The story is set in a London ad agency in the Thirties, a time when the traditional roles of the aristocracy were changing and "public school lads" were showing up in traditionally bourgeois fields like advertising.

Sayers had been a highly successful copywriter (variations on some of her campaigns are still running today) and has sometimes been credited with coining the phrase "It pays to advertise." All this success did not soften her view of the industry, a view which is probably best captured by Wimsey's observation that truth in advertising is like yeast in bread.

But even if Sayers holds the record for individual event, the lifetime achievement award has got to go to the man whom many* consider the best American crime novelist, John D. MacDonald.

Before trying his hand at writing, MacDonald had earned an MBA at Harvard, and over his forty-year writing career business and economics remained a prominent part of his fictional universe (one supporting character in the Travis McGee series was an economist who lived on a boat called the John Maynard Keynes). But it was in some of the non-series books that MacDonald's background moved to the foreground.

Real estate frequently figured in MacDonald's plots (not that surprising given their Florida/Redneck Riviera settings). His last book, Barrier Island, was built around a plan to work federal regulations and creative accounting to turn a profit from the cancellation of a wildly overvalued project. In Condominium, sleazy developers dodge environmental regulations and building codes (which turned out to be a particularly bad idea in a hurricane-prone area).

Real estate also figures in MacDonald's examination of televangelism, One More Sunday, as does almost every aspect of an Oral Roberts-scale enterprise: HR, security, public relations, lobbying, broadcasting and, most importantly, fund-raising. It's a complete, realistic, insightful picture. You can find companies launched with less detailed business plans.

But MacDonald's best book on business may be A Key to the Suite, a brief and exceedingly bitter account of a management consultant deciding the future of various executives at a sales convention. Suite was published as a Gold Medal Original paperback in 1962. You could find a surprising amount of social commentary in those drugstore book racks, usually packaged with lots of cleavage.


* One example of many:

“To diggers a thousand years from now, the works of John D. MacDonald would be a treasure on the order of the tomb of Tutankhamen.” - KURT VONNEGUT

Sunday, March 28, 2010

All Cretans are ad execs



This ad reminded me of the Liar's Paradox. Not exactly the same thing, but the juxtaposition of messages -- romanticized images of cars brainwash you into desiring hollow status symbols/look at the romanticized images of our cars -- certainly plays to the irony-impaired.

Saturday, March 27, 2010

My best subject used to be recess

David Elkind has a good op-ed piece out today on the loss of unstructured playtime in many schools.
One consequence of these changes is the disappearance of what child-development experts call “the culture of childhood.” This culture, which is to be found all over the world, was best documented in its English-language form by the British folklorists Peter and Iona Opie in the 1950s. They cataloged the songs, riddles, jibes and incantations (“step on a crack, break your mother’s back”) that were passed on by oral tradition. Games like marbles, hopscotch and hide and seek date back hundreds of years. The children of each generation adapted these games to their own circumstances.

Yet this culture has disappeared almost overnight, and not just in America. For example, in the 1970s a Japanese photographer, Keiki Haginoya, undertook what was to be a lifelong project to compile a photo documentary of children’s play on the streets of Tokyo. He gave up the project in 1996, noting that the spontaneous play and laughter that once filled the city’s streets, alleys and vacant lots had utterly vanished.

For children in past eras, participating in the culture of childhood was a socializing process. They learned to settle their own quarrels, to make and break their own rules, and to respect the rights of others. They learned that friends could be mean as well as kind, and that life was not always fair.

I have some quibbles with the essay and strong objections to a couple of points but most of what Elkind has to say here is valid and important.

The fundamental assumption of all educational debates needs to be that children are naturally curious and creative, that evolution has programmed them to learn and explore. Strategies that do a good job capitalizing on that curiosity and creativity will be successful and sometimes the best way to do that is to simply get out of the kids' way.

Friday, March 26, 2010

Another reminder that improbable events are probable

From Jonathan Chait:

Brian Kalt, a law professor and former college classmate of mine, has developed his own law of presidential facial hair:

I thought you might be interested in the following ironclad law of American presidential politics. I call it Kalt’s Law: “Under the modern two-party system, if a candidate has facial hair, the Republican always has as much, or more, than the Democrat.”

Excellent primer on the economics of genre fiction.

In the introduction to Science Fiction by Gaslight, Sam Moskowitz does a really good job explaining how changes in publishing led to the creation of most of today's popular fiction genres. It's an interesting book if you can find a copy.

I'll try to tie this in with the thriller thread (see here and here) in an upcoming post.

Thursday, March 25, 2010

Advice from Andrew Gelman

Whom I always defer to on non-literary matters:

They also recommend composite end points (see page 418 of the above-linked article), which is a point that Jennifer and I emphasize in chapter 4 of our book and which comes up all the time, over and over in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you--that is, good advice that's easy to apply and which you might not already know--it would be to use transformations. Log, square-root, etc.--yes, all that, but more! I'm talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those "total scores" that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don't care if the threshold is "clinically relevant" or whatever--just don't do it. If you gotta discretize, for Christ's sake break the variable into 3 categories.

This all seems quite obvious but people don't know about it. What gives? I have a theory, which goes like this. People are trained to run regressions "out of the box," not touching their data at all. Why? For two reasons:

1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can't cheat.

2. In classical (non-Bayesian) statistics, linear transformations on the predictors have no effect on inferences for linear regression or generalized linear models. When you're learning applied statistics from a classical perspective, transformations tend to get downplayed, and they are considered as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model). Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.

I don't know if I entirely buy point 2. I'm generally a frequentist and I make extensive use of transformations (though none of them are linear transformations).
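
To make this concrete, here is a minimal sketch in Python (pandas and numpy; the data frame and column names are invented for illustration) of the kinds of transformations Gelman describes: a log transform, breaking a continuous variable like age into several categories to capture nonlinear patterns, and combining several discrete items into a rough continuous total score.

    import numpy as np
    import pandas as pd

    # Hypothetical survey data -- the columns are made up for this example.
    df = pd.DataFrame({
        "income": [12000, 45000, 230000, 78000, 31000],
        "age": [19, 34, 52, 67, 81],
        "q1": [1, 0, 1, 1, 0],  # three binary survey items
        "q2": [0, 0, 1, 1, 1],
        "q3": [1, 1, 1, 0, 0],
    })

    # Log transform a long-tailed variable rather than using it raw.
    df["log_income"] = np.log(df["income"])

    # Turn a continuous variable into several categories to model nonlinear
    # patterns (e.g., voting by age) -- several categories, not a binary split.
    df["age_group"] = pd.cut(df["age"], bins=[0, 30, 45, 65, 120],
                             labels=["18-30", "31-45", "46-65", "66+"])

    # Combine several discrete items into a continuous "total score".
    df["total_score"] = df[["q1", "q2", "q3"]].sum(axis=1)

    print(df)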

Wednesday, March 24, 2010

Fighting words from Andrew Gelman

Or at least a fighting summary of someone else's...

[I've got a meeting coming up so this will have to be quick and ugly and leave lots of plot threads dangling for the sequel]

From Andrew's reaction to Triumph of the Thriller by Patrick Anderson:

Anderson doesn't really offer any systematic thoughts on all this, beyond suggesting that a higher quality of talent goes into thriller writing than before. He writes that, 50 or 70 years ago, if you were an ambitious young writer, you might want to write like Hemingway or Fitzgerald or Salinger (if you sought literary greatness with the possibility of bestsellerdom too) or like James Michener, or Herman Wouk (if you sought fame and fortune with the possibility of some depth as well) or like Harold Robbins or Irving Wallace (if you wanted to make a business out of your writing). But the topselling authors of mysteries were really another world entirely--even though their books were ubiquitous in drugstore and bus-station bookracks, and even occasionally made their way onto the bestseller lists, they barely overlapped with serious fiction, or with bestselling commercial fiction.

Nowadays, though, a young writer seeking fame and fortune (or, at least, a level of financial security allowing him to write and publish what he wants) might be drawn to the thriller, Anderson argues, for its literary as well as commercial potential. At the very least, why aim to be a modern-day Robbins or Michener if instead you can follow the footsteps of Scott Turow. And not just as a crime novelist, but as a writer of series: "Today, a young novelist with my [Anderson's] journalistic knack for action and dialogue would be drawn to a crime series; if not, his publisher would push him in that direction."

1. I'd argue (and I think most literary historians would back me up) that in terms of literary quality, crime fiction was at its best from about the time Hammett started writing for Black Mask to either the Fifties or Sixties, a period that featured: Chandler; Ross and John D. MacDonald; Jim Thompson; Ed McBain; Donald Westlake; Joe Gores; Lawrence Block* and a slew of worthies currently being reprinted by Hard Case.

2. Crime writing was fairly respected at the time. Check out contemporary reviews (particularly by Dorothy Parker). It was even possible for Marquand to win a Pulitzer for a "serious" novel while writing the Mr. Moto books.

3. There is an economic explanation for both the drop in quality and the surge in sales, but that will have to wait. I have a meeting at one of the studios and I need to go buy a pair of sunglasses.


*Those last three did their best work more recently but they were a product of the pulps.

p.s. Here's an illustrative passage from the NYT on the literary respect a mystery writer might achieve back before thrillers were the dominant genre:

Ross Macdonald's appeal and importance extended beyond the mystery field. He was seen as an important California author, a novelist who evoked his region as tellingly as such mainstream writers as Nathanael West and Joan Didion. Before he died, Macdonald was given the Los Angeles Times's Robert Kirsch Award for a distinguished body of work about the West. Some critics ranked him among the best American novelists of his generation.

By any standard he was remarkable. His first books, patterned on Hammett and Chandler, were at once vivid chronicles of a postwar California and elaborate retellings of Greek and other classic myths. Gradually he swapped the hard-boiled trappings for more subjective themes: personal identity, the family secret, the family scapegoat, the childhood trauma; how men and women need and battle each other, how the buried past rises like a skeleton to confront the present. He brought the tragic drama of Freud and the psychology of Sophocles to detective stories, and his prose flashed with poetic imagery. By the time of his commercial breakthrough, some of Macdonald's concerns (the breakdown between generations, the fragility of moral and global ecologies) held special resonance for a country divided by an unpopular war and alarmed for the environment. His vision was strong enough to spill into real life, where a news story or a friend's revelation could prompt the comment "Just like a Ross Macdonald novel."

It was a vision with meaning for all sorts of readers. Macdonald got fan mail from soldiers, professors, teenagers, movie directors, ministers, housewives, poets. He was claimed as a colleague by good writers around the world, including Eudora Welty, Andrey Voznesensky, Elizabeth Bowen, Thomas Berger, Marshall McLuhan, Margaret Laurence, Osvaldo Soriano, Hugh Kenner, Nelson Algren, Donald Davie, and Reynolds Price.

Assumptions

We always talk about how hard it is to actually try and verify the assumptions required for missing data techniques to yield unbiased answers. Still, it really is a breath of fresh air when somebody tries to give some (data driven) guidance on whether or not an assumption really is reasonable. That was the case with a recent PDS article:

Marston L, Carpenter JR, Walters KR, Morris RW, Nazareth I, Petersen I. Issues in multiple imputation of missing data for large general practice clinical databases. Pharmacoepidemiol Drug Saf 2010 (currently an epub)

They nicely make the case that blood pressure data is likely to be missing at random in these databases. Given my thoughts that BP data is underused, this is actually a pretty major advance as it allows more confidence in inferences from these large clinical databases.

Good show, folks!

Tuesday, March 23, 2010

More questions about the statistics of Freakonomics

Felix Salmon is on the case:

There’s a nice empirical post-script to the debate over the economic effects of classifying the Spotted Owl as an endangered species. Freakonomics cites a study putting the effect at $46 billion, but others, including John Berry, who wrote a story on the subject for the Washington Post, think it’s much closer to zero.

And now it seems the Berry side of the argument has some good Freakonomics-style panel OLS regression analysis of the microeconomy of the Pacific Northwest to back up its side of the argument. A new paper by Annabel Kirschner finds that unemployment in the region didn’t go down when the timber industry improved, and it didn’t go up when the timber industry declined — not after you adjust for much more obvious things like the presence of minorities in the area.

Comparing Apples and furniture in a box



In a previous post on branding, I used Apple as an example of a company that, because of its brand, can charge a substantial premium for its high-quality products. In this New Yorker post, James Surowiecki compares Apple to companies that take the opposite approach.

For Apple, which has enjoyed enormous success in recent years, “build it and they will pay” is business as usual. But it’s not a universal business truth. On the contrary, companies like Ikea, H. & M., and the makers of the Flip video camera are flourishing not by selling products or services that are “far better” than anyone else’s but by selling things that aren’t bad and cost a lot less. These products are much better than the cheap stuff you used to buy at Woolworth, and they tend to be appealingly styled, but, unlike Apple, the companies aren’t trying to build the best mousetrap out there. Instead, they’re engaged in what Wired recently christened the “good-enough revolution.” For them, the key to success isn’t excellence. It’s well-priced adequacy.

These two strategies may look completely different, but they have one crucial thing in common: they don’t target the amorphous blob of consumers who make up the middle of the market. Paradoxically, ignoring these people has turned out to be a great way of getting lots of customers, because, in many businesses, high- and low-end producers are taking more and more of the market. In fashion, both H. & M. and Hermès have prospered during the recession. In the auto industry, luxury-car sales, though initially hurt by the downturn, are reemerging as one of the most profitable segments of the market, even as small cars like the Ford Focus are luring consumers into showrooms. And, in the computer business, the Taiwanese company Acer has become a dominant player by making cheap, reasonably good laptops—the reverse of Apple’s premium-price approach.

Monday, March 22, 2010

True models?

The p-value debate started by an article by Tom Siegfried has generated a lot of discussion. Andrew Gelman has tried to round up many of the discussion points.

But the best part of the post (besides showing the diversity out there) was hidden at the bottom. Andrew comments:

"In all the settings I've ever worked on, the probability that the model is true is . . . zero!"

Well, he is most certainly correct in pharmacoepidemiology as well. I see a lot of variation in how people handle the biases that are inherent in observational pharmacoepidemiology -- but the focus on randomized drug trials should be a major clue that these associations are tricky to model. As a point of fact, the issue of confounding by indication, channeling bias, indication bias or whatever else you want to call it is central to the field. And the underlying idea here is that we can't get enough information about participants to model the influence of drugs being channeled to sicker patients.

So I wish that, in my field as well, people would realize that the relationships are tricky and no model is ever going to be absolutely correctly specified.

The curse of large numbers and the real problem with p-values

(Some final thoughts on statistical significance)

The real problem with the p-value isn't just that people want it to do something that it can't do; they want it to do something that no single number can ever do: fully describe the quality and reliability of an experiment or study. This simply isn't one of those mathematical beasts that can be reduced to a scalar. If you try, then sooner or later you will inevitably run into a situation where you get the same metric for two tests of widely different quality.

Which leads me to the curse of large numbers. Those of you who are familiar with statistics (i.e. pretty much everybody who reads this blog) might want to skip the next paragraph because this goes all the way back to stat 101.

Let's take the simplest case we can. You want to show that the mean of some group is positive, so you take a random sample and calculate the probability of getting the results you saw or something more extreme (the probability of getting exactly the results you saw is pretty much zero), working under the assumption that the mean of the group was actually zero. This works because the bigger the samples you take, the more the means of those samples will tend to follow a nice smooth bell curve and the closer those means will tend to group around the mean of the group you're sampling from.

(For any teachers out there, a good way of introducing the central limit theorem is to have students simulate coin flips with Excel then make histograms based on various sample sizes.)
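
For anyone who would rather skip the spreadsheet, here is a rough sketch of the same exercise in Python (numpy and matplotlib; the sample sizes are arbitrary): simulate batches of coin flips, take the mean of each sample, and watch the histogram of those means smooth out and tighten around 0.5 as the sample size grows.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)

    # For each sample size, draw 2,000 samples of fair coin flips (0 or 1)
    # and record the mean of each sample.
    for n in [5, 30, 200]:
        sample_means = rng.integers(0, 2, size=(2000, n)).mean(axis=1)
        plt.hist(sample_means, bins=30, alpha=0.5, label="n = %d" % n)

    plt.xlabel("sample mean")
    plt.legend()
    plt.show()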

You might think of sampling error as the average difference between the mean of the group you're interested in and the mean of the samples you take from it (that's not exactly what it means but it's close). The bigger the sample, the smaller you expect that error to be, which makes sense. If you pick three people at random, you might get three tall people or three millionaires, but if you pick twenty people at random, the chances of getting twenty tall people or twenty millionaires are next to nothing.

The trouble is that sampling error is only one of the things a statistician has to worry about. The sampled population might not reflect the population you want to draw inferences about. Your sample might not be random. Data may not be accurately entered. There may be problems with aliasing and confounding. Independence assumptions may be violated. With respect to sample size, the biases associated with these problems are all fixed quantities. A big sample does absolutely nothing to address them.
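
Here is a quick simulated illustration of that point (a sketch with made-up numbers): suppose the population we actually sample from runs half a unit higher than the population we care about. A bigger sample shrinks the sampling error around the wrong answer; the bias never budges.

    import numpy as np

    rng = np.random.default_rng(1)
    true_mean, bias = 0.0, 0.5  # the frame we sample from is off by 0.5

    for n in [20, 2000, 200000]:
        draws = rng.normal(true_mean + bias, 1.0, size=n)
        # The standard error falls with sqrt(n); the 0.5 bias stays put.
        print(n, round(draws.mean(), 3), round(draws.std(ddof=1) / np.sqrt(n), 4))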

There's an old joke about a statistician who wakes up to find his room on fire, says to himself "I need more observations" and goes back to sleep. We do spend a lot of our time pushing for more data (and, some would say, whining about not having enough), but we do that not because small sample sizes are the root of all of our problems but because they are the easiest problem to fix.

Of course "fix" as used here is an asymptotic concept and the asymptote is not zero. Even an infinite sample wouldn't result in a perfect study; you would still be left with all of the flaws and biases that are an inevitable part of all research no matter how well thought out and executed it may be.

This is a particular concern for the corporate statistician who often encounters the combination of large samples and low quality data. It's not unusual to see analyses done on tens or even hundreds of thousands of sales or customer records and more often than not, when the results are presented someone will point to the nano-scale p-value as an indication of the quality and reliability of the findings.

As far as I know, no one reviewing for a serious journal would think that p<0.001 means that we're 99.9% sure that a conclusion is true, but that's what almost everyone without an analytic background thinks.

And that is a problem.

Sunday, March 21, 2010

Silence

I'm in the process of moving from one corner of the United States to the other. Blogging may be extremely light for the next 2 weeks.

Apologies in advance!

Interesting variable taxation idea from Thoma

From Economist's View:
Political battles make it very difficult to use discretionary fiscal policy to fight a recession, so more automatic stabilizers are needed. Along those lines, if something like this were to be implemented to stabilize the economy over the business cycle, I'd prefer to do this more generally, i.e. allow income taxes, payroll taxes, etc. to vary procyclically. That is, these taxes would be lower in bad times and higher when things improve, and implemented through an automatic moving average type of rule that produces the same revenue as some target constant tax rate (e.g. existing rates).
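
I don't know exactly which rule Thoma has in mind, but a toy version (purely illustrative; the smoothing window and numbers are invented) would scale the statutory rate by the ratio of trend income to current income, so the effective rate falls in bad years, rises in good ones, and collects roughly the same revenue over the cycle as the target constant rate.

    # Toy automatic-stabilizer rule: effective rate = target rate scaled by
    # the ratio of a trailing moving average of income to current income.
    def effective_rate(incomes, target_rate=0.20, window=5):
        rates = []
        for t, y in enumerate(incomes):
            start = max(0, t - window + 1)
            trend = sum(incomes[start:t + 1]) / (t + 1 - start)
            rates.append(target_rate * trend / y)
        return rates

    # A boom, a recession, then a recovery (made-up income path).
    incomes = [100, 105, 110, 95, 90, 100, 108]
    print([round(r, 3) for r in effective_rate(incomes)])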

Saturday, March 20, 2010

Friday, March 19, 2010

Too late for an actual post, but...

There are another couple of entries in the TNR education debate. If you're an early riser you can read them before I do.

Thursday, March 18, 2010

Some more thoughts on p-value

One of the advantages of being a corporate statistician was that generally you not only ran the test; you also explained the statistics. I could tell the department head or VP that a p-value of 0.08 wasn't bad for a preliminary study with a small sample, or that a p-value of 0.04 wasn't that impressive with a controlled study of a thousand customers. I could factor in things like implementation costs and potential returns when looking at type-I and type-II errors. For low implementation/high returns, I might set significance at 0.1. If the situation were reversed, I might set it at 0.01.
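
That kind of cost-based reasoning is easy to make explicit. Here is a back-of-the-envelope sketch (every cost, prior and power figure below is invented for illustration): pick the cutoff that minimizes expected cost, given what a false positive and a false negative would each cost you.

    # Choose a significance cutoff by minimizing expected cost.
    # The costs, prior and power figures are invented for illustration.
    def expected_cost(alpha, power, cost_fp, cost_fn, p_effect_real=0.5):
        # False positive: no real effect, but we act on the result anyway.
        # False negative: a real effect that we miss.
        return ((1 - p_effect_real) * alpha * cost_fp
                + p_effect_real * (1 - power) * cost_fn)

    scenarios = {0.01: 0.55, 0.05: 0.75, 0.10: 0.85}  # alpha -> rough power

    # Cheap to implement, big upside if it works: the loose cutoff wins.
    print(min(scenarios, key=lambda a: expected_cost(a, scenarios[a], 10, 500)))
    # Expensive to implement, modest upside: the strict cutoff wins.
    print(min(scenarios, key=lambda a: expected_cost(a, scenarios[a], 500, 10)))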

Obviously, we can't let everyone set their own rules, but (to coin a phrase) I wonder if in an effort to make things as simple as possible, we haven't actually made them simpler. Statistical significance is an arbitrary, context-sensitive cut-off that we assign before a test based on the relative costs of a false positive and a false negative. It is not a God-given value of 5%.

Letting everyone pick their own definition of significance is a bad idea, but so is completely ignoring context. Does it make any sense to demand the same level of p-value from a study of a rare, slow-growing cancer (where five years is quick and a sample size of 20 is an achievement) and a drug to reduce BP in the moderately obese (where a course of treatment lasts two weeks and the streets are filled with potential test subjects)? Should we ignore a promising preliminary study because it comes in at 0.06?

For a real-life example, consider the public reaction to the recent statement that we didn't have statistically significant data that the earth had warmed over the past 15 years. This was a small sample and I'm under the impression that the results would have been significant at the 0.1 level, but these points were lost (or discarded) in most of the coverage.

We need to do a better job dealing with these grays. We might try replacing the phrase "statistically significant" with "statistically significant at 10/5/1/0.1%." Or we might look at some sort of a two-tiered system, raising significance to 0.01 for most studies while making room for "provisionally significant" papers where research is badly needed, adequate samples are not available, or the costs of a type-II error are deemed unusually high.

I'm not sure how practical or effective these steps might be but I am sure we can do better. Statisticians know how to deal with gray areas; now we need to work on how we explain them.

For more on the subject, check out Joseph's posts here and here.

The winner's curse

I have heard about the article that Mark references in a previous post; it's hard to be in the epidemiology field and not have heard about it. But, for this post, I want to focus on a single aspect of the problem.

Let's say that you have a rare side effect that requires a large database to find and, even then, the power is limited. Let's say, for an illustration, that the true effect of a drug on an outcome is an Odds Ratio (or Relative Risk; it's a rare disease) of 1.50. If, by chance alone, the estimate in database A is 1.45 (95% Confidence Interval: 0.99 to 1.98) and the estimate in database B is 1.55 (95% CI: 1.03 to 2.08), then what would be the result of two studies on this side effect?

Well, if database A is done first then maybe nobody ever looks at database B (these databases are often expensive to use and time consuming to analyze). If database B is used first, the second estimate will be from database A (and thus lower). In fact, there is some chance that the researchers from database A will never publish (as it has historically been the case that null results are hard to publish).

The result? Estimates of association between the drug and the outcome will tend to be biased upwards -- because the initial finding (due to the nature of null results being hard to publish) will tend to be an over-estimate of the true causal effect.

These factors make it hard to determine if a meta-analysis of observational evidence would give an asymptotically unbiased estimate of the "truth" (likely it would be biased upwards).
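
A quick simulation makes the mechanism concrete (a sketch with invented numbers, not the actual databases): generate many noisy estimates of a true log odds ratio of log(1.5), "publish" only the ones that reach statistical significance, and compare the average published estimate to the truth.

    import numpy as np

    rng = np.random.default_rng(42)
    true_log_or = np.log(1.5)
    se = 0.25  # a plausible standard error for a rare outcome

    # Simulate 10,000 studies; "publish" only the statistically significant ones.
    estimates = rng.normal(true_log_or, se, size=10000)
    published = estimates[estimates - 1.96 * se > 0]  # lower CI bound above the null

    print("true OR:", round(np.exp(true_log_or), 2))
    print("average OR, all studies:", round(np.exp(estimates.mean()), 2))
    print("average OR, published only:", round(np.exp(published.mean()), 2))

The published-only average comes out well above 1.5, which is the winner's curse in miniature.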

In that sense, on average, published results are biased to some extent.

A lot to discuss

When you get past the inflammatory opening, this article in Science News is something you should take a look at (via Felix Salmon).
“There is increasing concern,” declared epidemiologist John Ioannidis in a highly cited 2005 paper in PLoS Medicine, “that in modern research, false findings may be the majority or even the vast majority of published research claims.”

Ioannidis claimed to prove that more than half of published findings are false, but his analysis came under fire for statistical shortcomings of its own. “It may be true, but he didn’t prove it,” says biostatistician Steven Goodman of the Johns Hopkins University School of Public Health. On the other hand, says Goodman, the basic message stands. “There are more false claims made in the medical literature than anybody appreciates,” he says. “There’s no question about that.”

Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical. “A lot of scientists don’t understand statistics,” says Goodman. “And they don’t understand statistics because the statistics don’t make sense.”

Wednesday, March 17, 2010

Evidence

I was reading Andrew Gelman (always a source of interesting statistical thoughts) and I started thinking about p-values in epidemiology.

Is there a measure in all of medical research more controversial than the p-value? Sometimes I really don't think so. In a lot of ways, it seems to dominate research just because it has become an informal standard. But it felt odd, the one time I did it, to say in a paper that there was no association (p=.0508) when adding a few more cases might have flipped the answer.

I don't think confidence intervals, used in the sense of "does this interval include the null", really advance the issue either. But it's true that we do want a simple way to decide if we should be concerned about a possible adverse association, and the medical literature is not well constructed for a complex back and forth discussion about statistical models.

I'm also not convinced that any "standard of evidence" would not be similarly misapplied. Any approach that is primarily used by trained statisticians (sensitive to its limitations) will look good compared with a broad standard that is also applied by non-specialists.

So I guess I don't see an easy way to replace our reliance on p-values in the medical literature, but it is worth some thought.

"We could call them 'universities'"

This bit from Kevin Carey's entry into the New Republic Debate caught my eye:

In the end, [Diane Ravitch's] Death and Life is painfully short on non-curricular ideas that might actually improve education for those who need it most. The last few pages contain nothing but generalities: ... "Teachers must be well educated and know their subjects." That's all on page 238. The complete lack of engagement with how to do these things is striking.

If only there were a system of institutions where teachers could go for instruction in their fields. If there were such a system then Dr. Ravitch could say "Teachers must be well educated and know their subjects" and all reasonable people would assume that she meant we should require teachers to take more advanced courses and provide additional compensation for those who exceeded those requirements.

Tuesday, March 16, 2010

Some context on schools and the magic of the markets

One reason emotions run so hot in the current debate is that the always heated controversies of education have somehow become intertwined with sensitive points of economic philosophy. The discussion over child welfare and opportunity has been rewritten as an epic struggle between big government and unions on one hand and markets and entrepreneurs on the other. (insert Lord of the Rings reference here)

When Ben Wildavsky said "Perhaps most striking to me as I read Death and Life was Ravitch’s odd aversion to, even contempt for, market economics and business as they relate to education" he wasn't wasting his time on a minor aspect of the book; he was focusing on the fundamental principle of the debate.

The success or even the applicability of business metrics and mission statements in education is a topic for another post, but the subject does remind me of a presentation the head of the education department gave when I was getting my certification in the late Eighties. He showed us a video of Tom Peters discussing In Search of Excellence then spent about an hour extolling Peters' ideas.

(on a related note, I don't recall any of my education classes mentioning George Polya)

I can't say exactly when but by 1987 business-based approaches were the big thing in education and had been for quite a while, a movement that led to the introduction of charter schools at the end of the decade. And the movement has continued to this day.

In other words, American schools have been trying a free market/business school approach for between twenty-five and thirty years.

I'm not going to say anything here about the success or failure of those efforts, but it is worth putting in context.

Monday, March 15, 2010

And for today, at least, you are not the world's biggest math nerd

From Greg Mankiw:
Fun fact of the day: MIT releases its undergraduate admission decisions at 1:59 pm today. (That is, at 3.14159).

Who is this Thomas Jefferson you keep talking about?

I've got some posts coming up on the role curriculum plays in educational reform. In the meantime, check out what's happening in Texas* with the state board of education. Since the Lone Star state is such a big market they have a history of setting textbook content for the nation.

Here's the change that really caught my eye:
Thomas Jefferson no longer included among writers influencing the nation’s intellectual origins. Jefferson, a deist who helped pioneer the legal theory of the separation of church and state, is not a model founder in the board’s judgment. Among the intellectual forerunners to be highlighted in Jefferson’s place: medieval Catholic philosopher St. Thomas Aquinas, Puritan theologian John Calvin and conservative British law scholar William Blackstone. Heavy emphasis is also to be placed on the founding fathers having been guided by strict Christian beliefs.
* I'm a Texan by birth. I'm allowed to mock.

Observational Research

An interesting critique of observational data by John Cook. I think the author raises a fair point, but it is more true of cross-sectional studies than longitudinal ones. If you have a baseline modifiable factor and look at the predictors of change then you have a pretty useful measure of consequence. It might be confounded or it might have issues with indication bias, but it's still a pretty interesting prediction.

With cross sectional studies, on the other hand, reverse causality is always a concern.

Of course, the other trick is that the risk factor really has to be modifiable. Drugs (my own favorite example) often are. But even diet and exercise get tricky to modify when you look at them closely (as they are linked to other characteristics of the individual and are a very drastic change in lifestyle patterns).

It's a hard area and this is why we use experiments as our gold standard!

"The Obesity-Hunger Paradox"

Interesting article from the New York Times:

WHEN most people think of hunger in America, the images that leap to mind are of ragged toddlers in Appalachia or rail-thin children in dingy apartments reaching for empty bottles of milk.

Once, maybe.

But a recent survey found that the most severe hunger-related problems in the nation are in the South Bronx, long one of the country’s capitals of obesity. Experts say these are not parallel problems persisting in side-by-side neighborhoods, but plagues often seen in the same households, even the same person: the hungriest people in America today, statistically speaking, may well be not sickly skinny, but excessively fat.

Call it the Bronx Paradox.

“Hunger and obesity are often flip sides to the same malnutrition coin,” said Joel Berg, executive director of the New York City Coalition Against Hunger. “Hunger is certainly almost an exclusive symptom of poverty. And extra obesity is one of the symptoms of poverty.”

The Bronx has the city’s highest rate of obesity, with residents facing an estimated 85 percent higher risk of being obese than people in Manhattan, according to Andrew G. Rundle, an epidemiologist at the Mailman School of Public Health at Columbia University.

But the Bronx also faces stubborn hunger problems. According to a survey released in January by the Food Research and Action Center, an antihunger group, nearly 37 percent of residents in the 16th Congressional District, which encompasses the South Bronx, said they lacked money to buy food at some point in the past 12 months. That is more than any other Congressional district in the country and twice the national average, 18.5 percent, in the fourth quarter of 2009.

Such studies present a different way to look at hunger: not starving, but “food insecure,” as the researchers call it (the Department of Agriculture in 2006 stopped using the word “hunger” in its reports). This might mean simply being unable to afford the basics, unable to get to the grocery or unable to find fresh produce among the pizza shops, doughnut stores and fried-everything restaurants of East Fordham Road.

"The economics profession is in crisis"

This may sound strange but all this soul searching by economists like Mark Thoma makes me think that the field might be on the verge of extensive reassessment and major advances.

From the Economist's View:
The fact that the evidence always seems to confirm ideological biases doesn't give much confidence. Even among the economists that I trust to be as fair as they can be -- who simply want the truth whatever it might be (which is most of them) -- there doesn't seem to be anything resembling convergence on this issue. In my most pessimistic moments, I wonder if we will ever make progress, particularly since there seems to be a tendency for the explanation given by those who are most powerful in the profession to stick just because they said it. So long as there is some supporting evidence for their positions, evidence pointing in other directions doesn't seem to matter.

The economics profession is in crisis, more so than the leaders in the profession seem to understand (since change might upset their powerful positions, positions that allow them to control the academic discourse by, say, promoting one area of research or class of models over another, they have little incentive to see this). If, as a profession, we can't come to an evidence based consensus on what caused the single most important economic event in recent memory, then what do we have to offer beyond useless "on the one, on the many other hands" explanations that allow people to pick and choose according to their ideological leanings? We need to do better.

(forgot to block-quote this. sorry about the error)

TNR on the education debate

The New Republic is starting a series on education reform. Given the extraordinary quality of commentary we've been seeing from TNR, this is definitely a good development.

Here are the first three entries:

By Diane Ravitch: The country's love affair with standardized testing and charter schools is ruining American education.

By Ben Wildavsky: Why Diane Ravitch's populist rage against business-minded school reform doesn't make sense.

By Richard Rothstein: Ravitch’s recent ‘conversion’ is actually a return to her core values.

Sunday, March 14, 2010

Harlem Children's Zero Sum Game

I used to work on the marketing side of a large corporation (I don't think they'd like me to use their name so let's just say you've heard of it and leave the matter at that). We frequently discussed the dangers of adverse selection: the possibility that a marketing campaign might bring in customers we didn't want, particularly those we couldn't legally refuse. We also spent a lot of time talking about how to maximize the ratio of perceived value to real value.

On a completely unrelated note, here's an interesting article from the New York Times:
Pressed by Charters, Public Schools Try Marketing
By JENNIFER MEDINA

Rafaela Espinal held her first poolside chat last summer, offering cheese, crackers and apple cider to draw people to hear her pitch.

She keeps a handful of brochures in her purse, and also gives a few to her daughter before she leaves for school each morning. She painted signs on the windows of her Chrysler minivan, turning it into a mobile advertisement.

It is all an effort to build awareness for her product, which is not new, but is in need of an image makeover: a public school in Harlem.

As charter schools have grown around the country, both in number and in popularity, public school principals like Ms. Espinal are being forced to compete for bodies or risk having their schools closed. So among their many challenges, some of these principals, who had never given much thought to attracting students, have been spending considerable time toiling over ways to market their schools. They are revamping school logos, encouraging students and teachers to wear T-shirts emblazoned with the new designs. They emphasize their after-school programs as an alternative to the extended days at many charter schools. A few have worked with professional marketing firms to create sophisticated Web sites and blogs.
...

For most schools, the marketing amounts to less than $500, raised by parents and teachers to print up full color postcards or brochures. Typically, principals rely on staff members with a creative bent to draw up whatever they can.

Student recruitment has always been necessary for charter schools, which are privately run but receive public money based on their enrollment, supplemented by whatever private donations they can corral.

The Harlem Success Academy network, run by the former City Council member Eva Moskowitz, is widely regarded, with admiration by some and scorn by others, as having the biggest marketing effort. Their bright orange advertisements pepper the bus stops in the neighborhood, and prospective parents receive full color mailings almost monthly.

Ms. Moskowitz said the extensive outreach was necessary to make sure they were drawing from a broad spectrum of parents. Ms. Moskowitz said they spent roughly $90 per applicant for recruitment. With about 3,600 applicants last year for the four schools in the network, she said, the total amounted to $325,000.

Saturday, March 13, 2010

Social norms and happy employees

I came across the following from Jay Goltz's New York Times blog:

About 10 years ago I was having my annual holiday party, and my niece had come with her newly minted M.B.A. boyfriend. As he looked around the room, he noted that my employees seemed happy. I told him that I thought they were.

Then, figuring I would take his new degree for a test drive, I asked him how he thought I did that. “I’m sure you treat them well,” he replied.

“That’s half of it,” I said. “Do you know what the other half is?”

He didn’t have the answer, and neither have the many other people that I have told this story. So what is the answer? I fired the unhappy people. People usually laugh at this point. I wish I were kidding.

In my experience, it is generally unhappy employees who say things like "But what happens to our business model if home prices go down?" or "Doesn't that look kinda like an iceberg?" Putting that aside, though, this is another example of the principle discussed in the last post -- it's easy to get the norms you want if you can decide who goes in the group.

Charter schools, social norming and zero-sum games

You've probably heard about the Harlem Children's Zone, an impressive, even inspiring initiative to improve the lives of poor inner-city children through charter schools and community programs. Having taught in Watts and the Mississippi Delta in my pre-statistician days, I have a long-standing interest in this area, and I like a lot of the things I'm hearing about HCZ. What I don't like nearly as much is the reaction I'm seeing to the research study by Will Dobbie and Roland G. Fryer, Jr. of Harvard. Here's Alex Tabarrok at Marginal Revolution with a representative sample: "I don't know why anyone interested in the welfare of children would want to discourage this kind of experimentation."

Maybe I can provide some reasons.

First off, this is an observational study, not a randomized experiment. I think we may be reaching the limits of what analysis of observational data can do in the education debate and, given the importance and complexity of the questions, I don't understand why we aren't employing randomized trials to answer some of these questions once and for all.

More significantly, I'm also troubled by the aliasing of data on the Promise Academies and by the fact that the authors draw a conclusion ("HCZ is enormously successful at boosting achievement in math and ELA in elementary school and math in middle school. The impact of being offered admission into the HCZ middle school on ELA achievement is positive, but less dramatic. High-quality schools or community investments coupled with high-quality schools drive these results, but community investments alone cannot.") that the data can't support.

In statistics, aliasing means combining treatments in such a way that you can't tell which treatment or combination of treatments caused the effect you observed. In this case the first treatment is the educational environment of the Promise Academies. The second is something called social norming.
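
Here's a toy version of the problem, just to show the mechanics (all the effect sizes below are invented and nothing here comes from the Dobbie and Fryer data): every lottery winner gets the Promise Academy instruction and the shifted peer group at the same time, so the only thing you can estimate is the sum of the two effects.

import numpy as np

# Simulated lottery: winners get both "treatments" at once.
rng = np.random.default_rng(0)
n = 1000
winner = rng.integers(0, 2, n)             # 1 = admitted via the lottery
school_effect, norming_effect = 0.3, 0.2   # hypothetical, in test-score SDs
score = 0.5 * rng.standard_normal(n) + winner * (school_effect + norming_effect)

# The only estimable quantity is the combined effect:
combined = score[winner == 1].mean() - score[winner == 0].mean()
print(f"estimated combined effect: {combined:.2f}")  # close to 0.5, not 0.3 or 0.2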

When you isolate a group of students, they will quickly arrive at a consensus of what constitutes normal behavior. It is a complex and somewhat unpredictable process, driven by personalities, random connections, and any number of outside factors. You can, however, exercise a great deal of control over the outcome by restricting the make-up of the group.

If we restricted students via an application process, how would we expect that group to differ from the general population and how would that affect the norms the group would settle on? For starters, all the parents would have taken a direct interest in their children's schooling.

Compared to the general population, the applicants will be much more likely to see working hard, making good grades, and not getting into trouble as normal behaviors. The applicants (particularly older applicants) would be more likely to be interested in school and to see academic and professional success as a reasonable possibility because they would have made an active choice to move to a new and more demanding school. Having the older students committed to the program is particularly important because older children play a disproportionate role in the setting of social norms.

Dobbie and Fryer address the question of self-selection, "[R]esults from any lottery sample may lack external validity. The counterfactual we identify is for students who are already interested in charter schools. The effect of being offered admission to HCZ for these students may be different than for other types of students." In other words, they can't conclude from the data how well students would do at the Promise Academies if, for instance, their parents weren't engaged and supportive (a group effectively eliminated by the application process).

But there's another question, one with tremendous policy implications, that the paper does not address: how well would the students who were accepted to HCZ have done if they had been given the same amount of instruction* they would have received from HCZ, but from public school teachers, while still being isolated from the general population? (There was a control group of lottery losers, but there is no evidence that they were kept together as a group.)

Why is this question so important? Because we are thinking about spending an enormous amount of time, effort and money on a major overhaul of the education system when we don't have the data to tell us whether what we spend will be wasted or, worse yet, whether we are to some extent playing a zero-sum game.

Social norming can work both ways. If you remove all of the students whose parents are willing and able to go through the application process, the norms of acceptable behavior for those left behind will move in an ugly direction, and the kids who started out with the greatest disadvantages will be left to bear the burden.

But we can answer these questions and make decisions based on solid, statistically sound data. Educational reform is not like climate change where observational data is our only reasonable option. Randomized trials are an option in most cases; they are not that difficult or expensive.

Until we get good data, how can we expect to make good decisions?

* Correction: There should have been a link here to this post by Andrew Gelman.

Friday, March 12, 2010

Instrumental variables

I always have mixed feelings about instrumental variables (at least insofar as the instrument is not randomization). On one hand they show amazing promise as a way to handle unmeasured confounding. On the other hand, it is difficult to know if the assumptions required for a variable to be an instrument are being met or not.

This is an important dilemma. Alan Brookhart, who introduced instrumental variables into pharmacoepidemiology in 2006, has done an amazing job of proving out one example. But you can't generalize from one example, and the general idea of using physician preference as an instrument, while really cool, still rests on those hard-to-verify assumptions.

Unlike the case of unmeasured confounding, it's hard to know how to test these assumptions. With unmeasured confounders you can ask critics to specify what they suspect might be the key confounding factors and then go forth and measure them. But instruments are used precisely when that kind of data is lacking.
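
For readers who haven't seen the approach, here's a bare-bones sketch of what using physician preference as an instrument looks like -- simulated data and a hand-rolled two-stage least squares, not anything from Brookhart's actual analysis:

import numpy as np

rng = np.random.default_rng(1)
n = 5000
u = rng.standard_normal(n)                  # unmeasured confounder
pref = rng.integers(0, 2, n)                # prescriber's usual choice: drug A (1) or B (0)
# Treatment depends on both the preference and the unmeasured confounder:
treat = (0.8 * pref + 0.5 * u + rng.standard_normal(n) > 0.6).astype(float)
y = 1.0 * treat + 1.5 * u + rng.standard_normal(n)   # true treatment effect = 1.0

# Naive regression of y on treatment is biased by u:
naive = np.polyfit(treat, y, 1)[0]

# Stage 1: predict treatment from the instrument; stage 2: regress y on that prediction.
stage1 = np.polyfit(pref, treat, 1)
treat_hat = np.polyval(stage1, pref)
iv = np.polyfit(treat_hat, y, 1)[0]

print(f"naive estimate: {naive:.2f}, IV estimate: {iv:.2f}")  # naive is inflated; IV is near 1.0

The catch, of course, is that this only works if preference is unrelated to the confounder and affects the outcome only through the treatment -- exactly the assumptions you can't check from the data.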

I've done some work in the area with some amazing colleagues, and I still think the idea has real promise. It came out of left field and has enormous potential. But I want to understand how it behaves in far more actual cases before I conclude much more . . .

Thursday, March 11, 2010

Propensity Score Calibration

I am on the road giving a guest lecture at UBC today. One of the topics I was going to cover in today's discussion was propensity score calibration (by the ever-brilliant Til Sturmer). But I wonder -- if you have a true random subset of the overall population -- why not just use it? Or, if, as Til assumes, the sample is too small, why not use multiple imputation? Wouldn't that be an equivalent technique that is more flexible for things like subgroup analysis?
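
To be concrete about the alternative I have in mind, here's a minimal sketch of imputing a missing confounder from a validation subset (simulated data, a deliberately crude regression imputation model, and invented effect sizes; a real analysis would use a proper multiple imputation routine and combine the per-imputation results with Rubin's rules):

import numpy as np

rng = np.random.default_rng(2)
n, n_val = 10_000, 500
x = rng.standard_normal(n)                  # confounder measured on everyone
c = 0.6 * x + rng.standard_normal(n)        # confounder measured only in the validation subset

# Fit a simple imputation model on the validation subset...
slope, intercept = np.polyfit(x[:n_val], c[:n_val], 1)
resid_sd = np.std(c[:n_val] - (slope * x[:n_val] + intercept))

# ...then draw several imputations for everyone, adding noise so they are "proper";
# each completed data set gets analyzed and the estimates are pooled afterward.
imputations = [slope * x + intercept + rng.normal(0, resid_sd, n) for _ in range(5)]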

Or is it the complexity of the imputation in data sets of the size Til worked with that was the issue? It's certainly a point to ponder.

Worse than we thought -- credit card edition

For a while it looked like the one good thing about the economic downturn was that it was getting people to pay down their credit card debts. Now, according to Felix Salmon, we may have to find another silver lining:

Total credit-card debt outstanding dropped by $93 billion, or almost 10%, over the course of 2009. Is that cause for celebration, and evidence that U.S. households are finally getting their act together when it comes to deleveraging their personal finances? No. A fascinating spreadsheet from CardHub breaks that number down by looking at two variables: time, on the one hand, and charge-offs, on the other.

It turns out that while total debt outstanding dropped by $93 billion, charge-offs added up to $83 billion — which means that only 10% of the decrease in credit card debt — less than $10 billion — was due to people actually paying down their balances.

Tuesday, March 9, 2010

Perils of Convergence

This article ("Building the Better Teacher") in the New York Times Magazine is generating a lot of blog posts about education reform and talk of education reform always makes me deeply nervous. Part of the anxiety comes having spent a number of years behind the podium and having seen the disparity between the claims and the reality of previous reforms. The rest comes from being a statistician and knowing what things like convergence can do to data.

Convergent behavior violates the assumption of independent observations used in most simple analyses, but educational studies commonly, perhaps even routinely, ignore the complex ways that social norming can cause the nesting of student performance data.

In other words, educational research is often based on the idea that teenagers do not respond to peer pressure.

Since most teenagers are looking for someone else to take the lead, social norming can be extremely sensitive to small changes in initial conditions, particularly in the make-up of the group. This makes it easy for administrators to play favorites -- when a disruptive or under-performing student is reassigned from a favored to an unfavored teacher, the student lowers the average of the second class and often resets the standards of normal behavior for his or her peers.

If we were to adopt the proposed Jack Welch model (big financial incentives at the top; pink slips at the bottom), an administrator could, just by moving three or four students, arrange for one teacher to be put in line for achievement bonuses while another teacher of equal ability would be in danger of dismissal.
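
A toy calculation shows how little it takes (all the numbers below are invented, and I'm ignoring the norming effects entirely, which would only widen the gap):

import numpy as np

rng = np.random.default_rng(3)
class_a = rng.normal(70, 10, 30)            # favored teacher keeps a clean roster
class_b = rng.normal(70, 10, 30)            # equally able teacher, same kind of class
disruptive = rng.normal(50, 10, 4)          # four students scoring ~20 points lower
class_b = np.concatenate([class_b, disruptive])

print(f"class A mean: {class_a.mean():.1f}")   # around 70
print(f"class B mean: {class_b.mean():.1f}")   # pulled down a couple of points
# That two-to-three point gap is the difference between a bonus and a warning
# under a rank-and-fire system, and it has nothing to do with teaching ability.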

Worse yet, social norming can greatly magnify the bias caused by self-selection and self-selection biases are rampant in educational research. Any kind of application process automatically removes almost all of the students that either don't want to go to school or aren't interested in academic achievement or know that their parents won't care what they do.

If you can get a class consisting entirely of ambitious, engaged students with supportive parents, social norming is your best friend. These classes are almost (but not quite) idiot proof and teachers lucky enough to have these classes will see their metrics go through the roof (and their stress levels plummet -- those are fun classes to teach). If you can get an entire school filled with these students, the effect will be even stronger.

This effect is often stated in terms of the difference in performance between the charter schools and the schools the charter students were drawn from, which adds another level of bias (not to mention insult to injury).

Ethically, this raises a number of tough questions about our obligations to all students (even the difficult and at-risk) and what kind of sacrifices we can reasonably ask most students to make for a few of their peers.

Statistically, though, the situation is remarkably clear: if this effect is present in a study and is not accounted for, the results are at best questionable and at worst meaningless.

(this is the first in a series of posts about education. Later this week, I'll take a look at the errors in the influential paper on Harlem's Promise Academy.)

Efficacy versus effectiveness

One of the better examples I have found of this distinction is physical activity. Travis Saunders talks about the difference between a closely monitored exercise program and simply encouraging exercise-related behavior (a distinction that persists despite randomization).

This should be a warning for those of us in drug research as well; not even randomization will help if you have a lot of cross-overs over time or if users tend to alter other behaviors as a result of therapy. This isn't very plausible for some drugs with few side effects (statins) but could be really important for others where the effects can alter behavior (NSAIDs). In particular, it makes me wonder about our actual ability to use randomized experiments of pain medication for arthritis (except, possibly, in the context of comparative effectiveness).
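
Here's a rough sketch of the dilution I'm worried about (a simulated two-arm trial with made-up numbers and 25 percent crossover in each arm):

import numpy as np

rng = np.random.default_rng(4)
n = 2000
assigned = rng.integers(0, 2, n)               # 1 = randomized to treatment
crossover = rng.random(n) < 0.25               # a quarter of each arm ends up on the other therapy
on_treatment = np.where(crossover, 1 - assigned, assigned)

true_effect = 5.0                              # effect of actually taking the drug (made up)
y = 50 + true_effect * on_treatment + rng.normal(0, 10, n)

itt = y[assigned == 1].mean() - y[assigned == 0].mean()
print(f"intent-to-treat estimate: {itt:.1f} (true effect of treatment is {true_effect})")
# With 25% crossover in each arm, the ITT estimate shrinks toward 0.5 * 5 = 2.5.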

But it is worth thinking about when trying to interpret observational data. What else could you be missing?

Monday, March 8, 2010

Undead papers

Okay, so what do y'all do when a paper becomes undead? We all have work that stopped, for one reason or another, but really needs to be brought to a conclusion. Not even necessarily a happy conclusion (sometimes putting a project out of its misery is the kindest decision for all involved -- especially the junior scientist leading the charge). But sometimes the results are just not that compelling, even though the work still deserves to be published in the journal of minor findings.

But I wonder what is the secret to motivation under these conditions?

Sunday, March 7, 2010

"Algebra in Wonderland" -- recommended with reservations

In today's New York Times, Melanie Bayley, a doctoral candidate in English literature at Oxford, argues that Lewis Carroll's Alice in Wonderland can be interpreted as a satire of mathematics in the mid-Nineteenth Century, particularly the work of Hamilton and De Morgan.

The essay has its share of flaws: none of the analogies are slam-dunk convincing (the claim that the Queen of Hearts represents an irrational number is especially weak); the omission of pertinent works like "A Tangled Tale" and "What the Tortoise Said to Achilles" is a bit strange; and the conclusion that without math, Alice might have been more like Sylvie and Bruno would be easier to take seriously if the latter book hadn't contained significant amounts of mathematics* and intellectual satire.

Those problems aside, it's an interesting piece, a great starting point for discussing mathematics and literature and it will give you an excuse to dig out your Martin Gardner books. Besides, how often do you get to see the word 'quaternion' on the op-ed page?


* including Carroll's ingenious gravity powered train.

Friday, March 5, 2010

When is zero a good approximation?

I was commenting on Andrew Gelman's blog when a nice commenter pointed out something that I usually don't think much about: pharmacoepidemiology outcomes include both cost and efficacy.

Now, a lot of my work has been on older drugs (aspirin, warfarin, and beta blockers are my three most commonly studied drugs), so I have tended to assume that cost was essentially zero. A year's supply of aspirin for $10.00 is an attainable goal, and so I have assumed that we can neglect the cost of therapy.

But does that make sense if we are talking about a targeted chemotherapy? In such a case, we might have to weigh not just the burden of additional adverse events but also the cost of the medication itself.
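
A back-of-the-envelope comparison makes the point (all of these figures are invented):

# Cost per QALY gained, treating the drug cost as the only cost that differs.
aspirin_cost_per_year = 10.0
chemo_cost_per_year = 60_000.0
qaly_gain_per_year = 0.1                       # hypothetical benefit of each therapy

for name, cost in [("aspirin", aspirin_cost_per_year), ("targeted chemo", chemo_cost_per_year)]:
    print(f"{name}: ${cost / qaly_gain_per_year:,.0f} per QALY gained")

# Rounding the aspirin cost to zero changes the answer by $100 per QALY;
# doing the same for the chemotherapy throws away $600,000 per QALY.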

It's becoming appallingly clear to me that I don't have a good intuition for how to model this well. Making everything a cost and assuming a price on years of life lost is one approach, but the complexity of the pricing involved (and the tendency for relative costs to change over time) worries me about external validity.

I know what I will be thinking about this weekend!

Thursday, March 4, 2010

How are genetically engineered crops like AAA rated structured bonds?

Felix Salmon draws a clever analogy:

If you only grow one crop, the downside of losing it all to an outbreak is catastrophe. In rural Iowa it might mean financial ruin; in Niger, it could mean starvation.

Big agriculture companies like DuPont and Archer Daniels Midland (ADM), of course, have an answer to this problem: genetically engineered crops that are resistant to disease. But that answer is the agricultural equivalent of creating triple-A-rated mortgage bonds, fabricated precisely to prevent the problem of credit risk. It doesn’t make the problem go away: It just makes the problem rarer and much more dangerous when it does occur because no one is — or even can be — prepared for such a high-impact, low-probability event.

Valuing Pain

Readers of this blog will know that I have some concerns about the regulation of pain medications. The FDA continues to warn about the issue of liver injury when taking acetaminophen.

For a moment, let's ignore the case of people taking the drug inappropriately or for whom another medication would provide better symptom control. They exist and are relevant to policy discussions, but they distract from today's main thought.

We can measure liver damage and death (hard outcomes). We cannot easily measure pain -- what level of pain relief is worth a 1% chance of death?

So do we leave it up to individual judgment? Drugs can be confusing, and acetaminophen, due to its efficacy, is included in a lot of preparations. So what is the ideal balance between these two goals (preventing adverse events and relieving pain)?

It would be so much easier if pain were easy to measure . . .

Wednesday, March 3, 2010

p-values

Another nice critique of relying on p-values. There is also a fine example in the comments of why one should double-check when things look odd. Often it is better to keep one's mouth shut and be thought a fool than to open it and remove all doubt.

Tuesday, March 2, 2010

Comparing Apples and Really Bad Toupees

DISCLAIMER: Though I have worked in some related areas like product launches, I have never done an analysis of brand value. What follows are a few thoughts about branding without any claim of special expertise or insight. If I've gotten something wrong here I would appreciate any notes or corrections.

Joseph's post reminded me of this article in the Wall Street Journal about the dispute between Donald Trump and Carl Icahn over the value of the Trump brand. Trump, not surprisingly, favors the high end:
In court Thursday, Mr. Trump boasted that his brand was recently valued by an outside appraiser at $3 billion.

In an interview Wednesday, Mr. Trump dismissed the idea that financial troubles had tarnished his casino brand. He also dismissed Mr. Icahn's claims that the Trump gaming brand was damaged, pointing to a recent filing in which Mr. Icahn made clear that he wants to assume the license to the brand. "Every building in Atlantic City is in trouble. OK? This isn't unique to Trump," he said. "Everybody wants the brand, including Carl. It's the hottest brand in the country."
While Icahn's estimate is a bit lower:
Mr. Icahn, however, believes his group also would have the right to use the Trump name under an existing licensing deal, but says the success of the casinos don't hinge on that. The main disadvantage to losing the name, he says, would be the $15 million to $20 million cost of changing the casinos' signs.
So we can probably put the value of the Trump brand somewhere in the following range:

-15,000,000 < TRUMP ≤ 3,000,000,000

Neither party here is what you'd call trustworthy, and both are clearly pulling the numbers they want out of appropriate places, but they are able to make these claims with straight faces partly because of the nature of the problem.

Assigning a value to a brand can be a tricky thing. Let's reduce this to pretty much the simplest possible case and talk about the price differential between your product and a similar house brand. If you make Clorox, we're in pretty good shape. There may be some subtle difference in quality between your product and, say, the Target store brand, but it's probably safe to ignore it and ascribe the extra dollar consumers pay for your product to the brand effect.

But what about a product like Apple Computers? There's clearly a brand effect at work but in order to measure the price differential we have to decide what products to compare them to. If we simply look at specs the brand effect is huge but Apple users would be quick to argue that they were also paying for high quality, stylish design and friendly interfaces. People certainly pay more for Macs, Ipods, Iphones, and the rest, but how much of that extra money is for features and how much is for brand?

(full disclosure: I use a PC with a dual Vista/Ubuntu operating system. I do my programming [Python, Octave] and analysis [R] in Ubuntu and keep Vista for compatibility issues. I'm very happy with my system. If an Apple user would like equal time we'd be glad to oblige)

I suspect that more products are closer to the Apple end of this spectrum than the Clorox end, but even with things like bleach, all we have is a snapshot of a single product. To be useful, we need to estimate the long-term value of the brand. Is it a Zima (assuming Zima was briefly a valuable brand) or is it a Kellogg's Corn Flakes? And we would generally want a valuation that could cover multiple products under the same brand. How do we measure the impact of a brand on products we haven't launched yet? (This last point is particularly relevant for Apple.)

The short answer is you take smart people, give them some precedents and some guidelines then let them make lots of educated guesses and hope they aren't gaming the system to tell you what you want to hear.

It is an extraordinarily easy system to game even with guidelines. In the case of Trump's casinos we have three resorts, each with its own brand that interacts in an unknown and unknowable way with the Trump brand. If you removed Trump's name from these buildings, how would it affect the number of people who visit or the amount they spend?

If we were talking about Holiday Inn or even Harrah's, we could do a pretty good job estimating the effect of changing the name over the door. We would still have to make some assumptions but we would have data to back them up. With Trump, all we would have is assumption-based assumptions. If you take these assumptions about the economy, trends in gambling and luxury spending, the role of Trump's brand and where it's headed, and you give each one of them a small, reasonable, completely defensible nudge in the right direction, it is easy to change your estimates by one or two orders of magnitude.
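
To see how fast those nudges compound, just multiply a handful of them together (the factor names and values below are purely illustrative):

import math

assumptions = {
    "visitor_growth": 1.0,
    "spend_per_visitor": 1.0,
    "share_attributable_to_brand": 1.0,
    "brand_durability": 1.0,
    "valuation_multiple": 1.0,
}
# Give every assumption a "small, reasonable" 50 percent nudge in the right direction.
nudged = {k: v * 1.5 for k, v in assumptions.items()}

print(math.prod(assumptions.values()))   # 1.0
print(math.prod(nudged.values()))        # about 7.6 -- nearly an order of magnitude
# Doubling each assumption instead moves the product by a factor of 32.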

We also have an unusual, possibly even unique, range-of-data problem. Many companies have tried to build a brand on a public persona, sometimes quite successfully. Normally a sharp business analyst would be in a good position to estimate the value of one of these brands and answer questions like "if Wayne Gretzky were to remove his name from this winter resort, what impact would it have?"

The trouble with Trump is that almost no one likes him, at least according to his Q score. Most persona-based brands are built upon people who were at some point well-liked and Q score is one of the standard metrics analysts use when looking at those brands. Until we get some start-ups involving John Edwards and Tiger Woods, Mr. Trump may well be outside of the range of our data.

Comparing apples and oranges

Comparing salaries across national borders is a tricky thing to do. I was reminded of this problem while reading a post from Female Science Professor. My experience has been limited to the US and Canada but, even there, it's hard to really contrast the two. When I worked in Montreal, I had easy access to fast public transit, most things within walking distance, and inexpensive housing, but a much lower salary. In Seattle I have reluctantly concluded that, given my work location, a car is essential.

So how do you compare salaries?

This is actually a general problem in epidemiology. Socio-economic status is known to be an important predictor of health, but it is tricky to measure. Salary needs to be adjusted for cost of living, which is hard even when you have good location information (and in de-identified data you may very well not). Even within large urban areas, costs can vary considerably by location.
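
Here's the kind of crude adjustment I have in mind (the salaries, cost-of-living indexes, and car cost below are all invented):

jobs = {
    "Montreal": {"salary": 60_000, "col_index": 0.85, "car_needed": False},
    "Seattle": {"salary": 80_000, "col_index": 1.10, "car_needed": True},
}
CAR_COST = 8_000  # assumed annual cost of owning and running a car

for city, job in jobs.items():
    adjusted = (job["salary"] - (CAR_COST if job["car_needed"] else 0)) / job["col_index"]
    print(f"{city}: {adjusted:,.0f} cost-of-living-adjusted dollars")
# The higher nominal salary can come out behind once location costs are counted.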

Alternatively, there are non-financial rewards (that are status boosting) in many jobs; how do you weight these? Adam Smith noted back in The Wealth of Nations that a prestigious position was associated with lower wages. How do you compare equal salaries between a store clerk and a journalist?

It's a hard problem and I really lack a great solution. But it's worth putting some real thought into!!

Monday, March 1, 2010

"What bankers can learn from arc-welder manufacturers"

Felix Salmon points out the following from a book review in the Wall Street Journal:

Mr. Koller contends that layoffs deprive companies of profit-generating talent and leave the remaining employees distrustful of management—and often eager to find jobs elsewhere ahead of the next layoff round. He cites research showing that, on average, for every employee laid off from a company, five additional ones leave voluntarily within a year. He concludes that the cost of recruiting, hiring and training replacements, in most cases, far outweighs the savings that chief executives assume they're getting when they initiate wholesale firings and plant closings.

Having actually built some of the models that directly or indirectly determined hiring and layoffs, and more importantly having been the one who explained those models to the higher-ups, I very much doubt that most companies spend enough time looking at the hidden and long term costs of layoffs.

The book is Spark, by Frank Koller. Sounds interesting.

Selection Bias with Hazard Ratios

Miguel Hernan has a recent article on the Hazards of Hazard Ratios. The thing that jumped out at me was his discussion of "depletion of susceptibles": any intervention can eventually look protective if it speeds up disease in the susceptible subgroup, so that the rate of events in that population eventually drops (once all of the members of the population able to have an event have had it).
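
Here's a small numeric illustration of the mechanism (all of the hazards and proportions are invented): the treatment genuinely triples the hazard in the susceptible subgroup, yet the period-specific hazard ratio eventually drops below one.

def period_hazard(h_susceptible, h_robust, p_susceptible, t):
    # Share of each subgroup still event-free at the start of period t,
    # reweighted into the overall hazard for that period.
    s = p_susceptible * (1 - h_susceptible) ** t
    r = (1 - p_susceptible) * (1 - h_robust) ** t
    frac_s = s / (s + r)
    return frac_s * h_susceptible + (1 - frac_s) * h_robust

# Treated arm: susceptibles have hazard 0.15 per period; untreated: 0.05.
# The robust 70 percent have hazard 0.01 in both arms.
for t in (0, 10, 30):
    hr = period_hazard(0.15, 0.01, 0.3, t) / period_hazard(0.05, 0.01, 0.3, t)
    print(f"period {t}: hazard ratio ~ {hr:.2f}")
# Roughly 2.4 at the start, about 1.2 by period 10, and about 0.7 by period 30.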

I think that this element of hazard ratios illustrates two principles:

1) It always makes sense to begin the analysis of a medication at first use, or else you can miss a lot

2) In the long run, we are all dead

So the real trick seems to be more focus on good study design and being careful to formulate problems with precision. Quality study design never goes out of style!

Nate Silver debunks another polling myth

Here's the old chestnut (from Robert Moran):


In a two way race, political professionals don't even bother to look at the spread between the incumbent and the challenger, they only focus on the incumbent's support relative to 50%. Incumbents tend to get trace elements of the undecideds at the end of a campaign. Sure, there is the occasional exception, but this rule is fairly ironclad in my experience.


Here's Silver's takedown:


There are several noteworthy features of this graph:


1) It is quite common for an incumbent to be polling at under 50 percent in the early polling average; this was true, in fact, of almost half of the races (30 of the 63). An outright majority of incumbents, meanwhile, had at least one early poll in which they were at under 50 percent of the vote.


2) There are lots of races in the top left-hand quadrant of the graph: these are cases in which the incumbent polled at under 50 percent in the early polling average, but wound up with more than 50 percent of the vote in November. In fact, of the 30 races in which the incumbent had less than 50 percent of the vote in the early polls, he wound up with more than 50 percent of the vote 18 times -- a clear majority. In addition, there was one case in which an incumbent polling at under 50 percent wound up with less than 50 percent of the November vote, but won anyway after a small third-party vote was factored in. Overall, 19 of the 30 incumbents to have less than 50 percent of the vote in the early polling average in fact won their election.


3) 5 of the 15 incumbents to have under 45 percent of the vote in early polls also won their elections. These were Bob Menendez (38.9 percent), Tim Pawlenty (42.0 percent), Don Carcieri (42.3 percent), Jennifer Granholm (43.4 percent) and Arnold Schwarzenegger (44.3 percent), all in 2006.

3b) If we instead look at those cases within three points of Ted Strickland's 44 percent, when the incumbent had between 41 and 47 percent of the vote in early polls, he won on 11 of 17 occasions (65 percent of the time).


4) Almost all of the data points are above the red diagonal line, meaning that the incumbent finished with a larger share of the vote than he had in early polls. This was true on 58 of 63 occasions.


4b) On average, the incumbent added 6.4 percent to his voting total between the early polling average and the election, whereas the challenger added 4.5 percent. Looked at differently, the incumbent actually picked up the majority -- 59 percent -- of the undecided vote vis-a-vis early polls.


4c) The above trend seems quite linear; regardless of the incumbent's initial standing in the early polls, he picked up an average of 6-7 points by the election, although with a significant amount of variance.


5) The following corollary of Moran's hypothesis is almost always true: if an incumbent has 50 percent or more of the vote in early polls, he will win re-election. This was true on 32 of 33 occasions; the lone exception was George Allen in Virginia, who had 51.5 percent of the vote in early polls in 2006 but lost re-election by less than a full point (after running a terrible campaign). It appears that once a voter is willing to express a preference for an incumbent candidate to a pollster, they rarely (although not never) change their minds and vote for the challenger instead.