Monday, March 29, 2010

Now I'm going to spend the rest of the day wondering what a giant deodorant gun looks like

From the Guardian via TNR:
Beijing is to install 100 deodorant guns at a stinking landfill site on the edge of the city in a bid to dampen complaints about the capital's rubbish crisis. ...

Thrillers on Economics -- a quick digression

I've been working on a series of posts about the economics of crime novels (see here and here) and it got me thinking about economics in crime novels. I'm no expert but here's my incomplete survey.

George Goodman (a.k.a. "Adam Smith") once bemoaned the absence of business in American literature with the notable exception of John P. Marquand. With all due respect to the estimable Marquand (himself no stranger to the pulps), Goodman might have found what he was looking for if he had spent less time in high-end bookstores and more time in his corner drugstore looking at the books with the lurid covers.

Of the many crime novels built around businesses, the best might be Murder Must Advertise, a Lord Peter Wimsey novel by Dorothy L. Sayers. The story is set in a London ad agency in the Thirties, a time when the traditional roles of the aristocracy were changing and "public school lads" were showing up in traditionally bourgeois fields like advertising.

Sayers had been a highly successful copywriter (variations on some of her campaigns are still running today) and has sometimes been credited with coining the phrase "It pays to advertise." All this success did not soften her view of the industry, a view which is probably best captured by Wimsey's observation that truth in advertising is like yeast in bread.

But even if Sayers holds the record for an individual event, the lifetime achievement award has got to go to the man whom many* consider the best American crime novelist, John D. MacDonald.

Before trying his hand at writing, MacDonald had earned an MBA at Harvard, and over his forty-year writing career, business and economics remained a prominent part of his fictional universe (one supporting character in the Travis McGee series was an economist who lived on a boat called the John Maynard Keynes). But it was in some of the non-series books that MacDonald's background moved to the foreground.

Real estate frequently figured in MacDonald's plots (not that surprising given their Florida/Redneck Riviera settings). His last book, Barrier Island, was built around a plan to work federal regulations and creative accounting to turn a profit from the cancellation of a wildly overvalued project. In Condominium, sleazy developers dodge environmental regulations and building codes (which turned out to be a particularly bad idea in a hurricane-prone area).

Real estate also figures in MacDonald's examination of televangelism, One More Sunday, as does almost every other aspect of an Oral Roberts-scale enterprise: HR, security, public relations, lobbying, broadcasting and, most importantly, fund-raising. It's a complete, realistic, insightful picture. You can find companies launched with less detailed business plans.

But MacDonald's best book on business may be A Key to the Suite, a brief and exceedingly bitter account of a management consultant deciding the future of various executives at a sales convention. Suite was published as a Gold Medal Original paperback in 1962. You could find a surprising amount of social commentary in those drugstore book racks, usually packaged with lots of cleavage.


* One example of many:

“To diggers a thousand years from now, the works of John D. MacDonald would be a treasure on the order of the tomb of Tutankhamen.” - KURT VONNEGUT

Sunday, March 28, 2010

All Cretans are ad execs



This ad reminded me of the Liar's Paradox. Not exactly the same thing, but the juxtaposition of messages -- romanticized images of cars brainwash you into desiring hollow status symbols/look at the romanticized images of our cars -- certainly plays to the irony-impaired.

Saturday, March 27, 2010

My best subject used to be recess

David Elkind has a good op-ed piece out today on the loss of unstructured playtime in many schools.
One consequence of these changes is the disappearance of what child-development experts call “the culture of childhood.” This culture, which is to be found all over the world, was best documented in its English-language form by the British folklorists Peter and Iona Opie in the 1950s. They cataloged the songs, riddles, jibes and incantations (“step on a crack, break your mother’s back”) that were passed on by oral tradition. Games like marbles, hopscotch and hide and seek date back hundreds of years. The children of each generation adapted these games to their own circumstances.

Yet this culture has disappeared almost overnight, and not just in America. For example, in the 1970s a Japanese photographer, Keiki Haginoya, undertook what was to be a lifelong project to compile a photo documentary of children’s play on the streets of Tokyo. He gave up the project in 1996, noting that the spontaneous play and laughter that once filled the city’s streets, alleys and vacant lots had utterly vanished.

For children in past eras, participating in the culture of childhood was a socializing process. They learned to settle their own quarrels, to make and break their own rules, and to respect the rights of others. They learned that friends could be mean as well as kind, and that life was not always fair.

I have some quibbles with the essay and strong objections to a couple of points but most of what Elkind has to say here is valid and important.

The fundamental assumption of all educational debates needs to be that children are naturally curious and creative, that evolution has programmed them to learn and explore. Strategies that do a good job capitalizing on that curiosity and creativity will be successful and sometimes the best way to do that is to simply get out of the kids' way.

Friday, March 26, 2010

Another reminder that improbable events are probable

From Jonathan Chait:

Brian Kalt, a law professor and former college classmate of mine, has developed his own law of presidential facial hair:

I thought you might be interested in the following ironclad law of American presidential politics. I call it Kalt’s Law: “Under the modern two-party system, if a candidate has facial hair, the Republican always has as much, or more, than the Democrat.”

Excellent primer on the economics of genre fiction.

In the introduction to Science Fiction by Gaslight, Sam Moskowitz does a really good job explaining how changes in publishing led to the creation of most of today's popular fiction genres. It's an interesting book if you can find a copy.

I'll try to tie this in with the thriller thread (see here and here) in an upcoming post.

Thursday, March 25, 2010

Advice from Andrew Gelman

Whom I always defer to on non-literary matters:

They also recommend composite end points (see page 418 of the above-linked article), which is a point that Jennifer and I emphasize in chapter 4 of our book and which comes up all the time, over and over in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you--that is, good advice that's easy to apply and which you might not already know--it would be to use transformations. Log, square-root, etc.--yes, all that, but more! I'm talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those "total scores" that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don't care if the threshold is "clinically relevant" or whatever--just don't do it. If you gotta discretize, for Christ's sake break the variable into 3 categories.

This all seems quite obvious but people don't know about it. What gives? I have a theory, which goes like this. People are trained to run regressions "out of the box," not touching their data at all. Why? For two reasons:

1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can't cheat.

2. In classical (non-Bayesian) statistics, linear transformations on the predictors have no effect on inferences for linear regression or generalized linear models. When you're learning applied statistics from a classical perspective, transformations tend to get downplayed, and they are considered as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model). Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.

I don't know if I entirely buy point 2. I'm generally a frequentist and I make extensive use of transformations (though none of them are linear transformations).
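To make the transformations Gelman describes concrete, here's a minimal sketch in Python (the data frame and variable names are hypothetical, and `numpy`/`pandas` are assumed):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: age and a right-skewed income variable.
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "income": rng.lognormal(mean=10, sigma=1, size=1000),
})

# The familiar transformation: log a skewed predictor.
df["log_income"] = np.log(df["income"])

# Turn a continuous variable into several discrete categories,
# to capture nonlinear patterns (e.g. voting by age).
df["age_group"] = pd.cut(df["age"], bins=[17, 29, 44, 64, 90],
                         labels=["18-29", "30-44", "45-64", "65+"])

# The anti-pattern Gelman warns against -- a single binary threshold:
# df["old"] = (df["age"] >= 65).astype(int)   # don't do this
```

The point of the commented-out last line is that a binary split throws away most of the information in the variable; four (or even three) categories keep the nonlinearity visible without pretending the cutoff is meaningful.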

Wednesday, March 24, 2010

Fighting words from Andrew Gelman

Or at least a fighting summary of someone else's...

[I've got a meeting coming up so this will have to be quick and ugly and leave lots of plot threads dangling for the sequel]

From Andrew's reaction to Triumph of the Thriller by Patrick Anderson:

Anderson doesn't really offer any systematic thoughts on all this, beyond suggesting that a higher quality of talent goes into thriller writing than before. He writes that, 50 or 70 years ago, if you were an ambitious young writer, you might want to write like Hemingway or Fitzgerald or Salinger (if you sought literary greatness with the possibility of bestsellerdom too) or like James Michener, or Herman Wouk (if you sought fame and fortune with the possibility of some depth as well) or like Harold Robbins or Irving Wallace (if you wanted to make a business out of your writing). But the topselling authors of mysteries were really another world entirely--even though their books were ubiquitous in drugstore and bus-station bookracks, and even occasionally made their way onto the bestseller lists, they barely overlapped with serious fiction, or with bestselling commercial fiction.

Nowadays, though, a young writer seeking fame and fortune (or, at least, a level of financial security allowing him to write and publish what he wants) might be drawn to the thriller, Anderson argues, for its literary as well as commercial potential. At the very least, why aim to be a modern-day Robbins or Michener if instead you can follow in the footsteps of Scott Turow. And not just as a crime novelist, but as a writer of series: "Today, a young novelist with my [Anderson's] journalistic knack for action and dialogue would be drawn to a crime series; if not, his publisher would push him in that direction."

1. I'd argue (and I think most literary historians would back me up) that in terms of literary quality, crime fiction was at its best from about the time Hammett started writing for Black Mask to either the Fifties or Sixties, a period that featured: Chandler; Ross and John D. MacDonald; Jim Thompson; Ed McBain; Donald Westlake; Joe Gores; Lawrence Block* and a slew of worthies currently being reprinted by Hard Case.

2. Crime writing was fairly respected at the time. Check out contemporary reviews (particularly by Dorothy Parker). It was even possible for Marquand to win a Pulitzer for a "serious" novel while writing the Mr. Moto books.

3. There is an economic explanation for both the drop in quality and the surge in sales, but that will have to wait. I have a meeting at one of the studios and I need to go buy a pair of sunglasses.


*Those last three did their best work more recently but they were a product of the pulps.

p.s. Here's an illustrative passage from the NYT on the literary respect a mystery writer might achieve back before thrillers were the dominant genre:

Ross Macdonald's appeal and importance extended beyond the mystery field. He was seen as an important California author, a novelist who evoked his region as tellingly as such mainstream writers as Nathanael West and Joan Didion. Before he died, Macdonald was given the Los Angeles Times's Robert Kirsch Award for a distinguished body of work about the West. Some critics ranked him among the best American novelists of his generation.

By any standard he was remarkable. His first books, patterned on Hammett and Chandler, were at once vivid chronicles of a postwar California and elaborate retellings of Greek and other classic myths. Gradually he swapped the hard-boiled trappings for more subjective themes: personal identity, the family secret, the family scapegoat, the childhood trauma; how men and women need and battle each other, how the buried past rises like a skeleton to confront the present. He brought the tragic drama of Freud and the psychology of Sophocles to detective stories, and his prose flashed with poetic imagery. By the time of his commercial breakthrough, some of Macdonald's concerns (the breakdown between generations, the fragility of moral and global ecologies) held special resonance for a country divided by an unpopular war and alarmed for the environment. His vision was strong enough to spill into real life, where a news story or a friend's revelation could prompt the comment "Just like a Ross Macdonald novel."

It was a vision with meaning for all sorts of readers. Macdonald got fan mail from soldiers, professors, teenagers, movie directors, ministers, housewives, poets. He was claimed as a colleague by good writers around the world, including Eudora Welty, Andrey Voznesensky, Elizabeth Bowen, Thomas Berger, Marshall McLuhan, Margaret Laurence, Osvaldo Soriano, Hugh Kenner, Nelson Algren, Donald Davie, and Reynolds Price.

Assumptions

We always talk about how hard it is to verify the assumptions required for missing data techniques to yield unbiased answers. Still, it really is a breath of fresh air when somebody tries to give some (data-driven) guidance on whether or not an assumption really is reasonable. That was the case with a recent PDS article:

Marston L, Carpenter JR, Walters KR, Morris RW, Nazareth I, Petersen I. Issues in multiple imputation of missing data for large general practice clinical databases. Pharmacoepidemiol Drug Saf 2010 (currently an epub)

They nicely make the case that blood pressure data are likely to be missing at random in these databases. Given my sense that BP data are underused, this is actually a pretty major advance, as it allows more confidence in inferences from these large clinical databases.

Good show, folks!

Tuesday, March 23, 2010

More questions about the statistics of Freakonomics

Felix Salmon is on the case:

There’s a nice empirical post-script to the debate over the economic effects of classifying the Spotted Owl as an endangered species. Freakonomics cites a study putting the effect at $46 billion, but others, including John Berry, who wrote a story on the subject for the Washington Post, think it’s much closer to zero.

And now it seems the Berry side of the argument has some good Freakonomics-style panel OLS regression analysis of the microeconomy of the Pacific Northwest to back up its side of the argument. A new paper by Annabel Kirschner finds that unemployment in the region didn’t go up when the timber industry improved, and it didn’t go down when the timber industry declined — not after you adjust for much more obvious things like the presence of minorities in the area.

Comparing Apples and furniture in a box



In a previous post on branding, I used Apple as an example of a company that, because of its brand, can charge a substantial premium for its high-quality products. In this New Yorker post, James Surowiecki compares Apple to companies that take the opposite approach.

For Apple, which has enjoyed enormous success in recent years, “build it and they will pay” is business as usual. But it’s not a universal business truth. On the contrary, companies like Ikea, H. & M., and the makers of the Flip video camera are flourishing not by selling products or services that are “far better” than anyone else’s but by selling things that aren’t bad and cost a lot less. These products are much better than the cheap stuff you used to buy at Woolworth, and they tend to be appealingly styled, but, unlike Apple, the companies aren’t trying to build the best mousetrap out there. Instead, they’re engaged in what Wired recently christened the “good-enough revolution.” For them, the key to success isn’t excellence. It’s well-priced adequacy.

These two strategies may look completely different, but they have one crucial thing in common: they don’t target the amorphous blob of consumers who make up the middle of the market. Paradoxically, ignoring these people has turned out to be a great way of getting lots of customers, because, in many businesses, high- and low-end producers are taking more and more of the market. In fashion, both H. & M. and Hermès have prospered during the recession. In the auto industry, luxury-car sales, though initially hurt by the downturn, are reemerging as one of the most profitable segments of the market, even as small cars like the Ford Focus are luring consumers into showrooms. And, in the computer business, the Taiwanese company Acer has become a dominant player by making cheap, reasonably good laptops—the reverse of Apple’s premium-price approach.

Monday, March 22, 2010

True models?

The p-value discussion started by an article authored by Tom Siegfried has generated a lot of responses. Andrew Gelman has tried to round up many of the discussion points.

But the best part of the post (besides showing the diversity out there) was hidden at the bottom. Andrew comments:

"In all the settings I've ever worked on, the probability that the model is true is . . . zero!"

Well, he is most certainly correct in pharmacoepidemiology as well. I see a lot of variation in how people handle the biases that are inherent in observational pharmacoepidemiology -- but the focus on randomized drug trials should be a major clue that these associations are tricky to model. As a point of fact, the issue of confounding by indication, channeling bias, indication bias or whatever else you want to call it is central to the field. And the underlying idea here is that we can't get enough information about participants to model the influence of drugs being channeled to sicker patients.

So I wish that, in my field as well, people would realize that the relationships are tricky and no model is ever going to be absolutely correctly specified.

The curse of large numbers and the real problem with p-values

(Some final thoughts on statistical significance)

The real problem with p-values isn't just that people want them to do something they can't do; they want them to do something that no single number can ever do: fully describe the quality and reliability of an experiment or study. This simply isn't one of those mathematical beasts that can be reduced to a scalar. If you try, then sooner or later you will inevitably run into a situation where you get the same metric for two tests of widely different quality.

Which leads me to the curse of large numbers. Those of you who are familiar with statistics (i.e. pretty much everybody who reads this blog) might want to skip the next paragraph because this goes all the way back to Stat 101.

Let's take the simplest case we can. You want to show that the mean of some group is positive, so you take a random sample and calculate the probability of getting the results you saw or something more extreme (the probability of getting exactly the results you saw is pretty much zero), working under the assumption that the mean of the group was actually zero. This works because the bigger the samples you take, the more the means of those samples will tend to follow a nice smooth bell curve, and the closer those means will tend to group around the mean of the group you're sampling from.

(For any teachers out there, a good way of introducing the central limit theorem is to have students simulate coin flips with Excel then make histograms based on various sample sizes.)
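The same classroom exercise is easy to run outside Excel. A minimal Python sketch (fair coin, arbitrary sample sizes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_means(n_flips, n_samples=10_000):
    """Flip a fair coin n_flips times, record the proportion of heads,
    and repeat n_samples times; return the array of sample means."""
    flips = rng.integers(0, 2, size=(n_samples, n_flips))
    return flips.mean(axis=1)

# As n grows, the histogram of means tightens around 0.5
# and looks more and more like a bell curve.
for n in (5, 30, 200):
    means = sample_means(n)
    print(f"n={n:4d}  mean={means.mean():.3f}  sd={means.std():.3f}")
```

The printed standard deviations fall off roughly as 0.5 divided by the square root of n, which is the central limit theorem doing its work.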

You might think of sampling error as the average difference between the mean of the group you're interested in and the mean of the samples you take from it (that's not exactly what it means, but it's close). The bigger the sample, the smaller you expect that error to be, which makes sense. If you picked three people at random, you might get three tall people or three millionaires, but if you pick twenty people at random, the chances of getting twenty tall people or twenty millionaires are next to nothing.
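The back-of-the-envelope version of that last point: if "tall" means above the population median, the chance that every member of a random sample is tall is 0.5 raised to the sample size, which collapses fast:

```python
# Probability that an entire random sample falls above the
# population median: 0.5 ** n (independence assumed).
for n in (3, 20):
    print(f"P(all {n} above median) = {0.5 ** n:.7f}")
```

Three-for-three happens one time in eight; twenty-for-twenty is roughly one in a million.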

The trouble is that sampling error is only one of the things a statistician has to worry about. The sampled population might not reflect the population you want to draw inferences about. Your sample might not be random. Data may not be accurately entered. There may be problems with aliasing and confounding. Independence assumptions may be violated. With respect to sample size, the biases associated with these problems are all fixed quantities. A big sample does absolutely nothing to address them.

There's an old joke about a statistician who wakes up to find his room on fire, says to himself "I need more observations" and goes back to sleep. We do spend a lot of our time pushing for more data (and, some would say, whining about not having enough), but we do that not because small sample sizes are the root of all of our problems but because they are the easiest problem to fix.

Of course "fix" as used here is an asymptotic concept and the asymptote is not zero. Even an infinite sample wouldn't result in a perfect study; you would still be left with all of the flaws and biases that are an inevitable part of all research no matter how well thought out and executed it may be.

This is a particular concern for the corporate statistician who often encounters the combination of large samples and low quality data. It's not unusual to see analyses done on tens or even hundreds of thousands of sales or customer records and more often than not, when the results are presented someone will point to the nano-scale p-value as an indication of the quality and reliability of the findings.
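Here's a simulated illustration of that point (the bias of 0.05 is an arbitrary stand-in for any fixed flaw, such as a non-random sample or a data-entry problem). The true effect is zero, yet the p-value shrinks without limit as the sample grows:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# True effect is zero, but measurement adds a fixed bias of 0.05.
# More data makes the spurious "effect" more significant, not less.
bias = 0.05
for n in (100, 10_000, 1_000_000):
    x = rng.normal(loc=bias, scale=1.0, size=n)
    z = x.mean() / (x.std(ddof=1) / math.sqrt(n))   # one-sample z statistic
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided p-value
    print(f"n={n:>9,}  z={z:6.2f}  p={p:.2g}")
```

The nano-scale p-value at a million records says nothing about the bias; it only says the sampling error has been driven down far below it.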

As far as I know, no one reviewing for a serious journal would think that p<0.001 means that we're 99.9% sure that a conclusion is true, but that's what almost everyone without an analytic background thinks.

And that is a problem.

Another chapter in the New Republic Debate

Check it out here.

Sunday, March 21, 2010

Silence

I'm in the process of moving from one corner of the United States to the other. Blogging may be extremely light for the next 2 weeks.

Apologies in advance!