Brian Kalt, a law professor and former college classmate of mine, has developed his own law of presidential facial hair:
I thought you might be interested in the following ironclad law of American presidential politics. I call it Kalt’s Law: “Under the modern two-party system, if a candidate has facial hair, the Republican always has as much, or more, than the Democrat.”
Comments, observations and thoughts from two bloggers on applied statistics, higher education and epidemiology. Joseph is an associate professor. Mark is a professional statistician and former math teacher.
Friday, March 26, 2010
Another reminder that improbable events are probable
Excellent primer on the economics of genre fiction.
I'll try to tie this in with the thriller thread (see here and here) in an upcoming post.
Thursday, March 25, 2010
Advice from Andrew Gelman
I don't know if I entirely buy point 2. I'm generally a frequentist and I make extensive use of transformations (though none of them are linear transformations).

From Gelman's post:

They also recommend composite end points (see page 418 of the above-linked article), which is a point that Jennifer and I emphasize in chapter 4 of our book and which comes up all the time, over and over in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you--that is, good advice that's easy to apply and which you might not already know--it would be to use transformations. Log, square-root, etc.--yes, all that, but more! I'm talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those "total scores" that we all love). And not doing dumb transformations such as the use of a threshold to break up a perfectly useful continuous variable into something binary. I don't care if the threshold is "clinically relevant" or whatever--just don't do it. If you gotta discretize, for Christ's sake break the variable into 3 categories.
This all seems quite obvious but people don't know about it. What gives? I have a theory, which goes like this. People are trained to run regressions "out of the box," not touching their data at all. Why? For two reasons:

1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can't cheat.
2. In classical (non-Bayesian) statistics, linear transformations on the predictors have no effect on inferences for linear regression or generalized linear models. When you're learning applied statistics from a classical perspective, transformations tend to get downplayed, and they are considered as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model).

Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.
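To make the two transformations Gelman describes concrete, here's a minimal sketch in Python. This is my own illustration, not code from his post, and all of the column names are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "item1": rng.integers(0, 2, size=1000),   # three hypothetical yes/no survey items
    "item2": rng.integers(0, 2, size=1000),
    "item3": rng.integers(0, 2, size=1000),
})

# Continuous -> several discrete categories (so a regression can pick up
# nonlinear patterns such as voting by age).  Note: more than two bins.
df["age_group"] = pd.cut(df["age"], bins=[17, 29, 44, 64, 90],
                         labels=["18-29", "30-44", "45-64", "65+"])

# Several discrete variables -> one continuous "total score".
df["total_score"] = df[["item1", "item2", "item3"]].sum(axis=1)

# The transformation the quote warns against: a single "clinically relevant"
# threshold that throws away most of the information in a continuous variable.
# df["age_over_65"] = (df["age"] >= 65).astype(int)   # avoid this
```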
Wednesday, March 24, 2010
Fighting words from Andrew Gelman
[I've got a meeting coming up so this will have to be quick and ugly and leave lots of plot threads dangling for the sequel]
From Andrew's reaction to Triumph of the Thriller by Patrick Anderson:
Anderson doesn't really offer any systematic thoughts on all this, beyond suggesting that a higher quality of talent goes into thriller writing than before. He writes that, 50 or 70 years ago, if you were an ambitious young writer, you might want to write like Hemingway or Fitzgerald or Salinger (if you sought literary greatness with the possibility of bestsellerdom too) or like James Michener or Herman Wouk (if you sought fame and fortune with the possibility of some depth as well) or like Harold Robbins or Irving Wallace (if you wanted to make a business out of your writing). But the top-selling authors of mysteries were really another world entirely--even though their books were ubiquitous in drugstore and bus-station bookracks, and even occasionally made their way onto the bestseller lists, they barely overlapped with serious fiction, or with bestselling commercial fiction.
Nowadays, though, a young writer seeking fame and fortune (or, at least, a level of financial security allowing him to write and publish what he wants) might be drawn to the thriller, Anderson argues, for its literary as well as commercial potential. At the very least, why aim to be a modern-day Robbins or Michener if instead you can follow in the footsteps of Scott Turow? And not just as a crime novelist, but as a writer of series: "Today, a young novelist with my [Anderson's] journalistic knack for action and dialogue would be drawn to a crime series; if not, his publisher would push him in that direction."
1. I'd argue (and I think most literary historians would back me up) that in terms of literary quality, crime fiction was at its best from about the time Hammett started writing for Black Mask to either the Fifties or Sixties, a period that featured: Chandler; Ross Macdonald and John D. MacDonald; Jim Thompson; Ed McBain; Donald Westlake; Joe Gores; Lawrence Block* and a slew of worthies currently being reprinted by Hard Case.
2. Crime writing was fairly respected at the time. Check out contemporary reviews (particularly by Dorothy Parker). It was even possible for Marquand to win a Pulitzer for a "serious" novel while writing the Mr. Moto books.
3. There is an economic explanation for both the drop in quality and the surge in sales, but that will have to wait. I have a meeting at one of the studios and I need to go buy a pair of sunglasses.
*Those last three did their best work more recently but they were a product of the pulps.
p.s. Here's an illustrative passage from the NYT on the literary respect a mystery writer might achieve back before thrillers were the dominant genre:
Ross Macdonald's appeal and importance extended beyond the mystery field. He was seen as an important California author, a novelist who evoked his region as tellingly as such mainstream writers as Nathanael West and Joan Didion. Before he died, Macdonald was given the Los Angeles Times's Robert Kirsch Award for a distinguished body of work about the West. Some critics ranked him among the best American novelists of his generation.
By any standard he was remarkable. His first books, patterned on Hammett and Chandler, were at once vivid chronicles of a postwar California and elaborate retellings of Greek and other classic myths. Gradually he swapped the hard-boiled trappings for more subjective themes: personal identity, the family secret, the family scapegoat, the childhood trauma; how men and women need and battle each other, how the buried past rises like a skeleton to confront the present. He brought the psychology of Freud and the tragic drama of Sophocles to detective stories, and his prose flashed with poetic imagery. By the time of his commercial breakthrough, some of Macdonald's concerns (the breakdown between generations, the fragility of moral and global ecologies) held special resonance for a country divided by an unpopular war and alarmed for the environment. His vision was strong enough to spill into real life, where a news story or a friend's revelation could prompt the comment "Just like a Ross Macdonald novel."
It was a vision with meaning for all sorts of readers. Macdonald got fan mail from soldiers, professors, teenagers, movie directors, ministers, housewives, poets. He was claimed as a colleague by good writers around the world, including Eudora Welty, Andrey Voznesensky, Elizabeth Bowen, Thomas Berger, Marshall McLuhan, Margaret Laurence, Osvaldo Soriano, Hugh Kenner, Nelson Algren, Donald Davie, and Reynolds Price.
Assumptions
Marston L, Carpenter JR, Walters KR, Morris RW, Nazareth I, Petersen I. Issues in multiple imputation of missing data for large general practice clinical databases. Pharmacoepidemiol Drug Saf 2010 (currently an epub)
They nicely make the case that blood pressure data is likely to be missing at random in these databases. Given my thoughts that BP data is underused, this is actually a pretty major advance as it allows more confidence in inferences from these large clinical databases.
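For readers who want to see the mechanics, here's a rough sketch of the multiple-imputation idea in Python. The simulated columns and the use of scikit-learn's IterativeImputer are my own stand-ins for illustration, not the data or the specific procedure from the Marston et al. paper:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
n = 500
age = rng.normal(60, 12, n)
bmi = rng.normal(27, 4, n)
sbp = 90 + 0.5 * age + 1.2 * bmi + rng.normal(0, 10, n)   # systolic BP

df = pd.DataFrame({"age": age, "bmi": bmi, "sbp": sbp})
# Make SBP "missing at random": younger patients are less likely to have it recorded,
# so the missingness depends on an observed variable, not on the BP value itself.
miss = rng.random(n) < 1 / (1 + np.exp((age - 50) / 5))
df.loc[miss, "sbp"] = np.nan

# Draw several completed datasets; in practice the analysis would be run on
# each one and the estimates combined with Rubin's rules.
imputations = []
for m in range(5):
    imp = IterativeImputer(sample_posterior=True, random_state=m)
    completed = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
    imputations.append(completed)
```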
Good show, folks!
Tuesday, March 23, 2010
More questions about the statistics of Freakonomics
There’s a nice empirical post-script to the debate over the economic effects of classifying the Spotted Owl as an endangered species. Freakonomics cites a study putting the effect at $46 billion, but others, including John Berry, who wrote a story on the subject for the Washington Post, think it’s much closer to zero.
And now it seems the Berry side of the argument has some good Freakonomics-style panel OLS regression analysis of the microeconomy of the Pacific Northwest to back it up. A new paper by Annabel Kirschner finds that unemployment in the region didn't go up when the timber industry declined, and it didn't go down when the timber industry improved — not after you adjust for much more obvious things like the presence of minorities in the area.
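For the curious, here's a rough sketch of what a "Freakonomics-style" panel regression with fixed effects looks like. The counties, years, and variable names are all invented, and this is not Kirschner's actual specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
counties = [f"c{i}" for i in range(30)]
years = list(range(1990, 2001))
df = pd.DataFrame([(c, y) for c in counties for y in years],
                  columns=["county", "year"])
df["timber_dependence"] = rng.uniform(0, 0.4, len(df))
df["pct_minority"] = rng.uniform(0, 0.5, len(df))
# In this toy world, unemployment tracks demographics, not timber dependence.
df["unemployment"] = 5 + 8 * df["pct_minority"] + rng.normal(0, 1, len(df))

# Two-way fixed effects: does timber dependence still predict unemployment
# once county, year, and demographics are controlled for?
model = smf.ols("unemployment ~ timber_dependence + pct_minority "
                "+ C(county) + C(year)", data=df).fit()
print(model.params["timber_dependence"], model.pvalues["timber_dependence"])
```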
Comparing Apples and furniture in a box
In a previous post on branding, I used Apple as an example of a company that, because of its brand, can charge a substantial premium for its high-quality products. In this New Yorker post, James Surowiecki compares Apple to companies that take the opposite approach.
For Apple, which has enjoyed enormous success in recent years, “build it and they will pay” is business as usual. But it’s not a universal business truth. On the contrary, companies like Ikea, H. & M., and the makers of the Flip video camera are flourishing not by selling products or services that are “far better” than anyone else’s but by selling things that aren’t bad and cost a lot less. These products are much better than the cheap stuff you used to buy at Woolworth, and they tend to be appealingly styled, but, unlike Apple, the companies aren’t trying to build the best mousetrap out there. Instead, they’re engaged in what Wired recently christened the “good-enough revolution.” For them, the key to success isn’t excellence. It’s well-priced adequacy.
These two strategies may look completely different, but they have one crucial thing in common: they don’t target the amorphous blob of consumers who make up the middle of the market. Paradoxically, ignoring these people has turned out to be a great way of getting lots of customers, because, in many businesses, high- and low-end producers are taking more and more of the market. In fashion, both H. & M. and Hermès have prospered during the recession. In the auto industry, luxury-car sales, though initially hurt by the downturn, are reemerging as one of the most profitable segments of the market, even as small cars like the Ford Focus are luring consumers into showrooms. And, in the computer business, the Taiwanese company Acer has become a dominant player by making cheap, reasonably good laptops—the reverse of Apple’s premium-price approach.
Monday, March 22, 2010
True models?
But the best part of the post (besides showing the diversity out there) was hidden at the bottom. Andrew comments:
"In all the settings I've ever worked on, the probability that the model is true is . . . zero!"
Well, he is most certainly correct in pharmacoepidemiology as well. I see a lot of variation in how people handle the biases that are inherent in observational pharmacoepidemiology -- but the focus on randomized drug trials should be a major clue that these associations are tricky to model. As a point of fact, the issue of confounding by indication, channeling bias, indication bias, or whatever else you want to call it is central to the field. And the underlying idea here is that we can't get enough information about participants to model the influence of drugs being channeled to sicker patients.
So I wish that, in my field as well, people would realize that the relationships are tricky and no model is ever going to be absolutely correctly specified.
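A toy simulation makes the channeling problem easy to see. This is my own illustration, not anything from Andrew's post: the drug below has zero true effect, but because sicker patients are more likely to receive it, a naive comparison makes it look harmful.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
severity = rng.normal(0, 1, n)                  # unmeasured illness severity
p_treated = 1 / (1 + np.exp(-2 * severity))     # sicker -> more likely to get the drug
treated = rng.random(n) < p_treated
outcome = severity + rng.normal(0, 1, n)        # the drug itself does nothing

naive_effect = outcome[treated].mean() - outcome[~treated].mean()
print(f"apparent 'drug effect': {naive_effect:.2f}  (true effect is 0)")
```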
The curse of large numbers and the real problem with p-values
The real problem with p-values isn't just that people want them to do something they can't do; they want them to do something that no single number can ever do: fully describe the quality and reliability of an experiment or study. This simply isn't one of those mathematical beasts that can be reduced to a scalar. If you try, then sooner or later you will inevitably run into a situation where you get the same metric for two tests of widely different quality.
Which leads me to the curse of large numbers. Those of you who are familiar with statistics (i.e. pretty much everybody who reads this blog) might want to skip the next paragraph because this goes all the way back to stat 101.
Let's take the simplest case we can. You want to show that the mean of some group is positive, so you take a random sample and calculate the probability of getting the results you saw or something more extreme (the probability of getting exactly the results you saw is pretty much zero), working under the assumption that the mean of the group was actually zero. This works because the bigger the samples you take, the more the means of those samples will tend to follow a nice smooth bell curve and the closer those means will tend to group around the mean of the group you're sampling from.
(For any teachers out there, a good way of introducing the central limit theorem is to have students simulate coin flips with Excel then make histograms based on various sample sizes.)
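Here's a quick Python version of that exercise (my own sketch; Excel works just as well): histograms of the sample means tighten up and look more and more normal as the sample size grows.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, [5, 30, 200]):
    # 10,000 samples of n coin flips each; plot the distribution of sample means.
    means = rng.binomial(1, 0.5, size=(10_000, n)).mean(axis=1)
    ax.hist(means, bins=30)
    ax.set_title(f"sample size {n}")
plt.show()
```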
You might think of sampling error as the average difference between the mean of the group you're interested in and the mean of the samples you take from it (that's not exactly what it means but it's close). The bigger the sample, the smaller you expect that error to be, which makes sense. If you picked three people at random, you might get three tall people or three millionaires, but if you pick twenty people at random, the chances of getting twenty tall people or twenty millionaires are next to nothing.
The trouble is that sampling error is only one of the things a statistician has to worry about. The sampled population might not reflect the population you want to draw inferences about. Your sample might not be random. Data may not be accurately entered. There may be problems with aliasing and confounding. Independence assumptions may be violated. With respect to sample size, the biases associated with these problems are all fixed quantities. A big sample does absolutely nothing to address them.
There's an old joke about a statistician who wakes up to find his room on fire, says to himself "I need more observations" and goes back to sleep. We do spend a lot of our time pushing for more data (and, some would say, whining about not having enough), but we do that not because small sample sizes are the root of all of our problems but because they are the easiest problem to fix.
Of course "fix" as used here is an asymptotic concept and the asymptote is not zero. Even an infinite sample wouldn't result in a perfect study; you would still be left with all of the flaws and biases that are an inevitable part of all research no matter how well thought out and executed it may be.
This is a particular concern for the corporate statistician, who often encounters the combination of large samples and low-quality data. It's not unusual to see analyses done on tens or even hundreds of thousands of sales or customer records, and more often than not, when the results are presented, someone will point to the nano-scale p-value as an indication of the quality and reliability of the findings.
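Here's a toy illustration of the point (mine, with made-up numbers): add a small fixed bias to the data-generating process, and a large enough sample will hand you a nano-scale p-value for an effect that isn't really there.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
bias = 0.05                               # e.g. a data-entry or selection artifact
for n in [100, 10_000, 1_000_000]:
    sample = rng.normal(0, 1, n) + bias   # the true mean of the process is 0
    t, p = stats.ttest_1samp(sample, popmean=0)
    print(f"n={n:>9,}  p={p:.2e}")        # the p-value shrinks; the bias doesn't
```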
As far as I know, no one reviewing for a serious journal would think that p<0.001 means that we're 99.9% sure that a conclusion is true, but that's what almost everyone without an analytic background thinks.
And that is a problem.
Sunday, March 21, 2010
Silence
Apologies in advance!
Interesting variable taxation idea from Thoma
Political battles make it very difficult to use discretionary fiscal policy to fight a recession, so more automatic stabilizers are needed. Along those lines, if something like this were to be implemented to stabilize the economy over the business cycle, I'd prefer to do this more generally, i.e. allow income taxes, payroll taxes, etc. to vary procyclically. That is, these taxes would be lower in bad times and higher when things improve, and implemented through an automatic moving average type of rule that produces the same revenue as some target constant tax rate (e.g. existing rates).
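Here's a back-of-the-envelope sketch of how such a rule might look. This is my own reading of the idea, not Thoma's actual proposal; the target rate, the moving-average window, and the rescaling step are all assumptions:

```python
import numpy as np

def procyclical_rates(income, target_rate=0.20, window=8):
    """Tax rate moves with income relative to its trailing moving average,
    rescaled so total revenue over the sample matches a flat target_rate."""
    income = np.asarray(income, dtype=float)
    ma = np.array([income[max(0, t - window + 1):t + 1].mean()
                   for t in range(len(income))])
    raw = target_rate * income / ma                      # lower rate when income is below trend
    scale = target_rate * income.sum() / (raw * income).sum()
    return raw * scale                                   # revenue-neutral over the sample

income = 100 + np.cumsum(np.random.default_rng(6).normal(0, 2, 40))
print(procyclical_rates(income).round(3))
```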
Saturday, March 20, 2010
Friday, March 19, 2010
Too late for an actual post, but...
Thursday, March 18, 2010
Some more thoughts on p-value
Obviously, we can't let everyone set their own rules, but (to coin a phrase) I wonder if, in an effort to make things as simple as possible, we haven't actually made them simpler than possible. Statistical significance is an arbitrary, context-sensitive cut-off that we assign before a test based on the relative costs of a false positive and a false negative. It is not a God-given value of 5%.
Letting everyone pick their own definition of significance is a bad idea, but so is completely ignoring context. Does it make any sense to demand the same p-value threshold from a study of a rare, slow-growing cancer (where five years is quick and a sample size of 20 is an achievement) and from a drug to reduce BP in the moderately obese (where a course of treatment lasts two weeks and the streets are filled with potential test subjects)? Should we ignore a promising preliminary study because it comes in at 0.06?
For a real-life example, consider the public reaction to the recent statement that we didn't have statistically significant data that the earth had warmed over the past 15 years. This was a small sample and I'm under the impression that the results would have been significant at the 0.1 level, but these points were lost (or discarded) in most of the coverage.
We need to do a better job dealing with these grays. We might try replacing the phrase "statistically significant" with "statistically significant at 10/5/1/0.1%." Or we might look at some sort of a two-tiered system, raising significance to 0.01 for most studies while making room for "provisionally significant" papers where research is badly needed, adequate samples are not available, or the costs of a type-II error are deemed unusually high.
I'm not sure how practical or effective these steps might be but I am sure we can do better. Statisticians know how to deal with gray areas; now we need to work on how we explain them.
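As a trivial example of the tiered reporting suggested above (my own sketch, not a proposed standard), a result could be labeled with the strongest conventional level it clears rather than a bare significant/not significant:

```python
def significance_tier(p):
    """Return the strongest conventional significance level a p-value clears."""
    for alpha, label in [(0.001, "significant at 0.1%"),
                         (0.01,  "significant at 1%"),
                         (0.05,  "significant at 5%"),
                         (0.10,  "significant at 10%")]:
        if p < alpha:
            return label
    return "not significant at the 10% level"

print(significance_tier(0.06))   # the "promising preliminary study" case above
```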
For more on the subject, check out Joseph's posts here and here.