West Coast Stat Views (on Observational Epidemiology and more)

Sunday, April 10, 2011

Hamsters, fitness landscapes and an excuse for a repost

All Things Considered had an interesting little story today about the origin of the hamster. I was particularly intrigued by this part:

More troubles followed in the lab. There was more hamster cannibalism, and five others escaped from their cage — never to be found. Finally, two of the remaining three hamsters started to breed, an event hailed as a miracle by their frustrated caretakers.

Those Adam-and-Eve hamsters produced 150 offspring, Dunn says, and they started to travel abroad, sent between labs or via the occasional coat pocket. Today, the hamsters you see in pet stores are most likely descendants of Aharoni's litter.

Because these hamsters are so inbred, they typically have heart disease similar to what humans suffer. Dunn says that makes them ideal research models.

This reminded me of a post from almost a year ago on the subject of lab animals. It also reminded me that I still haven't gotten around to the follow-up I had in mind. Maybe all this reminding will translate into some motivating and I'll actually get the next post on the subject written.

In this post I discussed gradient searches and the two great curses of the gradient searcher, small local optima and long, circuitous paths. I also mentioned that by making small changes to the landscape being searched (in other words, perturbing it) we could sometimes (with luck) improve our search metrics without significantly changing the size and location of our optima.

The idea that you can use a search on one landscape to find the optima of a similar landscape is the assumption behind more than just perturbing. It is also the basis of all animal testing of treatments for humans. This brings genotype into the landscape discussion, but not in the way it's normally used.

In evolutionary terms, we look at an animal's genotype as a set of coordinates for a vast genetic landscape where 'height' (the fitness function) represents that animal's fitness. Every species is found on that landscape, each clustering around its own local maximum.

Genotype figures in our research landscape, but instead of being the landscape itself, it becomes part of the fitness function. Here's an overly simplified example that might clear things up:

Consider a combination of two drugs. If we use the dosage of each drug as an axis, this gives us something that looks a lot like our first example, with drug A being north/south, drug B being east/west and the effect we're measuring being height. In other words, our fitness function has a domain of all points on our AB plane and a range corresponding to the effectiveness of that dosage. Since we expect genetics to affect the subject's reaction [corrected a small typo here] to the drugs, genotype has to be part of that fitness function. If we ran the test on lab rats, we would expect a different result than if we tested it on humans, but we would hope that the landscapes would be similar (or else there would be no point in using lab rats).
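To make the two-drug picture concrete, here is a minimal sketch. Everything in it is made up for illustration: the bell-shaped response surface, the peak locations and heights, and the names `SPECIES`, `effect` and `best_dose` are assumptions, not real dose-response data. The point is only that species enters as a parameter of the fitness function rather than as an axis of the landscape.

```python
import math

# Hypothetical response surfaces: each species gets its own peak location
# and height on the drug A / drug B plane. These numbers are invented.
SPECIES = {
    "rat": {"peak_a": 4.0, "peak_b": 6.0, "height": 0.8},
    "human": {"peak_a": 5.0, "peak_b": 6.0, "height": 1.0},
}


def effect(dose_a, dose_b, species):
    """Effectiveness ('height') of a dose pair for a given species."""
    p = SPECIES[species]
    dist2 = (dose_a - p["peak_a"]) ** 2 + (dose_b - p["peak_b"]) ** 2
    return p["height"] * math.exp(-dist2 / 8.0)


def best_dose(species, max_dose=10):
    """Exhaustive search of the integer dose grid for the highest point."""
    grid = [(a, b) for a in range(max_dose + 1) for b in range(max_dose + 1)]
    return max(grid, key=lambda ab: effect(ab[0], ab[1], species))


rat_best = best_dose("rat")
human_best = best_dose("human")

# How good is the rat optimum when applied to the (hypothetical) human surface?
transfer = effect(rat_best[0], rat_best[1], "human")
print(rat_best, human_best, round(transfer, 3))
```

In this toy setup the two peaks sit close together, so the dose found on the rat surface lands near the top of the human one. That closeness is exactly what we hope for when we pick a test species, and nothing in the code guarantees it.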

Scientists who use animal testing are acutely aware of the problems of going from one landscape to another. For each system studied, they have spent a great deal of time and effort looking for the test species that functions most like humans. The idea is that if you could find an animal with, say, a liver that functions almost exactly like a human liver, you could do most of your controlled studies of liver disease on that animal and only use humans for the final stages.

As sound and appealing as that idea is, there is another way of looking at this.

On a sufficiently high level with some important caveats, all research can be looked at as a set of gradient searches over a vast multidimensional landscape. With each study, researchers pick a point on the landscape, gather data in the region then use their findings [another small edit] and those of other researchers to pick their next point.

In this context, important similarities between landscapes fall into two distinct categories: those involving the positions and magnitudes of the optima; and those involving the search properties of the landscape. Every point on the landscape corresponds to four search values: a max; the number of steps it will take to reach that max; a min; and the number of steps it will take to reach that min. Since we usually want to go in one direction (let's say maximizing), we can generally reduce that to two values for each point: the optimum of interest and the time to converge.
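The two search values per point can be computed directly on a toy landscape. This is a sketch under simplifying assumptions: a one-dimensional grid, a steepest-ascent rule that only looks at the two immediate neighbors, and an invented list of heights. The function name `climb` is hypothetical.

```python
def climb(heights, start):
    """Steepest-ascent hill climb on a 1-D grid (needs at least two points).

    Returns (height of the maximum reached, number of steps taken).
    """
    i, steps = start, 0
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]
        best = max(neighbors, key=lambda j: heights[j])
        if heights[best] <= heights[i]:
            return heights[i], steps  # no higher neighbor: we are at a peak
        i, steps = best, steps + 1


# Invented landscape with a small local peak (3) and a global peak (6).
landscape = [1, 3, 2, 4, 6, 5, 2]

# One (optimum reached, steps to converge) pair for every starting point.
search_values = [climb(landscape, s) for s in range(len(landscape))]
print(search_values)
```

The first two starting points get stuck on the small local peak, which is the first of the two curses mentioned earlier. The step counts are the second.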

All of this leads us to an interesting and somewhat counterintuitive conclusion. When searching on one landscape to find the corresponding optimum of another, we are vitally interested in seeing a high degree of correlation between the size and location of the optima. Given that similarity between optima, however, similarity in search statistics is at best unimportant and at worst a serious problem.

The whole point of repeated perturbing then searching of a landscape is to produce a wide range of search statistics. Since we're only keeping the best one, the more variability the better. (Best here would generally be the one where the global optimum is associated with the largest region though time to converge can also be important.)
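Here is a rough sketch of perturb-then-search on a toy one-dimensional landscape. The heights, the uniform noise, the noise scale and all function names are assumptions made for illustration. 'Best' is taken to mean the perturbed landscape on which the most starting points climb to the global peak, and a perturbation is discarded if it moves that peak.

```python
import random


def argmax(heights):
    return max(range(len(heights)), key=lambda j: heights[j])


def climb(heights, start):
    """Steepest-ascent hill climb on a 1-D grid; returns the index where it stops."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]
        best = max(neighbors, key=lambda j: heights[j])
        if heights[best] <= heights[i]:
            return i
        i = best


def global_basin_size(heights):
    """Number of starting points whose climb ends at the global peak."""
    top = argmax(heights)
    return sum(1 for s in range(len(heights)) if climb(heights, s) == top)


def best_perturbation(heights, tries=200, scale=0.75, seed=0):
    """Perturb repeatedly and keep the version with the largest global basin."""
    rng = random.Random(seed)
    top = argmax(heights)
    best, best_size = list(heights), global_basin_size(heights)
    for _ in range(tries):
        candidate = [h + rng.uniform(-scale, scale) for h in heights]
        if argmax(candidate) != top:
            continue  # the perturbation moved the global optimum: reject it
        size = global_basin_size(candidate)
        if size > best_size:
            best, best_size = candidate, size
    return best, best_size


landscape = [1, 3, 2, 4, 6, 5, 2]
baseline = global_basin_size(landscape)
perturbed, perturbed_size = best_perturbation(landscape)
print(baseline, perturbed_size)
```

Because only the best result is kept, a wide spread of outcomes across perturbations helps rather than hurts: most tries can be useless as long as a few happen to flatten the small local peak.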

The Anti-Hulu Hypothesis

In response to Joseph's comment about the availability of the Big Bang Theory online, I thought I'd mention that, for the moment, CBS is keeping the two most recent episodes up on its site.

CBS has long had the most complicated but (I suspect) best thought-out approach to online viewing, deciding whether or not to provide shows on a case-by-case basis. Of course, 'best thought-out' here is in respect to stockholders, not viewers. The interests of those two groups generally align when you're talking about quality of programming but have a way of diverging when you're talking about revenue streams and cannibalization.

Saturday, April 9, 2011

[OT] Bioware's new Star Wars RPG

Based on a perceptive review by Trollsymth, it seems like the new Star Wars multi-player online game captures the music and atmosphere of Star Wars well. But it focuses on combat as the means of gaining experience. I think that this is a very unfortunate decision, as Star Wars has a strong history of using guile in the place of brute force.

I think that rewarding guile would make for a more interesting game and could have easily been accomplished with goal-based experience points. Focusing on killing makes sense for the Sith, but Jedi and Rebels would benefit from minimizing casualties.

Health Care and Confusion

One of the most important underlying issues preventing efficient markets for health care is explained in a Dilbert cartoon.

Friday, April 8, 2011

Some perspective

A nice quote from the comments on Worthwhile Canadian Initiative:

I'm sure we can all agree that in any organization the size of the Canadian federal gov't ($175 billion per annum in revenue!), there must be inefficiency somewhere. But the issue at hand is not "is there inefficiency?" but rather "is there $10-12 billion of annual waste?".

I think that it really puts questions into perspective.

And if you don't follow WCI, it really does have some of the best commentary around (right up there with Marginal Revolution at its best)
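For scale, the figures in the quote above work out as follows. This is just arithmetic on the numbers in the quote; the variable names are mine.

```python
revenue = 175.0  # federal revenue, billions of dollars per year (from the quote)
waste_low, waste_high = 10.0, 12.0  # claimed annual waste, billions

share_low = waste_low / revenue
share_high = waste_high / revenue
print(f"{share_low:.1%} to {share_high:.1%} of annual revenue")
```

So the question is whether roughly 6 to 7 percent of revenue is pure waste, which is a much stronger claim than saying there is inefficiency somewhere.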

It's not often epidemiologists get pandered to

A clip for everyone in our target audience except Andrew Gelman.

"Cathie Black's Departure: Totally Predictable. Plus: Meet Dennis Walcott"

Dana Goldstein has a good, pithy post on the shake-up in NYC.

Thursday, April 7, 2011

Another interesting Ryan graph from Krugman

I was listening to Talk of the Nation this evening (admittedly not NPR at its best or even second best) and I was struck by how different (and inferior) what I was hearing was to the wonk debate (Krugman, Thoma, Gelman, et al.). Journalists and pundits are arguing over whether Ryan is bold or extra bold while we should be arguing over the practicality and advisability of a recovery based on another housing boom.

Wednesday, April 6, 2011

Andrew Gelman buries the lede

As Joseph mentioned earlier, Andrew Gelman has a must-read post up at the Monkey Cage. The whole thing is worth checking out but for me the essential point came at the end:

Internal (probabilistic) vs. external (statistical) forecasts

In statistics we talk about two methods of forecasting. An internal forecast is based on a logical model that starts with assumptions and progresses forward to conclusions. To put it in the language of applied statistics: you take x, and you take assumptions about theta, and you take a model g(x,theta) and use it to forecast y. You don't need any data y at all to make this forecast! You might use past y's to fit the model and estimate the thetas and test g, but you don't have to.

In contrast, an external forecast uses past values of x and y to forecast future y. Pure statistics, no substantive knowledge. That's too bad, but the plus side is that it's grounded in data.

A famous example is the space shuttle crash in 1986. Internal models predicted a very low probability of failure (of course! otherwise they wouldn't have sent that teacher along on the mission). Simple external models said that in about 100 previous launches, 2 had failed, yielding a simple estimate of 2%.

We have argued, in the context of election forecasting, that the best approach is to combine internal and external approaches.

Based on the plausibility analysis above, the Beach et al. forecast seems to me to be purely internal. It's great that they're using real economic knowledge, but as a statistician I can see what happens when your forecast is not grounded in the data. Short-term, I suggest they calibrate their forecasts by applying them to old data to forecast the past (this is the usual approach). Long-term, I suggest they study the problems with their forecasts and use these flaws to improve their model.

When a model makes bad predictions, that's an opportunity to do better.

All too often, we treat models like the ancient Greeks might have treated the Oracle of Delphi, an ultimate and unknowable authority. If we're going to use models in our debates, we also need to talk about where they come from, what assumptions go into them, how range-of-data concerns might affect them.
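The simple external estimate in the quoted launch example is just a proportion. The sketch below reproduces it from the counts given in the quote (2 failures in about 100 launches) and, as my own addition rather than anything in Gelman's post, attaches a Wilson score interval to show how loosely two events pin down a rate. The function name `external_estimate` is hypothetical.

```python
import math


def external_estimate(failures, launches, z=1.96):
    """Failure rate from past outcomes alone, with a Wilson score interval.

    Uses no model of the hardware, only the historical record.
    """
    p = failures / launches
    denom = 1 + z**2 / launches
    center = (p + z**2 / (2 * launches)) / denom
    half = z * math.sqrt(p * (1 - p) / launches + z**2 / (4 * launches**2)) / denom
    return p, center - half, center + half


rate, low, high = external_estimate(failures=2, launches=100)
print(f"external estimate: {rate:.1%} (rough 95% interval {low:.1%} to {high:.1%})")
```

Even the low end of that interval is far from a "very low probability of failure," which is the kind of disagreement with an internal model that ought to prompt questions about its assumptions.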

Unemployment Forecasting

There was a nice discussion of the plausibility of the employment figures in the new Paul Ryan 2012 budget proposal for the United States by both Andrew Gelman and Paul Krugman. While this blog isn't really a political one, I do think that this current discussion is a good example of how to critically evaluate models in epidemiology. It is pretty rare that a model will be simply and obviously wrong. Instead, you have to look at all of the different elements of the model and see what looks dodgy. After all, the actual headline result is almost always something for which we are uncertain about the actual answer. So we have to look for clues as to what might be going wrong by looking at the other outputs of the model (and perhaps some of the modeling assumptions).

If a study shows that the use of statin class drugs prevents cancer that is a pretty interesting finding. But the finding gets less interesting if further exploration reveals that statins prevent all forms of disease except for cardiovascular disease. The latter would be a clue that something, somewhere, is going wrong.

In the case of the Paul Ryan budget, it seems like this estimate of unemployment is lower than it should plausibly be, which might obfuscate the ideal trade-off between taxes and economic growth. I am not an economist (in any way, shape or form) but am willing to conjecture that his dynamic scoring algorithm for the influence of tax cuts on unemployment might be an issue. Perhaps the algorithm should account for diminishing returns as unemployment falls (but fails to do so properly). Or maybe the model overstates the magnitude of the underlying relation (or, possibly, it might reverse it). Complex models have a lot of moving parts and there are a lot of places where bias can be introduced into them. So it’s important to be critical (of both our own work and the work of others) when we try to do this type of difficult forecasting.

The way some people talked in 1930

"The boy spoke two words, the first a short guttural verb, the second 'you.'"

A few more quick thoughts before the Maltese Falcon goes back to the library. I've always been impressed by how much John Huston was able to get past the Hays office but the novel is far more frank. I'm no authority on the literature of the era but it seems way ahead of its time.

Tuesday, April 5, 2011

Why I still drop by Krugman's blog once or twice a week

Here, in handy graph form, is what the good people at Heritage claim will happen if we adopt Ryan's budget.

These are, of course, the same people who predicted the Clinton tax hikes would trigger a devastating recession.

It's that sub-advisement you really have to worry about

Matt Yglesias is having trouble understanding John Hancock's explanation of its fee structure. I can't imagine why (via Felix Salmon):

“For internally-managed Funds advised and sub-advised exclusively by John Hancock’s affiliates, the total fees John Hancock and its affiliates receive from these Funds may be higher than those advised or sub-advised exclusively by unaffiliated mutual fund companies. These fees can come from the Fund or trust’s Rule 12b-1, sub-transfer agency, management, AMC or other fees, and may vary from Fund to Fund.”

Brad DeLong digs through the NYT archives for this memorable rebuttal of Charles Murray

From Bob Herbert:

The book shows that, on average, blacks score about 15 points lower than whites on intelligence tests, a point that was widely known and has not been in dispute. Mr. Murray and I (and many, many others) differ on the reasons for the disparity. I would argue that a group that was enslaved until little more than a century ago; that has long been subjected to the most brutal, often murderous, oppression; that has been deprived of competent, sympathetic political representation; that has most often had to live in the hideous physical conditions that are the hallmark of abject poverty; that has tried its best to survive with little or no prenatal care, and with inadequate health care and nutrition; that has been segregated and ghettoized in communities that were then redlined by banks and insurance companies and otherwise shunned by business and industry; that has been systematically frozen out of the job market; that has in large measure been deliberately deprived of a reasonably decent education; that has been forced to cope with the humiliation of being treated always as inferior, even by imbeciles -- I would argue that these are factors that just might contribute to a certain amount of social pathology and to a slippage in intelligence test scores.
Mr. Murray says no. His book strongly suggests that the disparity is inherent, genetic, and there is little to be done about it....

The last time I checked, both the Protestants and the Catholics in Northern Ireland were white. And yet the Catholics, with their legacy of discrimination, grade out about 15 points lower on I.Q. tests...

Fixing performance pay

Derek Neal, Professor of Economics at the University of Chicago, makes an interesting argument about the poor performance of performance pay for teachers:

"Many accountability and performance pay systems employ test scores from assessment systems that produce information used not only to determine rewards and punishments for educators, but also to inform the public about progress in student learning," Neal writes in the paper, "The Design of Performance Pay in Education."

These testing systems make it easy, in theory, for policymakers to obtain consistent measures of student and teacher performance over time. But Neal argues that the same testing regimes also make it easy, in practice, for educators to game incentive systems by coaching students for exams rather than teaching them to master subject matter.

"As long as education authorities keep trying to accomplish both of these tasks (measurement and incentive provisions) with one set of assessments, they will continue to fail at both tasks," he adds in the paper, which was published by the National Bureau of Economic Research and is a chapter in the upcoming Handbook of Economics of Education.

...

Separate assessment systems that involve no stakes for teachers, and thus no incentives for manipulation, should be used to produce measures of student performance over time, Neal contends. This two-system approach would discourage excessive "teaching to the test."

"The designers of assessment-based incentive schemes must take seriously the challenge of designing a series of assessments such that the best response of educators is not to coach, but to teach in ways that build true mastering," Neal said.

I'm not sure I'm in full agreement here. For one thing, our current methods for evaluating student progress are deeply flawed even when not asked to do double duty. Second, in my experience, most of the pressure to inflate scores comes from above. As long as test scores affect the fortunes of administrators, the less ethical superintendents and principals will find a way to influence teachers (even without the option of dismissal, a principal can make a teacher's life very tough).

Just to be clear, almost all of the administrators I've worked with have been dedicated and ethical, but I can think of at least one guy, two time zones and two decades from here and now, who managed to pressure a number of tenured but spineless teachers into spending weeks doing nothing but prepping for standardized tests.

What we need is a more comprehensive and better thought out system for measuring student progress.