Sunday, April 10, 2011

Hamsters, fitness landscapes and an excuse for a repost

All Things Considered had an interesting little story today about the origin of the hamster. I was particularly intrigued by this part:

More troubles followed in the lab. There was more hamster cannibalism, and five others escaped from their cage — never to be found. Finally, two of the remaining three hamsters started to breed, an event hailed as a miracle by their frustrated caretakers.

Those Adam-and-Eve hamsters produced 150 offspring, Dunn says, and they started to travel abroad, sent between labs or via the occasional coat pocket. Today, the hamsters you see in pet stores are most likely descendants of Aharoni's litter.

Because these hamsters are so inbred, they typically have heart disease similar to what humans suffer. Dunn says that makes them ideal research models.

This reminded me of a post from almost a year ago on the subject of lab animals. It also reminded me that I still haven't gotten around to the follow-up I had in mind. Maybe all this reminding will translate into some motivating and I'll actually get the next post on the subject written.

In this post I discussed gradient searches and the two great curses of the gradient searcher: small local optima and long, circuitous paths. I also mentioned that by making small changes to the landscape being searched (in other words, perturbing it) we could sometimes, with luck, improve our search metrics without significantly changing the size and location of our optima.
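
To make that concrete, here's a toy sketch in Python (the landscape, the step size and the greedy hill-climber are all invented for illustration; nothing here comes from a real study). The "rippled" landscape has one broad peak plus shallow ripples that trap a naive searcher; smoothing the ripples away is a small change to the landscape that barely moves the big optimum but dramatically improves the search:

import math

def hill_climb(f, x0, step=0.05, max_steps=500):
    """Greedy uphill walk: move to the best neighbouring point until none is higher."""
    x, n = x0, 0
    while n < max_steps:
        best = max((x - step, x, x + step), key=f)
        if best == x:          # no higher neighbour: stuck at a local optimum
            break
        x, n = best, n + 1
    return x, n

# One broad peak near x = 4, plus shallow ripples that create lots of tiny local optima.
rippled  = lambda x: 2 * math.exp(-0.5 * (x - 4) ** 2) + 0.1 * math.cos(6 * x)
# A small change to that landscape: same broad peak, ripples smoothed away.
smoothed = lambda x: 2 * math.exp(-0.5 * (x - 4) ** 2)

for name, f in (("rippled", rippled), ("smoothed", smoothed)):
    x, steps = hill_climb(f, x0=0.5)
    print(f"{name}: stopped at x = {x:.2f} (height {f(x):.2f}) after {steps} steps")

Starting at x = 0.5, the rippled landscape strands the search on a tiny local bump after a handful of steps, while the smoothed version walks all the way up to the big peak near x = 4.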

The idea that you can use a search on one landscape to find the optima of a similar landscape is the assumption behind more than just perturbing. It is also the basis of all animal testing of treatments for humans. This brings genotype into the landscape discussion, but not in the way it's normally used.

In evolutionary terms, we look at an animal's genotype as a set of coordinates for a vast genetic landscape where 'height' (the fitness function) represents that animal's fitness. Every species is found on that landscape, each clustering around its own local maximum.

Genotype figures in our research landscape, but instead of being the landscape itself, it becomes part of the fitness function. Here's an overly simplified example that might clear things up:

Consider a combination of two drugs. If we use the dosage of each drug as an axis, this gives us something that looks a lot like our first example, with drug A being north/south, drug B being east/west and the effect we're measuring being height. In other words, our fitness function has a domain of all points on our AB plane and a range corresponding to the effectiveness of that dosage. Since we expect genetics to affect the subjects' reaction to the drugs, genotype has to be part of that fitness function. If we ran the test on lab rats, we would expect a different result than if we tested it on humans, but we would hope that the landscapes would be similar (or else there would be no point in using lab rats).
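
If it helps, here's that same toy idea as code (all the numbers and the functional form are made up; "genotype" is just a parameter that says where the response peaks, so rats and humans share one formula but get different landscapes):

import math

def effectiveness(dose_a, dose_b, genotype):
    """Measured effect for a pair of doses, given a genotype.

    Here a genotype is simply the dose pair where the response peaks.
    """
    best_a, best_b = genotype
    return math.exp(-((dose_a - best_a) ** 2 + (dose_b - best_b) ** 2))

rat_genotype   = (2.0, 5.0)   # hypothetical peak response for the lab rat
human_genotype = (2.5, 4.5)   # hypothetical peak response for humans; nearby, we hope

# Same dose pair, two different landscapes:
print(effectiveness(2.0, 5.0, rat_genotype))    # 1.0, the rat optimum
print(effectiveness(2.0, 5.0, human_genotype))  # lower, but still near the human peak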

Scientists who use animal testing are acutely aware of the problems of going from one landscape to another. For each system studied, they have spent a great deal of time and effort looking for the test species that functions most like humans. The idea is that if you could find an animal with, say, a liver that functions almost exactly like a human liver, you could do most of your controlled studies of liver disease on that animal and only use humans for the final stages.

As sound and appealing as that idea is, there is another way of looking at this.

On a sufficiently high level, with some important caveats, all research can be looked at as a set of gradient searches over a vast multidimensional landscape. With each study, researchers pick a point on the landscape, gather data in the region, then use their findings and those of other researchers to pick their next point.

In this context, important similarities between landscapes fall into two distinct categories: those involving the positions and magnitudes of the optima, and those involving the search properties of the landscape. Every point on the landscape corresponds to four search values: the max a search starting there will reach; the number of steps it will take to reach that max; the min it will reach; and the number of steps it will take to reach that min. Since we usually want to go in one direction (let's say maximizing), we can generally reduce that to two values for each point: the optimum of interest and the time to converge.
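
Sticking with the invented one-dimensional landscape from the earlier sketch, the four values attached to a starting point might be computed like this (again a sketch, not anything from a real study):

import math

def greedy_search(f, x0, step=0.05, maximize=True, max_steps=500):
    """Walk uphill (or downhill) to the nearest local optimum; return (value reached, steps taken)."""
    key = f if maximize else (lambda x: -f(x))
    x, n = x0, 0
    while n < max_steps:
        best = max((x - step, x, x + step), key=key)
        if best == x:
            break
        x, n = best, n + 1
    return f(x), n

landscape = lambda x: 2 * math.exp(-0.5 * (x - 4) ** 2) + 0.1 * math.cos(6 * x)

def search_values(x0):
    """The four numbers tied to a starting point: max reached, steps to it, min reached, steps to it."""
    mx, steps_up = greedy_search(landscape, x0, maximize=True)
    mn, steps_down = greedy_search(landscape, x0, maximize=False)
    return mx, steps_up, mn, steps_down

print(search_values(0.5))   # lands on a nearby ripple in both directions
print(search_values(3.0))   # climbs to the big peak near x = 4 when maximizing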

All of this leads us to an interesting and somewhat counterintuitive conclusion. When searching on one landscape to find the corresponding optimum of another, we are vitally interested in seeing a high degree of correlation between the size and location of the optima, but, given that similarity between optima, similarity in search statistics is at best unimportant and at worst a serious problem.

The whole point of repeatedly perturbing, then searching, a landscape is to produce a wide range of search statistics. Since we're only keeping the best one, the more variability the better. (Best here would generally be the perturbation where the global optimum is associated with the largest region, though time to converge can also be important.)
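
As a rough sketch of that perturb-and-keep-the-best loop (with the same invented landscape and an invented perturbation model, a few low-amplitude random cosine wiggles): generate candidate perturbations, measure what fraction of starting points a greedy search carries to the global peak under each, and keep the winner.

import math
import random

def hill_climb(f, x0, step=0.05, max_steps=500):
    """Same greedy search as before, returning just the point where it stops."""
    x, n = x0, 0
    while n < max_steps:
        best = max((x - step, x, x + step), key=f)
        if best == x:
            return x
        x, n = best, n + 1
    return x

base = lambda x: 2 * math.exp(-0.5 * (x - 4) ** 2) + 0.1 * math.cos(6 * x)

def random_perturbation():
    """A small random wiggle added to the base landscape."""
    amp, freq, phase = random.uniform(0, 0.1), random.uniform(1, 8), random.uniform(0, 6.28)
    return lambda x: base(x) + amp * math.cos(freq * x + phase)

def basin_of_global_peak(f, starts):
    """Fraction of starting points whose greedy search ends near the global peak (x ~ 4)."""
    return sum(abs(hill_climb(f, x0) - 4) < 0.5 for x0 in starts) / len(starts)

random.seed(0)
starts = [i * 0.2 for i in range(41)]               # starting points spread over [0, 8]
candidates = [base] + [random_perturbation() for _ in range(20)]
best = max(candidates, key=lambda f: basin_of_global_peak(f, starts))
print("largest basin found:", basin_of_global_peak(best, starts))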
