In response to Joseph's recent post on the over-reliance on mice in medical research (which was prompted by this thought-provoking piece in Slate), I thought I'd dredge up something I wrote on the subject last year:
post I discussed gradient searches and the two great curses of the gradient searcher, small local optima and long, circuitous paths. I also mentioned that by making small changes to the landscape being searched (in other words, perturbing it) we could sometimes (with luck) improve our search metrics without significantly changing the size and location of our optima.
The idea that you can use a search on one landscape to find the optima of a similar landscape is the assumption behind more than just perturbing. It is also the basis of all animal testing of treatments for humans. This brings genotype into the landscape discussion, but not in the way it's normally used.
In evolutionary terms, we look at an animal's genotype as a set of coordinates for a vast genetic landscape where 'height' (the fitness function) represents that animal's fitness. Every species is found on that landscape, each clustering around its own local maximum.
Genotype figures in our research landscape, but instead of being the landscape itself, it becomes part of the fitness function. Here's an overly simplified example that might clear things up:
Consider a combination of two drugs. If we use the dosage of each drug as an axis, this gives us something that looks a lot like our first example with drug A being north/south, drug B being east/west and the effect we're measuring being height. In other words, our fitness function has a domain of all points on our AB plane and a range corresponding to the effectiveness of that dosage. Since we expect genetics to affect the subjects react to the drugs, genotype has to be part of that fitness function. If we ran the test on lab rats we would expect a different result than if we tested it on humans but we would hope that the landscapes would be similar (or else there would be no point in using lab rats).
Scientists who use animal testing are acutely aware of the problems of going from one landscape to another. For each system studied, they have spent a great deal of time and effort looking for the test species that functions most like humans. The idea is that if you could find an animal with, say, a liver that functions almost exactly like a human liver, you could do most of your controlled studies of liver disease on that animal and only use humans for the final stages.
As sound and appealing as that idea is, there is another way of looking at this.
On a sufficiently high level with some important caveats, all research can be looked at as a set of gradient searches over a vast multidimensional landscape. With each study, researchers pick a point on the landscape, gather data in the region then use their findings to pick their findings and those of other researchers to pick their next point.
In this context, important similarities between landscapes fall into two distinct categories: those involving the positions and magnitudes of the optima; and those involving the search properties of the landscape. Every point on the landscape corresponds to four search values: a max; the number of steps it will take to reach that max; a min; and the number of steps it will take to reach that min. Since we usually want to go in one direction (let's say maximizing), we can generally reduce that to two values for each point, optima of interest and time to converge.
All of this leads us to an interesting and somewhat counterintuitive conclusion. When searching on one landscape to find the corresponding optimum of another, we are vitally interested in seeing a high degree of correlation between the size and location of the optima but given that similarity between optima, similarity in search statistics is at best unimportant and at worst a serious problem.
The whole point of repeated perturbing then searching of a landscape is to produce a wide range of search statistics. Since we're only keeping the best one, the more variability the better. (Best here would generally be the one where the global optimum is associated with the largest region though time to converge can also be important.)
In animal testing, changing your population of test subjects perturbs the research landscape. So what? How does thinking of research using different test animals change the way that we might approach research? I'll suggest a few possibilities in my next post on the subject.
More data, less accuracy
1 minute ago