West Coast Stat Views (on Observational Epidemiology and more): One final(?) reason not to trust teaching metrics

Friday, July 30, 2010

One final(?) reason not to trust teaching metrics

Don't be too afraid (or perhaps relieved) at the thought of OE dropping the subject of education and education reform. I think I can speak for Joseph here when I say that neither of us plans to drop the topic until pundits stop saying stupid things about it.

What follows might be a final reason because I think we've got a pretty good framework for discussing the dangers of making radical changes based on the proposed metrics for measuring teacher performance. Do a keyword search on education to see what I mean.

A few months ago, Joseph and I had a marathon phone conversation where we sketched out the attributes a well-designed study of teacher performance would need in order to cope with self-selection, interactions, social dynamics, nesting and the myriad other challenges that go with this problem. You will occasionally find an educational study that attempts to deal with one or two of these challenges but, as far as I can tell, no educational study addresses all of them.

Some of these problems have been widely discussed. Others have been examined in some detail in this blog. But there is at least one that we haven't gotten around to: the question of how certain teachers mesh with certain classes.

Every teacher, no matter how skilled or experienced, will do better with some types of classes than with others. Different classes have different needs. Some need a fast pace; some need patience; some need freedom to explore; some need structure and discipline. After you've been teaching for a while, you learn to read classes and adjust your style and presentation, but no teacher is equally good under all conditions.

This raises two serious problems for the educational researcher. First, how do you assign classes to teachers and students to classes in a way that protects you from aliasing problems? Second, how do you interpret the results?

Here's an example. Let's say teacher A does well with remedial math but does badly with calculus. Teacher B is great with calculus but simply can't communicate with the remedial kids. How do you score those teachers? You could use the mean and conclude that both teachers were doing an average job -- a conclusion that is pretty much wrong in all four teacher-class combinations. You could take the higher of the scores, working under the assumption that administrators are competent managers and therefore know enough to put teachers in classes that match their abilities. Or you might decide that since the purpose is to catch bad teachers we should take the low score. Or you could pick calculus because it's more advanced. Or you could pick remedial because it affects those kids most likely to be left behind.
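To make the ambiguity concrete, here is a minimal sketch of those five scoring rules applied to the two teachers. The numbers are made up purely for illustration (an arbitrary 0-100 scale, not real data), and names like `scores`, `rules` and `score_teachers` are hypothetical.

```python
from statistics import mean

# Hypothetical effectiveness scores on an arbitrary 0-100 scale.
# These values are invented for illustration only.
scores = {
    "A": {"remedial": 80, "calculus": 40},
    "B": {"remedial": 40, "calculus": 80},
}

# Each rule collapses a teacher's per-class scores into a single number.
rules = {
    "mean": lambda s: mean(s.values()),
    "best class": lambda s: max(s.values()),
    "worst class": lambda s: min(s.values()),
    "calculus only": lambda s: s["calculus"],
    "remedial only": lambda s: s["remedial"],
}


def score_teachers(scores, rules):
    """Return {rule name: {teacher: single summary score}}."""
    return {
        name: {teacher: rule(by_class) for teacher, by_class in scores.items()}
        for name, rule in rules.items()
    }


results = score_teachers(scores, rules)
for name, by_teacher in results.items():
    print(f"{name:>14}: A={by_teacher['A']}, B={by_teacher['B']}")
```

With these made-up numbers, the first three rules rate the two teachers identically (both mediocre, both strong, or both weak), while the last two rank them in opposite orders. Same teachers, same classes, five different stories.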

Of course, all of this presupposes that we have this kind of information available when we try to measure teacher performance (which we never do). It also assumes that we want to have a serious discussion using meaningful data, that we want to be honest and fair, and that we actually care about the quality of our kids' education.

Perhaps it's just the lateness of the hour and the weight of a long week, but I find it increasingly difficult to hold onto that last set of assumptions.

(I'm writing this under the gun and corners are being cut. I will try to go back and add links when the storm passes.)
