Statisticians often have to come up with a first draft of metrics, filters, winnowing processes, etc. without having a sample of the data they'll be using. One approach to the problem is to take some anecdotal cases and ask ourselves how the system we've proposed would handle them. Would it have trouble classifying them, leaving them in some 'other' box, or worse yet, would it misclassify them, putting something that's clearly bad into the good or even excellent category?
Here's a thought experiment. Many years ago, when teaching at a medium-sized suburban school, I had a classroom across the hall from a football coach who taught history. For the record, some of the best teachers and administrators I have ever dealt with came from coaching. They were gifted motivators who brought to the classroom the same belief in excellence and "giving 110%" that they brought to the field or the court.
This was not one of those coaches.
Not only did he make no effort to motivate his students; I'm not sure he interacted with them in any way. His desk was set up at the back of the room, not a bad arrangement for a study hall, but one that effectively precluded addressing the class, answering questions, or leading a discussion. As far as I could tell, the issue never came up. Students spent their hour filling out worksheets that he had Xeroxed out of a workbook. He spent the hour grading them.
I have never seen a more mind-numbing, soul-crushing approach to education, but that didn't stop the principal from holding up this teacher as a role model for the rest of us. His classes were quiet, he never sent a student to the principal's office, and though the students' grasp of the material seldom extended beyond the rote level, that was sufficient for pretty good standardized-test scores (at least for knowledge-based rather than process-based courses).
This was almost two decades ago. Significant chunks of the current reform movement were already in place, but No Child Left Behind was still years away. The teacher in question retired the year before I entered graduate school, but assuming he were still around, how well would he do under the proposed teacher evaluation system?
Presumably, most teacher evaluation metrics will be based largely on some combination of three factors: student test scores, classroom management, and supervisor evaluations. Our worksheet-dispensing educator would normally do well on the first and would max out the other two. I said 'normally' because (as mentioned before) these metrics are easy to game, and the principal could easily arrange things to bump the test scores for his favorite teacher while screwing over a trouble-making teacher he would like to get rid of (someone like me, for instance).
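To make the thought experiment concrete, here is a minimal sketch in Python of running one anecdotal case through a draft metric. Everything in it is an assumption for illustration: the weights, the rating cutoffs, the 0-to-1 scale, and the numbers assigned to the teacher are made up (they only encode "does well on the first factor, maxes out the other two"), and no actual evaluation system is being reproduced.

```python
from dataclasses import dataclass


@dataclass
class TeacherCase:
    """One anecdotal case, scored 0.0 to 1.0 on each factor (hypothetical scale)."""
    name: str
    test_scores: float
    classroom_management: float
    supervisor_evaluation: float


# Hypothetical weights for the three factors; a real system would choose its own.
WEIGHTS = {
    "test_scores": 0.5,
    "classroom_management": 0.25,
    "supervisor_evaluation": 0.25,
}


def composite_score(case: TeacherCase, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the three factors."""
    return sum(weight * getattr(case, factor) for factor, weight in weights.items())


def rating(score: float) -> str:
    """Map a composite score to a category, using made-up cutoffs."""
    if score >= 0.85:
        return "excellent"
    if score >= 0.60:
        return "good"
    return "needs improvement"


# Illustrative numbers only: pretty good test scores, quiet classes,
# and a principal who holds him up as a role model.
worksheet_coach = TeacherCase(
    name="worksheet coach",
    test_scores=0.75,
    classroom_management=1.0,
    supervisor_evaluation=1.0,
)

score = composite_score(worksheet_coach)
print(worksheet_coach.name, score, rating(score))
```

The point of the exercise is not the particular numbers but the check itself: if a case we already believe is bad lands in the top category, the draft metric is misclassifying, and that shows up before any real data is collected.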
Even if we assume that the principal didn't play favorites (and that's not an assumption I would have made with this administrator), this teacher would unquestionably be looking at generous bonuses. The question is, is this how we want to define excellence in education?