While the mean score was higher on the days that the teachers chose to submit, once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1.
Just to be clear, I don't have any problem with this kind of evaluation, and I really like Kane's point about using 360s for teachers, but the claim of perfect correlation has raised a red flag for almost every statistically literate person who saw it. You can see an excellent discussion of this at Gelman's site, both in the original post and in the comments. All the points made there are valid, but based on my experience, I have one more stick for the fire.
For the sake of argument, let's assume the extraordinary idea is true: that rank is preserved, so that the nth teacher on his or her best day is still worse than the (n+1)th teacher on his or her worst day. For anything more than a trivially small n, that would suggest an amazing lack of variability in the quality of lessons from teachers across the spectrum (particularly strange since we would expect weaker and less experienced teachers to be more variable).
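To get a feel for how little day-to-day variability a near-perfect correlation leaves room for, here is a rough simulation sketch. Everything in it is my own assumption for illustration: each teacher gets a fixed underlying quality, each lesson score is that quality plus independent day-to-day noise, and the function names and noise levels are made up. It looks only at the raw correlation between two lessons and does not try to reproduce whatever measurement-error correction Kane applied.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lesson_correlation(n_teachers, day_sd, seed=0):
    """Correlation between two lessons per teacher (hypothetical model).

    Assumption: a lesson score is the teacher's fixed quality (standard
    normal) plus independent day-to-day noise with standard deviation day_sd.
    """
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n_teachers)]
    lesson_a = [q + rng.gauss(0, day_sd) for q in quality]
    lesson_b = [q + rng.gauss(0, day_sd) for q in quality]
    return pearson(lesson_a, lesson_b)


if __name__ == "__main__":
    # Illustrative noise levels only; none of these come from the study.
    for day_sd in (0.0, 0.25, 0.5, 1.0):
        r = lesson_correlation(2000, day_sd)
        print(f"day-to-day sd = {day_sd:.2f}  correlation = {r:.3f}")
```

Under these assumptions the correlation only reaches 1 when the day-to-day noise is zero, and it falls off quickly as lessons start to vary, which is the point of the thought experiment above.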
But there's a source of noise no one's mentioned and in this case it's actually a good thing.
Except for special cases, teachers walk through the door with a great deal of information about their classes: they've graded tests and homework papers, they've seen the reaction to previous lessons, and they've talked with students one-on-one. You would expect (and hope) that these teachers would use that information to adjust their approach on a day-to-day basis.
The trouble is that if you're evaluating teachers based on an observation (particularly a video observation), you don't have any of that information. You can't say how appropriate a given pace or level of explanation is for that class that day. You can only rely on general guidelines.
Which is not to say that good evaluators can't form a valuable assessment based on a video of a lesson. I'm a big believer in these tools both for staff development and (within reason) evaluation, but it's an inexact and often subjective process. You can get a good picture and diagnose big problems, but you will never get the resolution that Kane claimed.
There are other problems with this interview, but the correlation of one should have been an easy catch for the reporter. You should never let an interview subject go unchallenged when claiming perfect results.