First, there was a comment by Stuart Buck:
As a matter of basic social science, what should concern one is not the absolute level of a state's performance now but the counterfactual (what would its performance otherwise be).

This is absolutely correct. However, what we are really missing is a time frame for improvement as well as an expected magnitude of improvement. So if we look at 1999, the top-rated state in StudentsFirst (Louisiana) had a Grade 8 reading score of 252 (compared to a national average of 261). In 2011 the score was 255 (up 3 points) versus a national average of 264 (which also improved 3 points). Between 2009 and 2011, both Louisiana and the national average again improved by the same amount. DC is even more interesting. In 2007 (when Rhee began her reforms) its Grade 8 reading score was 241 (versus 261 nationwide). In 2011 it was 242 (versus 264 nationwide).
So we would have to explain why the relative improvement we expected in Louisiana simply did not happen, or why DC lagged even further four years into a reform program. Even longitudinally, "first in the nation" seems odd given a lower baseline (thus more room for improvement from low-hanging fruit, and maybe even regression to the mean) and an increase that exactly matched the national average.
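To make the comparison concrete, here is a minimal sketch (in Python, using only the NAEP Grade 8 reading figures quoted above; the dictionary layout and helper name are my own) that computes each jurisdiction's gap to the national average at the start and end of its period:

```python
# NAEP Grade 8 reading scores quoted in the post (points).
scores = {
    "Louisiana": {1999: 252, 2011: 255},
    "DC":        {2007: 241, 2011: 242},
    "National":  {1999: 261, 2007: 261, 2011: 264},
}

def gap(jurisdiction, year):
    """Score difference relative to the national average in a given year."""
    return scores[jurisdiction][year] - scores["National"][year]

for place, years in scores.items():
    if place == "National":
        continue
    start, end = sorted(years)
    print(f"{place}: gap {gap(place, start):+d} in {start}, "
          f"{gap(place, end):+d} in {end}")
```

Run as written, this shows Louisiana 9 points behind the national average in both 1999 and 2011, and DC slipping from 20 points behind in 2007 to 22 points behind in 2011, which is the flat-to-widening gap the argument turns on.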
Second, at Lawyers, Guns and Money a commenter said:
That being said, using it for evaluative purposes is misguided and unfair to educators. I proctor the test, and I see a large number of students who don’t take the test seriously at all. They just click through to get it over with. Our student population has taken the test in the grips of a horrible flu outbreak. Those kids who were actually in school at the time were sick, getting sick, or struggling to get over being sick. When you have to spray down the computers with Lysol after every class comes through, you really have to question the validity of the results obtained. Technical difficulties that require restarting the computer and/or test can also have a suppressive effect on students’ scores.

As the tech coordinator in a school, this commenter seems to be in a reasonable position to make such an evaluation. That raises the question of "high stakes for whom?" I am actually a fan of looking at SAT scores. Why? Because not only is the test well respected, but the test makers have a financial incentive to make sure the test does what they say it does (so it can continue as a national standard). The students have an incentive to do well on the test because high scores open doors for them. So when a teacher is evaluated on SAT performance, I am pretty comfortable saying that the other actors are likely to have incentives aligned toward giving an unbiased estimate.
Finally, the thing that really seems to be mixed up in the Rhee report is the difference between efficiency (cost savings) and quality (performance). By analogy, consider military pensions. They exist, in large part, so that we can retain top performers in the armed forces. If anything, the defined benefit pension improves quality by keeping soldiers with 15 years of valuable experience in the military. A problem with pensions only arises if the military gets bad at weeding out incompetent performers (which, so far as I can tell, is not currently a major problem). It is good to keep experienced people around while they are still effective, but it is expensive. So the empirical question is: does it cost more than it is worth?
The same issue arises with the class size metric. I have been in large and small classes with an excellent teacher. I learned a lot more in the small class because the teacher could focus more attention on each student. Is it better to have large classes (as StudentsFirst claims)? Well, only if you have identified top performers and can assure yourself that you are compensating for class size with teacher quality. That is a hard claim to support. On the other hand, almost no luxury is as expensive as small classes. Notice how universities have reacted to this pressure by putting hundreds of students into a single classroom. So whether the cost is worth the improvement in quality is a legitimate question.
So the issues here are twofold. One, the data on performance do not seem to map easily onto the counter-intuitive rankings of StudentsFirst. Two, the type of high-stakes test that seems to be a key feature of the education reform movement has some work to do in properly aligning incentives.