
Saturday, July 2, 2011

Jonathan Chait -- now to the right of David Brooks on Education

For those keeping track, Jonathan Chait has now chastised both David Brooks and the National Review's Jim Manzi for being too moderate on this subject.

Thursday, February 10, 2011

In the center, the National Review. On the right, the New Republic

Jim Manzi has an excellent column discussing proposed teacher evaluation metrics from a business perspective, a column that raises some of the same questions that teachers' unions have brought up. There's nothing particularly surprising about that -- Manzi is an intelligent man with a well-known independent streak. He's not going to disagree with a position just because he's a conservative.

Jonathan Chait dismisses Manzi's points with some sweeping generalities, completely ignores his point about fairness to the evaluated, and ends up being significantly less sympathetic to the concerns of labor than Manzi. Sadly, this is not surprising either. Chait is one of the most brilliant pundits we have, but on the topic of education he combines intense feelings with an apparent lack of knowledge of the important research in the field. This has caused him to embrace certain popular narratives even when they lead him to conclusions that contradict his long-standing values.

But as unsurprising as the parts may be, when you put them together the strangeness of the current education debate just sweeps over you. Formerly right-wing positions like privatizing large numbers of schools or denying unions the right to protect workers from unfair termination are now dogma for much of the left. It has reached the point where when a writer for the National Review suggests, as part of a larger analysis, that teachers can have legitimate concerns about the reliability of the metrics used to evaluate them, the voice of the New Republic dismisses the possibility without even feeling the need to make an argument.

Even without the political role reversal, Chait's response is strange and oddly disengaged. Judge for yourself.

[I'm presenting these out of order for reasons that will be obvious]

1. You need some system for deciding how to compensate teachers. Merit pay may not be perfect, but tenure plus single-track longevity-based pay is really, really imperfect. Manzi doesn't say that better systems for measuring teachers are futile, but he's a little too fatalistic about their potential to improve upon a very badly designed status quo.
Argument by modifier, with not one but two 'really's and a 'very' to sell the point. What he doesn't give is any kind of supporting evidence whatsoever. With millions of teachers and a small but thriving industry of think tanks digging up damning anecdotes, you can always find something negative to say, but Chait doesn't even bother coming up with a bad argument.

There's an odd, listless quality to the entire post. Chait is normally an energetic and relentless debater. Here he just goes through the motions. He doesn't even bother to proof his prose (I'm pretty sure he meant to say "the search...is futile"). He also makes a huge jump from the specific techniques Manzi is focusing on to "better systems." I'm pretty sure that Manzi believes better systems can improve the status quo; he just questions how big a role value-added metrics will play in those systems.

As for the case for longevity vs. value-added, I'll let Donald Rubin take it from here:
We do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions.
This is not to say that there isn't a case to be made for merit pay. I don't have any problem with rewarding teachers who do exceptional work, but the methods being discussed here are simply not the way to do it.

Chait's third point runs along similar lines:
3. In general, he's fitting this issue into his "progressives are too optimistic about the potential to rationalize policy" frame. I think that frame is useful -- indeed, of all the conservative perspectives on public policy, it's probably the one liberals should take most seriously. But when you combine the fact that the status quo system is demonstrably terrible, that nobody is trying to devise a formula to control the entire teacher evaluation process, and that nobody is promising the "silver bullet" he assures us doesn't exist, his argument has a bit of a straw man quality.
More argument by adverb and a strange double straw man (straw-straw man? straw straw man man?) continued from the soon-to-be-discussed point 2. The first 'nobody' is doubtful; Chait seems to jump from the fact that no state currently bases evaluations primarily on value-added metrics to the conclusion that no one is even looking into the possibility. The second 'nobody' is just plain wrong; many reform movement followers have so much faith in the silver bullet status of value-added metrics that they have seriously proposed firing more than half of our teachers based on that one number.

But the weirdest part came in point 2.
2. Manzi's description...
evaluating teacher performance by measuring the average change in standardized test scores for the students in a given teacher’s class from the beginning of the year to the end of the year, rather than simply measuring their scores. The rationale is that this is an effective way to adjust for different teachers being confronted with students of differing abilities and environments.
..implies that quantitative measures are being used as the entire system to evaluate teachers. In fact, no state uses such measures for any more than half of the evaluation. The other half involves subjective human evaluations.
Argument by ellipses. Take a look at the whole paragraph:
Recently, Megan McArdle and Dana Goldstein had a very interesting Bloggingheads discussion that was mostly about teacher evaluations. They referenced some widely discussed attempts to evaluate teacher performance using what is called “value-added.” This is a very hot topic in education right now. Roughly speaking, it refers to evaluating teacher performance by measuring the average change in standardized test scores for the students in a given teacher’s class from the beginning of the year to the end of the year, rather than simply measuring their scores. The rationale is that this is an effective way to adjust for different teachers being confronted with students of differing abilities and environments.
Manzi explicitly says "widely discussed attempts." Now, for the sake of comparison, check out the New York Times' similar wording:

A growing number of school districts have adopted a system called value-added modeling to answer that question, provoking battles from Washington to Los Angeles — with some saying it is an effective method for increasing teacher accountability, and others arguing that it can give an inaccurate picture of teachers’ work.

The system calculates the value teachers add to their students’ achievement, based on changes in test scores from year to year and how the students perform compared with others in their grade.

Manzi was perfectly clear with his wording and used language consistent with the New York Times' coverage. It was only by excerpting his paragraph mid-sentence that Chait was able to get even the suggestion of a distortion.


I have somewhat mixed feelings about Manzi's business-based approach. There are certain aspects of education that are, if not unique, then at least highly unusual, and you have to be careful when drawing analogies (obviously the subject for another, much longer post). That said, all of his points about the way evaluations work are valid and useful.

This is not a bad place to start the debate.


[You can read Jim Manzi's somewhat bewildered reaction to Chait's column here.]

Wednesday, February 9, 2011

Jim Manzi has some smart things to say about teacher evaluations

From the National Review (via Chait, but more on that later)
This seems like a broadly sensible idea as far as it goes, but consider that the real formula for calculating such a score in a typical teacher value-added evaluation system is not “Average math + reading score at end of year – average math + reading score at beginning of year,” but rather a very involved regression equation. What this reflects is real complexity, which has a number of sources. First, at the most basic level, teaching is an inherently complex activity. Second, differences between students are not unvarying across time and subject matter. How do we know that Johnny, who was 20 percent better at learning math than Betty in 3rd grade, is not relatively more or less advantaged in learning reading in fourth grade? Third, an individual person-year of classroom education is executed as part of a collective enterprise with shared contributions. Teacher X had special needs assistant 1 work with her class, and teacher Y had special needs assistant 2 working with his class — how do we disentangle the effects of the teacher versus the special ed assistant? Fourth, teaching has effects that continue beyond that school year. For example, how do we know if teacher X got a great gain in scores for students in third grade by using techniques that made them less prepared for fourth grade, or vice versa for teacher Y? The argument behind complicated evaluation scoring systems is that they untangle this complexity sufficiently to measure teacher performance with imperfect but tolerable accuracy.

Any successful company that I have ever seen employs some kind of a serious system for evaluating and rewarding / punishing employee performance. But if we think of teaching in these terms — as a job like many others, rather than some sui generis activity — then I think that the hopes put forward for such a system by its advocates are somewhat overblown.

There are some job categories that have a set of characteristics that lend themselves to these kinds of quantitative “value added” evaluations. Typically, they have hundreds or thousands of employees in a common job classification operating in separated local environments without moment-to-moment supervision; the differences in these environments make simple output comparisons unfair; the job is reasonably complex; and, often the performance of any one person will have some indirect, but material, influence on the performance of others over time. Think of trying to manage an industrial sales force of 2,000 salespeople, or the store managers for a chain of 1,000 retail outlets. There is a natural tendency in such situations for analytical headquarters types to say “Look, we need some way to measure performance in each store / territory / office, so let’s build a model that adjusts for inherent differences, and then do evaluations on these adjusted scores.”

I’ve seen a number of such analytically-driven evaluation efforts up close. They usually fail. By far the most common result that I have seen is that operational managers muscle through use of this tool in the first year of evaluations, and then give up on it by year two in the face of open revolt by the evaluated employees. This revolt is based partially on veiled self-interest (no matter what they say in response to surveys, most people resist being held objectively accountable for results), but is also partially based on the inability of the system designers to meet the legitimate challenges raised by the employees.
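To make the contrast concrete, here is a toy sketch (mine, not Manzi's, and not any state's actual formula) of the simple gain-score calculation he describes -- the naive version that the "very involved regression equation" is meant to improve on:

```python
# Hypothetical illustration only: the naive "gain score" that real
# value-added systems replace with a far more involved regression model.
def naive_gain_score(start_scores, end_scores):
    """Average per-student change in test score over the year."""
    gains = [end - start for start, end in zip(start_scores, end_scores)]
    return sum(gains) / len(gains)

# Two classes with very different starting points can post the same gain,
# which is the rationale for measuring change rather than raw scores --
# and the complexity Manzi describes comes from everything this number
# ignores (shifting student differences, shared staff, multi-year effects).
print(naive_gain_score([40, 50, 60], [55, 65, 75]))    # 15.0
print(naive_gain_score([80, 85, 90], [95, 100, 105]))  # 15.0
```

Of course, as Manzi's four points make clear, adjusting for those ignored factors is exactly where the hard part begins.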

I found the point about techniques that hurt future performance particularly good. When I was teaching, how well a class would go was greatly influenced by how well previous teachers had done their jobs. Did the students understand the foundations? Did they have a good attitude toward the material? Good work habits and study strategies?

Teachers want reliable evaluations not just because they want to be rewarded for good work but also because they want to see incompetent teachers identified so that those teachers can be encouraged to do better, given training to improve their performance or, should the first two fail, fired. What they object to is having their fates rest on a glorified roll of the dice.