Monday, August 10, 2015

There is nothing more dangerous than a data-driven system designed and administered by people who don't understand statistics

 From the Washington Post's Valerie Strauss
 A veteran teacher suing New York state education officials over the controversial method they used to evaluate her as “ineffective” is expected to go to New York Supreme Court in Albany this week for oral arguments in a case that could affect all public school teachers in the state and even beyond.

Sheri G. Lederman, a fourth-grade teacher in New York’s Great Neck public school district, is “highly regarded as an educator,” according to her district superintendent, Thomas Dolan, and has a “flawless record”. The standardized math and English Language Arts test scores of her students are consistently higher than the state average.

Yet her 2013-2014 evaluation, based in part on student standardized test scores, rated her as “ineffective.” How can a teacher known for excellence be rated “ineffective”? It happens — and not just in New York.


Testing experts have for years been warning school reformers that efforts to evaluate teachers using value-added modeling (VAM) are not reliable or valid, but school reformers, including Education Secretary Arne Duncan and New York Gov. Andrew Cuomo, both Democrats, have embraced the method as a “data-driven” evaluation solution championed by some economists.

Lederman’s suit against state education officials — including John King, the former state education commissioner, who now is a top adviser to Duncan at the Education Department — challenges the rationality of the VAM model used to evaluate her and, by extension, other teachers in the state. The lawsuit alleges that the New York State Growth Measures “actually punishes excellence in education through a statistical black box which no rational educator or fact finder could see as fair, accurate or reliable.”

It also, in many aspects, defies comprehension. High-stakes tests are given only in math and English language arts, so reformers have decided that all teachers (and, sometimes, principals) in a school should be evaluated by reading and math scores. Sometimes, school test averages are factored into all teachers’ evaluations. Sometimes, a certain group of teachers is attached to either reading or math scores; social studies teachers, for example, are more often attached to English Language Arts scores, while science teachers are attached to math scores. An art teacher in New York City explained in this post how he was evaluated on math standardized test scores and saw his evaluation rating drop from “effective” to “developing.”

A teacher in Florida — which is another state that uses VAM — discovered that his top-scoring students actually hurt his evaluation. How? In Indian River County, Fla., an English Language Arts middle school teacher named Luke Flynt told his school board that through VAM formulas, each student is assigned a “predicted” score — based on past performance by that student and other students — on the state-mandated standardized test. If the student exceeds the predicted score, the teacher is credited with “adding value.” If the student does not do as well as the predicted score, the teacher is held responsible and that score counts negatively toward his/her evaluation. He said he had four students whose predicted scores were “literally impossible” because they were higher than the maximum number of points that can be earned on the exam. He said:

    “One of my sixth-grade students had a predicted score of 286.34. However, the highest a sixth-grade student can earn is 283. The student did earn a 283, incidentally. Despite the fact that she earned a perfect score, she counted negatively toward my evaluation because she was 3 points below predicted.”
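The arithmetic Flynt describes can be sketched in a few lines. This is an illustrative simplification, not the actual Florida VAM formula (which involves a complex statistical model); the point is only that once a student's predicted score exceeds the test's maximum, no attainable result can count in the teacher's favor:

```python
# Illustrative sketch of the predicted-vs-actual comparison Flynt describes.
# NOT the real Florida VAM formula; names and numbers follow his example.

MAX_SCORE = 283  # highest attainable score on the sixth-grade exam, per Flynt

def student_contribution(actual: float, predicted: float) -> float:
    """A student's contribution to the evaluation: actual minus predicted.

    Positive means the teacher "added value"; negative counts against her.
    """
    return actual - predicted

# Flynt's student: a perfect score of 283 against a predicted 286.34.
contribution = student_contribution(actual=283, predicted=286.34)
print(f"contribution: {contribution:.2f}")  # negative despite a perfect score

# Because predicted > MAX_SCORE, even the best possible result is negative.
assert student_contribution(MAX_SCORE, 286.34) < 0
```

In other words, the prediction itself is "literally impossible" to meet, so the student's perfect performance is mechanically recorded as a failure by the teacher.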

Affidavits of numerous experts supporting Lederman have been filed — including from Stanford University professor Linda Darling-Hammond — and you can see them here. Oral arguments are scheduled to be heard Wednesday, Aug. 12. Should Lederman successfully challenge the New York teacher evaluation system, state officials might have to revamp it.

1 comment:

  1. From my previous comments on this blog, one might (correctly) infer that I am a supporter of the Common Core *standards.* I also am supportive of the use of carefully developed and carefully interpreted standardized testing as a way to evaluate whether students are attaining the levels of skill and knowledge spelled out by those standards.

    But the people who are using VAM to evaluate teachers are clearly either ignorant or malevolent, and they are out of control! Linda Darling-Hammond's deposition is well worth the brief time it takes to read. What's most shocking is that it demolishes the VAM under consideration using not complex statistical analysis but basic principles of psychometrics 101. The people pushing New York's VAM are clearly clueless.

    I think the underlying problem is the entrepreneurialism underlying the "education reform" movement. I am reminded of the introduction of bone marrow transplants as a treatment for hematologic malignancies that had failed standard treatments. It was a major advance in that context. But in no time, our entrepreneurial health care system was using it to treat all sorts of other conditions for which it was not only ineffective but extremely dangerous. It took several years before that mayhem was finally brought to a stop. Interestingly, along the way, we also learned that some of the "evidence" supporting bone marrow transplants in other conditions was based on faked data. Zealotry and fraud often go hand in hand, it seems.

    By lunging ahead without adequate thought, planning, and step-by-step research, the entrepreneurial education system is ultimately undermining its own cause (assuming that cause is truly to improve education and not just to extract rents).

    The development of high-quality tests that provide accurate and sufficiently precise measures of achievement is, as anyone who has been involved in the process can tell you, a very labor-intensive endeavor that requires expert-level input from subject-matter specialists, educators, and statisticians/psychometricians. You can't cut corners, and there are no acceptable shortcuts. It is not the kind of process that readily lends itself to an entrepreneurial system that rewards being first over being best. Nor does it lend itself to a system that relentlessly seeks to cut costs (at least not while the test is being developed).

    It *may* be possible, over time, to develop a valid teacher assessment based on growth models. But that requires the development of a different kind of test from the kind that validly measures student-level mastery of the educational objectives, and many years of experimental use and refinement before it is unleashed for actual use. Personally, I would like to see us, as a society, start down this path to see if it can actually be done. But what we are seeing rolled out now is not the way to do it; in the end, it will only set us back, and it may well take the use of clear and appropriate standards and testing down with it when it finally blows up.