Thursday, March 27, 2014

On SAT changes, The New York Times gets the effect right but the direction wrong

That was quick.

Almost immediately after posting this piece on the elimination of the SAT's correction for guessing (The SAT and the penalty for NOT guessing), I came across this from Todd Balf in the New York Times Magazine.
"Students were docked one-quarter point for every multiple-choice question they got wrong, requiring a time-consuming risk analysis to determine which questions to answer and which to leave blank."
I went through this in some detail in the previous post, but for a second (and more concise) opinion, here's Wikipedia:
"The questions are weighted equally. For each correct answer, one raw point is added. For each incorrect answer one-fourth of a point is deducted. No points are deducted for incorrect math grid-in questions. This ensures that a student's mathematically expected gain from guessing is zero. The final score is derived from the raw score; the precise conversion chart varies between test administrations.

"The SAT therefore recommends only making educated guesses, that is, when the test taker can eliminate at least one answer he or she thinks is wrong. Without eliminating any answers one's probability of answering correctly is 20%. Eliminating one wrong answer increases this probability to 25% (and the expected gain to 1/16 of a point); two, a 33.3% probability (1/6 of a point); and three, a 50% probability (3/8 of a point)."
You could go even further. You don't actually have to eliminate a wrong answer to make guessing a good strategy. If you have any information about the relative likelihood of the options, guessing will have positive expected value.
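
For anyone who wants to check the arithmetic, here is a rough sketch in Python of the expected value of a guess under the old rule (one point for a right answer, a quarter point off for a wrong one). The function name `expected_gain` and the "slight hunch" probabilities are my own illustrative assumptions, not anything taken from the College Board.

```python
from fractions import Fraction

def expected_gain(probs, penalty=Fraction(1, 4)):
    """Expected raw-score change from picking the option you rate most likely.

    probs   -- your subjective probability for each option still in play
    penalty -- points deducted for a wrong answer (1/4 under the old SAT rule)
    """
    p = max(probs)  # you guess your single best option
    return p * 1 - (1 - p) * penalty

# Blind guess among five options: nothing gained, nothing lost on average.
blind = expected_gain([Fraction(1, 5)] * 5)

# Eliminating one, two, or three wrong answers (the cases in the quote above).
one_out = expected_gain([Fraction(1, 4)] * 4)
two_out = expected_gain([Fraction(1, 3)] * 3)
three_out = expected_gain([Fraction(1, 2)] * 2)

# Hypothetical "slight hunch": no option eliminated, but one looks a bit likelier.
hunch = expected_gain([Fraction(24, 100)] + [Fraction(19, 100)] * 4)

print(blind, one_out, two_out, three_out, hunch)
```

The point of the last case is that even a weak lean toward one option, with nothing ruled out, tips the expected value above zero.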

The result is that, while time management for a test like the SAT can be complicated, the rule for guessing is embarrassingly simple: give your best guess for questions you read; don't waste time guessing on questions that you didn't have time to read.

The risk analysis actually becomes much more complicated when you take away the penalty for guessing. On the ACT (or the new SAT), there is a positive expected value associated with blind guessing and that value is large enough to cause trouble. Under severe time constraints (a fairly common occurrence with these tests), the minute it would take you to attempt a problem, even if you get it right, would be better spent filling in bubbles for questions you haven't read.
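
To make that trade-off concrete, here is a back-of-the-envelope sketch. Every number in it is an assumption for illustration: five answer options per question, one minute to work a problem you are certain to get right, and ten bubbles filled per minute when guessing blindly. The helper `expected_points_per_minute` is hypothetical.

```python
from fractions import Fraction

def expected_points_per_minute(p_correct, questions_per_minute, penalty=Fraction(0)):
    """Expected raw points earned in one minute of a given strategy.

    p_correct            -- chance each answered question is right
    questions_per_minute -- how many questions the strategy gets through
    penalty              -- points deducted per wrong answer (0 with no correction)
    """
    per_question = p_correct - (1 - p_correct) * penalty
    return per_question * questions_per_minute

# Assumed: a minute spent solving yields one certain right answer.
solve = expected_points_per_minute(Fraction(1), 1)

# Assumed: a minute spent bubbling covers ten unread five-option questions.
bubble_no_penalty = expected_points_per_minute(Fraction(1, 5), 10)
bubble_old_rule = expected_points_per_minute(Fraction(1, 5), 10, penalty=Fraction(1, 4))

print(solve, bubble_no_penalty, bubble_old_rule)
```

Under these assumed numbers, blind bubbling beats actually solving a problem once the penalty is gone, while under the old rule it is worth nothing. The break-even point depends on how fast you can fill in bubbles, which is exactly the calculation test takers now have to make on the fly.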

Putting aside what this does to the validity of the test, trying to decide when to start guessing is a real and needless distraction for test takers. In other words, just to put far too fine a point on it, the claim about the effects of the correction for guessing isn't just wrong; it is the opposite of right. The old system didn't require time-consuming risk analysis, but the new one does.

As I said in the previous post, this represents a fairly small aspect of the changes in the SAT (loss of orthogonality being a much bigger concern). Furthermore, the SAT represents a fairly small and perhaps even relatively benign part of the story of David Coleman's education reform initiatives. Nonetheless, this one shouldn't be that difficult to get right, particularly for a publication with the reputation of the New York Times.

Of course, given that this is the second recent high-profile piece from the paper to take an anti-SAT slant, it's possible certain claims weren't vetted as well as others.


  1. I clicked through to Balf's article and I also noticed this:

    "When the Scholastic Aptitude Test was created in 1926, it was promoted as a tool to create a classless, Jeffersonian-style meritocracy."

    Huh? Is a Jeffersonian-style meritocracy the system where everyone is equal and so we all own slaves? I guess if this article were written in Russia it would be all about how the SAT was promoted as a tool to create a Lenin-style democracy.

    The problem I see here is that Balf seems to be dealing in images and impressions rather than thinking through his ideas. "Jefferson" and "meritocracy" have positive images, so they go together, the old SAT was bad so therefore it required "a time-consuming risk analysis," etc.

    1. I'll have at least a couple more posts on Balf's piece, but what comes through wherever you look is a journalist retelling the approved narrative without any real understanding of the underlying principles. He deals in emotional associations because that's what he does understand (and, I suspect, because that's the kind of journalism his editors want).