West Coast Stat Views (on Observational Epidemiology and more): Eight years ago at the blog -- R.I.P. old (and better) SAT

Monday, March 28, 2022

Eight years ago at the blog -- R.I.P. old (and better) SAT

Not the timeliest subject, I'll admit, but the way bad changes are sold to a mathematically illiterate press as improvements is always relevant.

Wednesday, March 26, 2014

The SAT and the penalty for NOT guessing

Last week we had a post on why David Coleman's announcement that the SAT would now feature more "real world" problems was bad news, probably leading to worse questions and almost certainly hurting the test's orthogonality with respect to GPA and other transcript-based variables. Now let's take a look at the elimination of the so-called penalty for guessing.

The SAT never had a penalty for guessing, not in the sense that guessing lowered your expected score. What the SAT did have was a correction for guessing. On a multiple-choice test without the correction (which is to say, pretty much all tests except the SAT), blindly guessing on the questions you didn't get a chance to look at will tend to raise your score. Let's say, for example, two students took a five-option test where they knew the answers to the first fifty questions and had no clue what the second fifty were asking (assume they were in Sanskrit). If Student 1 left the Sanskrit questions blank, he or she would get fifty points on the test. If Student 2 answered 'B' to all the Sanskrit questions, he or she would probably get around sixty points.

From an analytic standpoint, that's a big concern. We want to rank the students based on their knowledge of the material, but here we have two students with the same mastery of the material and a ten-point difference in scores. Worse yet, let's say we have a third student who knows a bit of Sanskrit and manages to answer five of those questions, leaving the rest blank and thus making fifty-five points. Student 3 knows the material better than Student 2 but Student 2 makes a higher score. That's pretty much the worst-case scenario for a test.

Now let's say that we subtracted a fraction of a point for each wrong answer -- 1/4 in this case, 1/(number of options - 1) in general -- but not for a blank. Now Student 1 and Student 2 both have fifty points (on average, in Student 2's case) while Student 3 still has fifty-five. The lark's on the wing, the snail's on the thorn, the statistician has rank-ordered the population and all's right with the world.
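For anyone who wants to check the arithmetic, here is a minimal sketch of the three-student example. The function name `expected_score` and its parameters are my own labels for illustration, not anything taken from the College Board; the only inputs are the numbers in the example above (five options, fifty known answers, fifty blind guesses).

```python
from fractions import Fraction

def expected_score(known, blind_guesses, options=5, corrected=True):
    """Expected raw score for `known` correct answers plus `blind_guesses` random picks.

    Hypothetical helper: with the correction, each wrong answer costs
    1/(options - 1) of a point; without it, wrong answers cost nothing.
    """
    p_right = Fraction(1, options)
    penalty = Fraction(1, options - 1) if corrected else Fraction(0)
    # Expected value of a single blind guess.
    per_guess = p_right - (1 - p_right) * penalty
    return known + blind_guesses * per_guess

# (questions actually known, questions guessed blindly)
students = {
    "Student 1": (50, 0),   # leaves the Sanskrit questions blank
    "Student 2": (50, 50),  # answers 'B' to every Sanskrit question
    "Student 3": (55, 0),   # knows five Sanskrit questions, leaves the rest blank
}

for name, (known, guesses) in students.items():
    raw = expected_score(known, guesses, corrected=False)
    adjusted = expected_score(known, guesses, corrected=True)
    print(f"{name}: uncorrected {raw}, corrected {adjusted}")
```

Without the correction the expected scores come out 50, 60 and 55; with it they come out 50, 50 and 55, which is the ordering we wanted.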

[Note that these scales are set to balance out for blind guessing. Students making informed guesses ("I know it can't be 'E'") will still come out ahead of those leaving a question blank. This too is as it should be.]

You can't really say that Student 2 has been penalized for guessing since the outcome for guessing is, on average, the same as the outcome for not guessing. It would be more accurate to say that Students 1 and 3 were originally penalized for NOT guessing.

Compared to some of the other issues we've discussed regarding the SAT, this one is fairly small, but it does illustrate a couple of important points about the test. First, the SAT is a carefully designed test, and second, some of the recent changes aren't nearly so well thought out.

Thursday, March 27, 2014

On SAT changes, The New York Times gets the effect right but the direction wrong

That was quick.

Almost immediately after posting this piece on the elimination of the SAT's correction for guessing (The SAT and the penalty for NOT guessing), I came across this from Todd Balf in the New York Times Magazine.

"Students were docked one-quarter point for every multiple-choice question they got wrong, requiring a time-consuming risk analysis to determine which questions to answer and which to leave blank."

I went through this in some detail in the previous post but for a second opinion (and a more concise one), here's Wikipedia:

"The questions are weighted equally. For each correct answer, one raw point is added. For each incorrect answer one-fourth of a point is deducted. No points are deducted for incorrect math grid-in questions. This ensures that a student's mathematically expected gain from guessing is zero. The final score is derived from the raw score; the precise conversion chart varies between test administrations.

"The SAT therefore recommends only making educated guesses, that is, when the test taker can eliminate at least one answer he or she thinks is wrong. Without eliminating any answers one's probability of answering correctly is 20%. Eliminating one wrong answer increases this probability to 25% (and the expected gain to 1/16 of a point); two, a 33.3% probability (1/6 of a point); and three, a 50% probability (3/8 of a point)."
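The fractions in that passage are easy to reproduce. The sketch below assumes a five-option question with a quarter-point deduction, as described in the quote; `guess_gain` is a hypothetical helper name of mine.

```python
from fractions import Fraction

def guess_gain(eliminated, options=5):
    """Expected gain from guessing at random among the options not yet eliminated.

    Assumes one point for a right answer and 1/(options - 1) off for a wrong one.
    """
    remaining = options - eliminated
    p_right = Fraction(1, remaining)
    penalty = Fraction(1, options - 1)
    return p_right - (1 - p_right) * penalty

for k in range(4):
    print(f"eliminate {k}: expected gain {guess_gain(k)}")
```

This gives 0, 1/16, 1/6 and 3/8 of a point for zero through three eliminated options, matching the figures quoted.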

You could go even further. You don't actually have to eliminate a wrong answer to make guessing a good strategy. If you have any information about the relative likelihood of the options, guessing will have positive expected value.
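To spell out why, here is a short derivation under the old five-option rules. The symbol p is my notation for the probability you assign to your single best guess; it does not appear in the Wikipedia passage.

```latex
% Expected gain from marking the option you consider most likely,
% where p is the probability that it is correct:
\[
  E = p \cdot 1 - (1 - p) \cdot \tfrac{1}{4} = \frac{5p - 1}{4}.
\]
% The five probabilities sum to 1, so the largest is at least 1/5,
% and strictly greater unless all five are equal. Hence
\[
  p > \tfrac{1}{5} \quad\Longrightarrow\quad E > 0 .
\]
```

Any departure from a uniform 20% across the options is enough; eliminating an answer outright is just the special case where one probability drops to zero.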

The result is that, while time management for a test like the SAT can be complicated, the rule for guessing is embarrassingly simple: give your best guess for questions you read; don't waste time guessing on questions that you didn't have time to read.

The risk analysis actually becomes much more complicated when you take away the correction for guessing. On the ACT (or the new SAT), there is a positive expected value associated with blind guessing and that value is large enough to cause trouble. Under severe time constraints (a fairly common occurrence with these tests), the minute it would take you to attempt a problem, even if you get it right, would be better spent filling in bubbles for questions you haven't read.
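A rough sketch of that trade-off follows. The figure of ten bubbles per minute is purely an assumption for illustration (I have no data on how fast anyone fills in bubbles), and the helper names are hypothetical. The point is only that once you can bubble more than five answers in the time one problem takes, blind guessing wins on an uncorrected five-option test.

```python
from fractions import Fraction

def blind_guess_value(options=5, corrected=False):
    """Expected points from one blind guess, with or without the correction."""
    p_right = Fraction(1, options)
    penalty = Fraction(1, options - 1) if corrected else Fraction(0)
    return p_right - (1 - p_right) * penalty

def better_to_bubble(bubbles_per_minute, p_solve=1, options=5, corrected=False):
    """True if a minute of blind bubbling beats a minute spent on one problem.

    bubbles_per_minute: assumed bubbling speed (hypothetical input).
    p_solve: chance of getting the attempted problem right (1 = certain).
    """
    bubbling = bubbles_per_minute * blind_guess_value(options, corrected)
    solving = Fraction(p_solve)  # expected points from the one attempted problem
    return bubbling > solving

ASSUMED_BUBBLES_PER_MINUTE = 10  # illustrative assumption, not a measured value

print(better_to_bubble(ASSUMED_BUBBLES_PER_MINUTE, corrected=False))  # no correction
print(better_to_bubble(ASSUMED_BUBBLES_PER_MINUTE, corrected=True))   # old SAT rule
```

Under these assumptions the first call returns True and the second False: with the correction, blind bubbling is worth nothing on average, so there is never a reason to abandon a problem you can solve.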

Putting aside what this does to the validity of the test, trying to decide when to start guessing is a real and needless distraction for test takers. In other words, just to put far too fine a point on it, the claim about the effects of the correction for guessing isn't just wrong; it is the opposite of right. The old system didn't require time-consuming risk analysis but the new one does.

As I said in the previous post, this represents a fairly small aspect of the changes in the SAT (loss of orthogonality being a much bigger concern). Furthermore, the SAT represents a fairly small and perhaps even relatively benign part of the story of David Coleman's education reform initiatives. Nonetheless, this one shouldn't be that difficult to get right, particularly for a publication with the reputation of the New York Times.

Of course, given that this is the second recent high-profile piece from the paper to take an anti-SAT slant, it's possible certain claims weren't vetted as well as others.
