Thursday, April 10, 2014

538 and Vox

Kaiser Fung made a comment in this thread:
Like Andrew, I also have been thinking about this, and I come out on the side of Nate. Individually, the critiques stand but taken together, they don't call for any coherent vision of how his critics would run an operation such as his. The level of rigor that Krugman and others demand requires years, perhaps decades, of research to write one piece; meanwhile, the other critique is that the content is not timely. Think about the full-time journalists he has hired - there isn't a way to pay them enough to do the kind of pieces that are being imagined. As we all know, data collection, cleaning and analysis take a huge amount of time. It may be months of work to get one article out. Further, I'd like to judge them relative to competitors rather than in some kind of abstract universe. Compared to the Freakonomics blog, for example, 538 has a much better orientation. Compare to Huffington Post - when did HP have any real data journalism? Compare to Buzzfeed, don't even want to talk about it.
Now, this is Joseph and not Mark.  My view was that you simply cannot judge a publication until it has had six months or so to let things settle.  I suspect a lot of the criticism was driven by the climate change article -- and it is interesting to see that this is where people's passions are the highest. 

Other columnists, like Emily Oster, are much more subtle cases.  I was very dismissive of Emily after her first foray into public health.  Her second has seen a lot of criticism as well, but what is different is that the current round is based on careful weighing of evidence and very subtle issues of interpretation (and this was only for a single, small piece of a much larger work).  She is getting a lot better. 

And that is part of why I am optimistic about Nate Silver.  He is doing something really hard and it remains to be seen whether the criticism will slowly improve matters. 

In a lot of ways, the other new information-based news source (Vox) has the exact opposite problem.  They spent a huge amount of time trying to make some really good pieces (like this one) and grab some of the people I used to read elsewhere (even obscure ones like this).  But it will be interesting to see if they can keep up the kick-off level of quality over time. 

So I guess the really good news is that we are spoiled for choice with new, information rich, media start-ups.  It's hard to see how this is a bad thing. 

Wednesday, April 9, 2014

The Hedgehog who thought he was a fox -- a cautionary tale

The growing chorus of Nate Silver fans critical of (or at least perplexed by) the new Five Thirty Eight has attracted a great deal of media coverage, mainly for the wrong reasons. Conservatives have painted it as a case of liberals turning on one of their own. Pundits have tried to use the recent critiques of Silver to undercut his earlier, completely unrelated critiques of them (I'm debating whether or not to write a post on Dylan Byers' laughable misreading of Krugman's position. On one hand, it's bad enough to support a post. On the other hand, I'm busy, Charles Pierce already did a good job with it, and I'm pretty sure that most people already know what Byers is).

There has been some good work on the subject. Jonathan Chait does a sharp analysis of Krugman's and Silver's personalities and how they shaped the conflict (best line: "Somewhere, David Brooks is reading Silver’s argument that Paul Krugman refuses to attack his colleagues and laughing bitterly."), but other (for me) more interesting issues have gotten less coverage than they merit, things like the culture of statistics, the often depressing career paths promising thinkers take these days* and the dangers of a bad analogy.

It sometimes seems that there's a convention that once a debate has been framed, that framework must be respected, no matter how badly it holds up. Case in point, the fox and the hedgehog. Here's how Silver puts it:
Our logo depicts a fox (we call him Fox No. 92) as an allusion to a phrase originally attributed to the Greek poet Archilochus: “The fox knows many things, but the hedgehog knows one big thing.” We take a pluralistic approach and we hope to contribute to your understanding of the news in a variety of ways.
This is a doubly flawed analogy. Expertise is a spiky, complicated thing that doesn't lend itself to scalar measures, let alone binary ones. Any attempt to assign people positions on the fox/hedgehog spectrum will be problematic at best, with orderings shifting radically when weighting schemes change. If we do, however, decide to view the world through this framework, we immediately come to an even bigger objection to Silver's argument:

Nate Silver is a hedgehog.

There is nothing pejorative about this classification. Silver has done brilliant work. It's just that almost all of Silver's best work has been done using a small but powerful set of analytic tools to address thorny but structurally similar problems in sports and politics. In terms of methods, Silver is a specialist; in terms of data, he's a micro-specialist. Silver has an enormous body of knowledge about working with player stats or polling data, but most of that knowledge is completely field specific.

There's nothing wrong with this kind of specialization -- it's absolutely necessary for the kind of results Silver produced -- but it can cause problems when researchers move out of their areas of expertise and fail to adjust for the change. In other words, the trouble starts when hedgehogs think they're foxes.

Being a fox means living with the constant fear that you've just done something stupid that will be immediately obvious to anyone knowledgeable in the field. Ideally that fear leads to a heightened feel for danger levels. Most experienced foxes have developed an instinct for when to seek out a hedgehog. As a corollary, a good fox is always (and I do mean ALWAYS)  more willing to ask a stupid question than to make a stupid statement.

For a case study of what can go wrong when experts leave their area of expertise and don't adjust their caution levels, you don't have to look any farther than Silver's attempt to cover the climate change debate. Michael E. Mann assesses the damage:
And so I was rather crestfallen earlier this summer when I finally got a peek at a review copy of The Signal and the Noise: Why So Many Predictions Fail -- but Some Don't. It's not that Nate revealed himself to be a climate change denier; He accepts that human-caused climate change is real, and that it represents a challenge and potential threat. But he falls victim to a fallacy that has become all too common among those who view the issue through the prism of economics rather than science. Nate conflates problems of prediction in the realm of human behavior -- where there are no fundamental governing 'laws' and any "predictions" are potentially laden with subjective and untestable assumptions -- with problems such as climate change, which are governed by laws of physics, like the greenhouse effect, that are true whether or not you choose to believe them.
...
Unlike Levitt, Nate did talk to the scientists (I know. I'm one of them!). But he didn't listen quite as carefully as he should have. When it came to areas like climate change well outside his own expertise, he to some extent fell into the same "one trick pony" trap that was the downfall of Levitt (and arguably others like Malcolm Gladwell in The Tipping Point). That is, he repeatedly invokes the alluring, but fundamentally unsound, principle that simple ideas about forecasting and prediction from one field, like economics, can readily be appropriated and applied to completely different fields, without a solid grounding in the principles, assumptions, and methods of those fields. It just doesn't work that way (though Nate, to his credit, does at least allude to that in his discussion of Armstrong's evaluation of climate forecasts).
I'm singling out Silver here not because he's a bad statistician but because he's a very good one who fell into the increasingly common traps of believing that the world outside his specialty is simpler and that, if you understand the math, you automatically understand the problem. Each field is complex and, like Tolstoy's families, complex in its own way. If you want to have something useful to say in an unfamiliar area of research, knowing the statistics may be necessary but it is far from sufficient.

* On a related note you can find my thoughts on Five Thirty Eight's business model here.

Tuesday, April 8, 2014

The moment it became obvious that Nate Silver wasn't really listening to his critics

Paul Krugman closes a post criticizing the new 538 with the following:
What would be really bad is if this turns into a Freakonomics-type exercise, all contrarianism without any appreciation for the importance of actual expertise. And Michael Mann reminds me that Nate’s book already had some disturbing tendencies in that direction.

In response...
Silver, for his part, said he doesn't shun the negative reviews that FiveThirtyEight has drawn in its infancy, telling TPM that much of the criticism will help the site improve. He just doesn't think Krugman's assessment has been on the mark.

"His comment about experts was particularly strange given that (i) we publish lots of articles by experts, e.g. academic economists like Emily Oster and political scientists like Dan Hopkins and that (ii) Krugman has himself been very critical of Very Serious People and experts in economics and other fields," Silver wrote in the email.
For the record, if someone accuses you of having analysts bungee jump into areas they know nothing about then crank out a bunch of contrarian findings, the fact that you just hired Emily Oster should not feature prominently in your defense.

Monday, April 7, 2014

Imagine an all-curling channel...

Picture yourself as a network executive in charge of product development. A producer approaches you with a proposal for a new US cable channel based on the sport of curling. The producer supports his presentation with various graphs showing that:

Current awareness of the sport is high given its size and has trended steadily up in the period measured;

The sport is currently receiving considerable free publicity, particularly in the references and clips on late night talk shows and other sought-after spots;

Those most likely to be aware of the sport tend to be young with attractive demographics;

Sports programming is more resilient to competition from the internet;

The programming is incredibly cheap. Many of the leading figures in the sport have literally offered to work for beer.

As an executive, you might be suspicious of these facts (which, after all, I did just make up), but I'll bet you have another, much stronger objection, namely that Americans are aware of curling for about five weeks every four years. You can't base a network on these kinds of few-and-far-between spikes in viewership. In order to make this concept workable, it will need some attention-grabbing non-seasonal programming, perhaps Extreme Curling or Celebrity Curling.

This takes us to the other major quadrennial media event, the presidential elections, and to Nate Silver. If you're talking about horse-race political analysis, there is no bigger star than Silver and no one who deserves his or her fame more. If you make a list of people who really understand the science of polls and elections and another list of journalists with great media savvy and extremely high profiles, you'll get a lot of names on both lists, but if you look at the intersection, you're basically down to one name.

All of this made Silver a big journalistic star every election season. When he was part of the NYT, this worked out great. Every four years he brought in a huge amount of traffic (and presumably digital subscriptions) while the rest of the time he gave the paper analytic credibility. It was a win for Silver, a win for the paper and a win for the readers.

That trickle... trickle... trickle... FLOOD model won't work for the new 538. Despite the relationship with ESPN and ABC, Silver is now pursuing more of a freestanding model like Freakonomics or even the Huffington Post. Like our hypothetical curling channel, he also needs attention-grabbing non-seasonal programming, in this case, counter-intuitive stories by controversial writers who are good at bringing in traffic in part because people like to pick away at their errors.

There are at least a couple of problems with this approach: first, this is a horribly crowded field and the chances of success are not high; second, there's a significant reputational risk in being associated with these controversial writers and, given the extraordinary reputation Silver has worked so hard to build up, this is a risk he may come to regret taking.



Friday, April 4, 2014

"N.R.G. Pick-Ups are PURE DEXTROSE" -- junk food health claims through the years

I previously posted some health advice from a 1950 comic book which Joseph pointed out was actually pretty good. By comparison, the nutritional information that appeared in this issue of the very early (1937) comic Star Ranger is considerably more questionable. Part of that might be attributable to its being more than a decade older, but I suspect the more significant difference is that the later comic was presenting the advice as something of a PSA, while I suspect the Curtiss Candy Co might have had another agenda.


Thursday, April 3, 2014

Degrees of separation – class and capital

This is still in rough form (though I've been kicking it around for a while), but I thought it might be interesting to think about inequality/social mobility/access to capital in terms of networks, specifically degrees of separation and Milgram's small world experiment.

New media has expanded our social networks, but it has also created the illusion of even larger ones (many Facebook friends and LinkedIn connections would be considered complete strangers by any reasonable standard). In order to keep our networks roughly analogous to Milgram's, a connection is defined as someone who knows you by name and with whom you have had multiple one-on-one private exchanges, either face-to-face or through some other medium.

You have zero degrees of separation from yourself. (This point will be important when discussing capital. In other words, you have no separation from your own money.) You have one degree of separation from someone with whom you have had repeated one-on-one contact. I'd also suggest excluding employer/employee connections, at least when talking about class and capital. These relationships tend to be highly constrained and should, at the very least, be analyzed separately.

With this groundwork laid, I'd like to propose the following, at least as a thought experiment. The original Milgram study looked at the degrees of separation between people who lived in Omaha and people who lived in Boston. What if, instead of geographic distance (which arguably means less than it once did), we looked at economic distance (which arguably now means more)?

As before, randomly selected subjects would be asked to connect with strangers and the path length would be measured. Unlike the Milgram study, though, the corresponding pairs of subjects would live in the same geographic area. In this experiment, subjects would be assigned targets so that some would be trying to contact subjects in their own income bracket, some in brackets lower than theirs, and some in brackets higher.

Obviously, I don't know if the data will back me up on any of this but here are a few speculations and possible implications:

Though we can't go back in time to gather the data to confirm this, there is both statistical and anecdotal evidence that the correlation between economic distance and degrees of separation is getting stronger;

There seems to be a strong inverse correlation between degrees of separation from capital and the probability of getting a business funded. This relationship appears to be particularly strong for really bad business plans. I've noticed that when I do a little research into one of those what-were-they-thinking ideas, I always find at least one founder with a low degree of separation from someone with a large amount of capital;

One implication of the above would be that ventures (even bad ones) from people who attended Ivy League schools are far more likely to find funding. I realize this will strike most as a blinding flash of the obvious, but hopefully bringing graph theory tools in will uncover something interesting;

Increasing degrees of separation might also help explain the apparent rise of let-them-eat-cake journalism. We previously discussed a number of major stories such as the SAT and over-the-air television where the standard narrative is written, not just from an upper class perspective, but seemingly under the impression that no other perspective exists. Perhaps journalists who write for major publications are less likely to know people in other economic classes.

A big caveat here: path length is a useful but very limited metric for describing graphs. I think it would be useful to look at degrees of separation, but I suspect the main thing it would accomplish would be to raise more questions.
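For readers who haven't worked with graphs, "degrees of separation" between two people is just the shortest path length in the acquaintance network, which a simple breadth-first search can compute. Here's a minimal sketch over a toy network (the names and connections are entirely made up for illustration):

```python
from collections import deque

def degrees_of_separation(graph, source, target):
    """Shortest path length (BFS) between two people in an undirected network.

    graph: dict mapping each person to the set of their connections.
    Returns None if no chain of acquaintances links the two people.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None

# Toy network with hypothetical names:
network = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cal"},
    "Cal": {"Bob", "Dee"},
    "Dee": {"Cal"},
    "Eve": set(),
}
print(degrees_of_separation(network, "Ann", "Dee"))  # 3
print(degrees_of_separation(network, "Ann", "Eve"))  # None -- no connecting chain
```

The proposed experiment would, in effect, estimate how this number varies with the economic distance between source and target.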

Wednesday, April 2, 2014

Causal inference is hard

From Slate we have this interesting debate about what ended China's famines:

Scholars continue to argue over how much of China’s agricultural turnaround was due to the capitalist incentive structure, how much resulted from earlier investments, and how much was a trick of the weather. Some say the end of collective farming accounted for nearly three-quarters of the improvements in productivity, while others say it was responsible for no more than one-third.

It’s fine to treat China’s food revolution as a fairy tale. The changes were so dramatic that it’s hard not to. But let’s make sure we get the moral of this story correct. Changing the incentives isn’t a magic trick that can turn any lagging economy into a global juggernaut. Investment in infrastructure, research and development, and putting money into the pockets of workers work wonders as well. And a little sunshine doesn’t hurt, either.
So we basically have five possible explanations, all of which could explain some or all of this change:
 
  1. Ending collective farming (capitalist reform)
  2. Infrastructure development
  3. Government subsidies to farmers (i.e. financial support to poor people)
  4. Research on improved crops
  5. Unexpected good weather

What makes this tough is that many of these explanations suggest different policy conclusions when you try to apply these lessons to other contexts.  For example, if the dominant cause was improved infrastructure then maybe we should tax more in order to invest in infrastructure projects.  If it was giving more money to poor people then maybe the minimum wage is where we should put our focus.  If it was the weather (luck) then maybe these results can't be generalized. 

Since complex phenomena, like an improved food supply, tend to have many causes, it can be hard to decide which ones to focus on.  After all, some of these factors could have been counter-productive even while the net causal effect was positive. 

But it seems pretty obvious why experiments are not sensible here.  These sorts of questions are, and I think always will be, very hard to answer. 

There are valid reasons to be concerned about the SAT, starting with its history

Given the dust and confusion being kicked up by the SAT, there's a point that I want to get on the record. Though the standard critiques of the SAT, most notably from the New York Times, are flawed in almost every particular (except about the essay section, which pretty much everyone now agrees was a train wreck), there are valid critiques of the test, both in its current state and in where it came from.

From Wikipedia:
After the war in 1920, Brigham joined Princeton as a faculty member, and he collaborated with Robert Yerkes from the Army Mental Tests and published their results in the influential 1923 book, A Study of American Intelligence, authored by Brigham with the foreword by Yerkes. Analyzing the data from the Army tests, Brigham came to the conclusion that native born Americans had the highest intelligence out of the groups tested. He proclaimed the intellectual superiority of the "Nordic Race" and the inferiority of the "Alpine" (Eastern European) and "Mediterranean Races" and argued that immigration should be carefully controlled to safeguard the "American Intelligence." Nothing troubled Brigham so much however, as miscegenation between blacks and whites, as Brigham believed "Negroes" were by far the most intellectually inferior race.
Though he later in 1930 denounced his expressed views on the intellectual superiority of the "Nordic Race" and specifically disowned the book, it had already been instrumental in fueling anti-immigrant sentiment in America and the eugenics debate. It was used most effectively by Harry Laughlin in the 1924 congressional debates leading to anti-immigrant legislation. 
Brigham chaired the College Board commission from 1923 to 1926, leading to the creation of the Scholastic Aptitude Test, now simply called the SAT Reasoning Test.
One of the hidden costs of bad arguments dominating one side of a debate is that they tend to crowd out the valid arguments on that side. I haven't seen convincing evidence that the SAT is being over-emphasized in the college selection process or that, other than the essay section, the test was urgently in need of radical changes, but there are serious and precedented concerns about the way the test can be misused.



Tuesday, April 1, 2014

Being a management consultant who does not suffer fools is like being an EMT who faints at the sight of blood

An April 1st post on foolishness.
When [David] Coleman attended Stuyvesant High in Manhattan, he was a member of the championship debate team, and the urge to overpower with evidence — and his unwillingness to suffer fools — is right there on the surface when you talk with him.

Todd Balf writing in the New York Times Magazine

Andrew Gelman has already commented on the way Balf builds his narrative around Coleman ("In Balf’s article, College Board president David Coleman is the hero and so everything about him has to be good and everything he’s changed has to have been bad.") and the not-suffering-fools quote certainly illustrates Gelman's point, but it also illustrates a more important concern: the disconnect between the culture of the education reform movement and the way it's perceived in most of the media.

(Though not directly relevant to the main point of this post, it is worth noting that the implied example that follows the line about not suffering fools is a description of Coleman rudely dismissing those who disagree with his rather controversial belief that improvement in writing skills acquired through composing essays doesn't transfer to improvements in writing in a professional context.)

There are other powerful players (particularly when it comes to funding), but when it comes to its intellectual framework, the education reform movement is very much a product of the world of management consultants with its reliance on Taylorism, MBA thinking and CEO worship. This is never more true than with David Coleman. Coleman is arguably the most powerful figure in American education despite having no significant background in either teaching or statistics. His only relevant experience is as a consultant for McKinsey & Company.

Companies like McKinsey spend a great deal of their time trying to convince C-level executives to gamble on trendy and expensive "business solutions" that are usually unsupported by solid evidence and are often the butt of running jokes in recent Dilbert cartoons.  While it may be going too far to call fools the target market of these pitches, they certainly constitute an incredibly valuable segment.

Fools tend to be easily impressed by invocations of data (even in the form of meaningless phrases like 'data-driven'), they are less likely to ask hard questions (nothing takes the air out of a proposal faster than having to explain the subtle difference between your current proposal and the advice you gave SwissAir or AOL Time Warner), and fools are always open to the idea of a simple solution to all their problems which everyone else in the industry had somehow missed. Not suffering fools gladly would have made for a very short career for Coleman at McKinsey.

Monday, March 31, 2014

Perhaps we should add "opaque" to the list of journalists' vocabulary questions

Last week, Andrew Gelman criticized Todd Balf for picking words and phrases for their emotional connotation rather than for their actual meaning in his New York Times Magazine article on the changes in the SAT. 'Jeffersonian' was the specific term that Gelman choked on. I'd add 'opaque' to the list though the blame here mainly goes to David Coleman, president of the College Board and quite possibly the most powerful figure in the education reform movement:
For the College Board to be a great institution, [Coleman] thought at the time, it had to own up to its vulnerabilities. ... “It is a problem that it’s opaque to students what’s on the exam."
There's a double irony here. First because Coleman has been a long-standing champion of some very opaque processes, notably including those involving standardized tests, and second because test makers who routinely publish their old tests and who try to keep those tests as consistent as possible from year to year are, by definition, being transparent.

This leads to yet another irony: though the contents of the tests are readily available, almost none of the countless articles on the SAT specifically mention anything on the test. The one exception I can think of is the recent piece by Jennifer Finney Boylan, and it's worth noting that the specific topic she mentioned isn't actually on the test.

Being just a lowly blogger, I am allowed a little leeway with journalistic standards, so I'm going to break with tradition and talk about what's actually on the math section of the SAT.

Before we get to the questions, I want to make a quick point about geometry on the SAT. I've heard people argue that high school geometry is a prerequisite for the SAT. I don't buy that. Taking the course certainly doesn't hurt, but the kind of questions you'll see on the exam are based on very basic geometry concepts which students should have encountered before they got to high school. With one or two extremely intuitive exceptions, all the formulas you need for the test are given in a small box at the top of the first page.

As you are going through these questions, keep in mind that you don't have to score all that high. 75% is a good score. 90% is a great one.


You'll hear a lot about trick questions on the SAT. Most of this comes from the test's deliberate avoidance of straightforward algorithm questions. Algorithm mastery is always merely an intermediary step -- we care about it only because it's often a necessary step in problem solving (and as George Pólya observed, if you understand the problem you can always find someone to do the math) -- but when students are used to being told to factor this and simplify that, being instead asked to solve a problem, even when the algorithms involved are very simple, can seem tricky and even unfair.

There are some other aspects of the test that contribute to the reputation for trickiness:

Questions are written to be read in their entirety. One common form breaks the question into two parts where the first part uses a variable in an equation and the second asks the value of a term based on that variable. It's a simple change but it does a good job distinguishing those who understand the problem from those who are merely doing Pavlovian mathematics where the stimulus is a word or symbol and the response is the corresponding algorithm;


Word problems are also extensively used. Sometimes the two-part form mentioned above is stated as a word problem;


One technique that very probably would strike most people as 'tricky' actually serves to increase the fairness of the test: the use of newly-minted notation. In the example below, use of standard function notation would give an unfair advantage to students who had taken more advanced math courses.
One thing that jumps out at us math types is how simple the algebraic concepts used are. The only polynomial factoring you are ever likely to see on the SAT is the difference of two squares.
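A hypothetical item in that vein (my own invention, not an actual SAT question) shows how the two-part form and the factoring interact: given that x + y = 7 and x − y = 3, what is the value of x² − y²? Recognizing the factorization means you never have to solve for x and y:

```latex
\[
  x^2 - y^2 = (x + y)(x - y) = 7 \cdot 3 = 21
\]
```

Students trained to respond to "factor" and "simplify" cues often miss this; students who understand the problem find it trivial.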


A basic understanding of the properties of real numbers is required to answer many of the problems.



A good grasp of exponents will also be required for a perfect score.





There will be a few problems in basic statistics and probability:

I've thrown in a few more to make it a more representative sample.

We can and should have lots of discussions about the particulars here -- I'm definitely planning a post on Pavlovian mathematics (simple stimulus/algorithmic response) -- but for now I just want to squeeze in one quick point:

Whatever the SAT's faults may be, opaqueness is not among them. Unlike most of the instruments used in our metric-crazed education system, both this test and the process that generates it are highly transparent. That's a standard that we ought to start extending to other tests as well.

Saturday, March 29, 2014

Weekend blogging -- due to cuts in arts programs, school orchestras have been forced to adopt extreme cost-cutting measures

When you get past the novelty, the musicianship is even more impressive.



The novelty is, of course, what initially drives the clicks but it ages quickly when played badly. (From Spike Jones to the Austin Lounge Lizards, successful comic music acts tend to require solid musicians.)

I'd go further in this case. For me, they move entirely past the joke. Their AC/DC covers are so driving and percussive you soon stop wondering why these cellos are playing heavy metal and start wondering why don't all heavy metal acts use cellos.




A musician friend, who had only seen "Every Teardrop," asked if they always played just one cello. I told him no, someone would lose a finger.


These guys aren't the first classical musicians to display a talent for popular music. Yo-Yo Ma comes to mind and, believe it or not, Liberace started out as both a very promising concert pianist and a first rate Boogie-woogie piano player, but I can't recall any performers who moved so smoothly back and forth.


Friday, March 28, 2014

Fiscal prudence (a never ending saga)

This is an important point about financial planning from Megan McArdle:
When people end up in financial trouble, you often hear tsk-tsking about premium cable and fancy vacations. But if you talk to bankruptcy lawyers and financial counselors, that isn't the normal story you hear. You're more likely to hear about car loans, mortgages, alimony. In other words, it's not the luxury splurges that do you in -- it's the fixed expenses. That's because discretionary luxury expenses can be cut in an emergency, while the fixed payments go on and on until they empty your bank account.
She might be a libertarian in her politics, but she is a lot like a Canadian in terms of fiscal prudence.  Now I agree that there may be larger social issues that are making it harder for people to meet fixed expenses (e.g. wage stagnation) but at an individual level this is a calculation well worth making. 

Thursday, March 27, 2014

On SAT changes, The New York Times gets the effect right but the direction wrong

That was quick.

Almost immediately after posting this piece on the elimination of the SAT's correction for guessing (The SAT and the penalty for NOT guessing), I came across this from Todd Balf in the New York Times Magazine.
Students were docked one-quarter point for every multiple-choice question they got wrong, requiring a time-consuming risk analysis to determine which questions to answer and which to leave blank. 
I went through this in some detail in the previous post but for a second opinion (and a more concise one), here's Wikipedia:
The questions are weighted equally. For each correct answer, one raw point is added. For each incorrect answer one-fourth of a point is deducted. No points are deducted for incorrect math grid-in questions. This ensures that a student's mathematically expected gain from guessing is zero. The final score is derived from the raw score; the precise conversion chart varies between test administrations.

The SAT therefore recommends only making educated guesses, that is, when the test taker can eliminate at least one answer he or she thinks is wrong. Without eliminating any answers one's probability of answering correctly is 20%. Eliminating one wrong answer increases this probability to 25% (and the expected gain to 1/16 of a point); two, a 33.3% probability (1/6 of a point); and three, a 50% probability (3/8 of a point). 
You could go even further. You don't actually have to eliminate a wrong answer to make guessing a good strategy. If you have any information about the relative likelihood of the options, guessing will have positive expected value.
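To check the arithmetic in the quoted passage, here's a small sketch (the function is my own, parameterized by how many wrong answers you can eliminate) of the expected raw-score gain from guessing under the old scoring rule:

```python
# Expected raw-score gain from guessing on a question scored +1 for a
# right answer and -1/(options - 1) for a wrong one (the old SAT rule).
from fractions import Fraction

def expected_gain(options=5, eliminated=0):
    remaining = options - eliminated           # answers still in play
    p_correct = Fraction(1, remaining)         # uniform guess over the rest
    penalty = Fraction(1, options - 1)         # the 1/4-point deduction
    return p_correct * 1 - (1 - p_correct) * penalty

for k in range(4):
    print(k, expected_gain(eliminated=k))      # 0, 1/16, 1/6, 3/8
```

The output matches Wikipedia's figures exactly: zero expected gain for a blind guess, rising to 1/16, 1/6, and 3/8 of a point as you eliminate one, two, and three wrong answers.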

The result is that, while time management for a test like the SAT can be complicated, the rule for guessing is embarrassingly simple: give your best guess for questions you read; don't waste time guessing on questions that you didn't have time to read.

The risk analysis actually becomes much more complicated when you take away the penalty for guessing. On the ACT (or the new SAT), there is a positive expected value associated with blind guessing and that value is large enough to cause trouble. Under severe time constraints (a fairly common occurrence with these tests), the minute it would take you to attempt a problem, even if you get it right, would be better spent filling in bubbles for questions you haven't read.
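A rough back-of-the-envelope illustration of that tradeoff (the timings and success rate below are hypothetical assumptions, just to show the shape of the problem): once guessing carries no penalty, blind bubbling earns its expected 1/5 point in a few seconds, so near the end of a timed section its points-per-minute can beat actually working a problem.

```python
# Points per minute under a no-penalty scoring rule (ACT / new SAT style).
# The timings and success rate are illustrative assumptions, not test data.
def points_per_minute(expected_points, seconds):
    return expected_points / (seconds / 60)

work_problem = points_per_minute(expected_points=0.9, seconds=60)  # solve it, usually correctly
blind_guess = points_per_minute(expected_points=0.2, seconds=5)    # bubble 'B': 1-in-5 chance

print(work_problem, blind_guess)  # blind guessing wins per unit of time
```

Under these (made-up but not unreasonable) numbers, blind bubbling yields well over twice the points per minute of honest work, which is exactly the kind of calculation the new rules force on a test taker running short on time.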

Putting aside what this does to the validity of the test, trying to decide when to start guessing is a real and needless distraction for test takers. In other words, just to put far too fine a point on it, the claim about the effects of the correction for guessing isn't just wrong; it is the opposite of right. The old system didn't require time-consuming risk analysis, but the new one does.

As I said in the previous post, this represents a fairly small aspect of the changes in the SAT (loss of orthogonality being a much bigger concern). Furthermore, the SAT represents a fairly small and perhaps even relatively benign part of the story of David Coleman's education reform initiatives. Nonetheless, this one shouldn't be that difficult to get right, particularly for a publication with the reputation of the New York Times.

Of course, given that this is the second recent high-profile piece from the paper to take an anti-SAT slant, it's possible certain claims weren't vetted as well as others.

Wednesday, March 26, 2014

The SAT and the penalty for NOT guessing

Last week we had a post on why David Coleman's announcement that the SAT would now feature more "real world" problems was bad news, probably leading to worse questions and almost certainly hurting the test's orthogonality with respect to GPA and other transcript-based variables. Now let's take a look at the elimination of the so-called penalty for guessing.

The SAT never had a penalty for guessing, not in the sense that guessing lowered your expected score. What the SAT did have was a correction for guessing. On a multiple-choice test without the correction (which is to say, pretty much all tests except the SAT), blindly guessing on the questions you didn't get a chance to look at will tend to raise your score. Let's say, for example, two students took a five-option test where they knew the answers to the first fifty questions and had no clue what the second fifty were asking (assume they were in Sanskrit). If Student 1 left the Sanskrit questions blank, he or she would get fifty points on the test. If Student 2 answered 'B' to all the Sanskrit questions, he or she would probably get around sixty points.

From an analytic standpoint, that's a big concern. We want to rank the students based on their knowledge of the material, but here we have two students with the same mastery of the material and a ten-point difference in scores. Worse yet, let's say we have a third student who knows a bit of Sanskrit and manages to answer five of those questions, leaving the rest blank, thus scoring fifty-five points. Student 3 knows the material better than Student 2, but Student 2 makes a higher score. That's pretty much the worst-case scenario for a test.

Now let's say that we subtracted a fraction of a point for each wrong answer -- 1/4 in this case, 1/(number of options - 1) in general -- but not for a blank. Now Student 1 and Student 2 both have fifty points while Student 3 still has fifty-five. The lark's on the wing, the snail's on the thorn, the statistician has rank-ordered the population and all's right with the world.
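The two scoring rules can be compared directly. Here's a quick sketch (the function and its name are my own illustration of the hypothetical above, not anything from an actual testing agency) computing each student's expected score on the hundred-question Sanskrit test, with and without the correction:

```python
# Expected scores for the three hypothetical students:
# 100 questions, 5 options; first 50 known, the rest unreadable (Sanskrit).
from fractions import Fraction

def expected_score(known, guessed, options=5, correction=True):
    p = Fraction(1, options)                       # chance a blind guess is right
    right = known + guessed * p                    # expected correct answers
    wrong = guessed * (1 - p)                      # expected wrong answers
    penalty = Fraction(1, options - 1) if correction else 0
    return right - wrong * penalty

# Without the correction, the blind guesser pulls ahead:
print(expected_score(50, 0, correction=False))     # Student 1: 50
print(expected_score(50, 50, correction=False))    # Student 2: 60
print(expected_score(55, 0, correction=False))     # Student 3: 55

# With the correction, guessing is expectation-neutral and the ranking is right:
print(expected_score(50, 50, correction=True))     # Student 2: back to 50
```

Running it reproduces the numbers in the text: 50/60/55 without the correction, and Student 2 back down to an expected 50 once wrong answers cost 1/4 point.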

[Note that these scales are set to balance out for blind guessing. Students making informed guesses ("I know it can't be 'E'") will still come out ahead of those leaving a question blank. This too is as it should be.]

You can't really say that Student 2 has been penalized for guessing since the outcome for guessing is, on average, the same as the outcome for not guessing. It would be more accurate to say that 1 and 3 were originally penalized for NOT guessing.

Compared to some of the other issues we've discussed regarding the SAT, this one is fairly small, but it does illustrate a couple of important points about the test. First, the SAT is a carefully designed test and second, some of the recent changes aren't nearly so well thought out.

Why I am optimistic about 538

As people may or may not know, Nate Silver has launched an independent website.  Some of the people whom I respect the most on the internet (Noah Smith, Paul Krugman, Andrew Gelman) have pointed out some of the teething problems, where the inclusion of either more data or more model information in the article would have been helpful. 

In essence, I think that the website is trying to balance a number of things at the same time:
  1. Use of predictive statistical models
  2. Accessible journalism
  3. Thought provoking/contrarian views
  4. A diverse body of topics
All of these elements can be important, but there can be a steep learning curve as to where the value is for the news consumer.  For example, Andrew Gelman points out about the sports column (as best I understand it -- I know nothing about sports and I am going entirely on the model comments) that he is having trouble figuring out the underlying model, which makes interpretation more complicated.   In the comments, there was a request for the correlation matrix, which is perfectly reasonable in the statistics field but might not appeal to the median reader.

So why am I optimistic?  Because Nate Silver has tended to be very data driven in his endeavours.  I have a strong prior expectation that the initial offerings are, at least partially, a test to see where he can add value relative to other media services (both on and off of the web).  Under these conditions, a good empirical tester would deliberately try out approaches and opinions that will likely fail, because that is the only way to get actual data on what works and to find unexploited niches. 

If people consistently ask for more statistics, and articles with well-described models (or links to well-described models) do well, then I bet we will see a lot more of them.  Or at least I hope so. 

So I am going to wait for about 90 days and then see what the site looks like.  I could be wrong about this approach -- but I am willing to put my opinion out there and see if the data support it going forward.