West Coast Stat Views (on Observational Epidemiology and more)

Tuesday, January 22, 2013

More reflections on a study (EDITED)

Okay, before I get to the meat of this post, this quote by Andrew Gelman is dynamite:

. . . I’m generally suspicious of arguments in which the rebound is bigger than the main effect.

How many "counter-intuitive" studies would survive this kind of skepticism? Not that a rebound effect can't be larger, but like many unlikely things it requires a higher level of proof.

The context is an education study which suggests that the more parents pay for education the lower the grades of the student will be. The authors apparently tried to control for a lot of possible confounders (like SAT scores) but the whole process ends up looking like "what not to do in regression analysis".

There is an intermediate variable problem, a restriction of range problem (extrapolating parental support out to values that exceed annual income), and an issue with differential drop-out that does not seem to be addressed. All of these points are present in Andrew's nice write-up.

What I want to focus on is the sharp counter-factual. I am not always a fan of counter-factual reasoning, but I think that it would provide a ton of clarity in this case. The real claim is that if you decreased exposure X (parental support) then you would increase outcome Y (GPA). The direct causal model would suggest that the fastest way to improve student grades would be to make your contributions zero. But, the last time I looked, Pell grants require a non-zero parental contribution in most cases (it is a little hard to tell precisely what the thresholds are but they definitely are not zero for most students). So clearly this is a floor on parental contributions (and if it was the only source of contributions the effect would become the richer the parents the worse the grades of the student).

So maybe, to have a realistic counter-factual, the exposure should be dollars of support above the minimum expected contribution?

So, really, what we have to be talking about is the effect of a marginal dollar, separate from the (non-linear) scale of what the parents are required to pay. But, even there, the direction is unclear. Imagine that your not especially inspired child gets admission to Stanford but is struggling with the material. Do you insist, on principle, that they get a job, or do you pay more so that they have a better chance to be a "C average" Stanford graduate (which is much better than a Stanford drop-out)? So the causal direction is actually unclear.

But if the idea is that giving more resources to students decreases performance then there are a lot of experiments we could try. For example, we could decrease wages (for everyone including upper management) and see if performance goes up. Or we could randomize students to improved levels of support. Better yet, we could look at experiments that have already been done:

We examine the impacts of a private need-based college financial aid program distributing grants at random among first-year Pell Grant recipients at thirteen public Wisconsin universities. The Wisconsin Scholars Grant of $3,500 per year required full-time attendance. Estimates based on four cohorts of students suggest that offering the grant increased completion of a full-time credit load and rates of re-enrollment for a second year of college. An increase of $1,000 in total financial aid received during a student’s first year of college was associated with a 2.8 to 4.1 percentage point increase in rates of enrollment for the second year.

So not only is the main effect in the opposite direction (at least in terms of retention) but it has precisely the impact on a GPA analysis that Andrew expects: students are more likely to leave with lower levels of support. Do we think that leaving school is completely independent of performance (that there is no GPA difference between the drop-outs and those who persist)? Or is parental support different, in some magic way, from government grant support? Are people more careful stewards of government money than they are of money from their close community (and think about what this would mean for charity versus government welfare programs, if true)?
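To make the differential drop-out worry concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the `support` index, the GPA distribution, and the retention rule are invented, and none of it comes from the study's data. GPA is drawn independently of support, so the true effect of support on grades is zero by construction; support only helps a student stay enrolled.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, seed=0):
    """Return (correlation among all students, correlation among stayers,
    fraction retained) under a made-up retention rule."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        support = rng.random()       # hypothetical support index, 0 to 1
        gpa = rng.gauss(3.0, 0.5)    # drawn independently of support
        luck = rng.gauss(0.0, 0.3)   # everything else that affects staying
        # Assumed rule: weak grades AND weak support together push a student out.
        stays = gpa + support + luck > 3.0
        students.append((support, gpa, stays))

    everyone = pearson([s for s, _, _ in students], [g for _, g, _ in students])
    kept = [(s, g) for s, g, stay in students if stay]
    survivors = pearson([s for s, _ in kept], [g for _, g in kept])
    return everyone, survivors, len(kept) / n


if __name__ == "__main__":
    everyone, survivors, retained = simulate()
    print(f"correlation, all enrolled students: {everyone:+.3f}")
    print(f"correlation, students who persist:  {survivors:+.3f}")
    print(f"fraction retained:                  {retained:.2f}")
```

In this toy setup the correlation among all students sits near zero, while the correlation among those who persist turns negative: poorly supported students only remain in the sample if their grades are good. That is speculation about the direction of one possible bias, not a claim about what happened in the actual study.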

I agree that the current form of this study is impossible to interpret.

[EDIT: Talking with Mark, it is clear that I was unclear on one point above. The experiments show that money from a specific source (i.e. government funding) goes in a specific direction but don't at all address whether money from parents has a similar causal effect (Mark is promising to talk about this in a post himself). The issues of selection, intermediate variables, and experimental evidence from other sources are all important, but without re-analyzing the data it is impossible to prove the directionality of the bias. As an epidemiologist I am trained to speculate on bias direction/strength but I recognize that is all I am doing. ]

One more cognitive dissonance post

Andrew Gelman has a good, skeptical response to some questionable claims from Herbalife, but there's one point where we're in disagreement, not so much as to the conclusion as to the reasoning behind it.

Gelman says:

Amusingly, one of Herbalife’s points is “Fact: Majority of Former Distributors Would Recommend Herbalife to Friends and Family.” But that’s exactly what you’d expect of a still-active pyramid scheme, no? Existing members want new people below them on the pyramid. I’m not saying this means it is a pyramid scheme, but it doesn’t seem like evidence against the hypothesis!

Perhaps I'm misreading this but I'd assume that former distributors no longer have a direct interest in the company. If this is true, does this mean we can take former distributors as impartial judges? Not by a long shot.

This is where we segue back to a recent thread, cognitive dissonance and the psychology of marketing. Companies like Amway and Herbalife are textbook examples of marketing psych (literally for Amway, in two different chapters, no less). And it's important to remember that this relationship goes both ways. While the psychologists writing these texts study these companies, many of the executives in these companies have read these books and taken these classes and they've thought seriously about how best to apply these principles.

[My background in this field is spotty so I would highly recommend that you pick up a copy of Cialdini's Influence (either regular or textbook version) or some other good text on the subject and make sure I'm getting this right... ]

There have been a number of studies that show that when you convince people to believe something based on one reason, they have a tendency to come up with additional reasons to support that belief and that these reasons do not go away just because the original reason is removed. This effect is even stronger when the belief is stated publicly, particularly to friends and family or in writing (Amway training makes a big deal about getting things in writing).

The former Herbalife distributors are another one of those cases where what happened was exactly what the textbooks said would happen: people who had sold a line of products to friends and family in the past now tend to hold the reassuring belief that those products were good. This doesn't prove that they weren't good and it certainly doesn't say anything one way or the other about Herbalife being a pyramid; it simply serves as another reminder that things often happen the way your professor said they would.

Sunday, January 20, 2013

Alexandria Word Searches -- a new kind of puzzle for the weekend

I've got a post on a different kind of word search puzzle at You Do the Math. Mainly pitched toward teachers, but hopefully still fun for the general audience. Here's a Shakespeare-themed example with, at last count, eighteen Bard-friendly answers.

There are more puzzles (in larger formats) at the original post.

Saturday, January 19, 2013

"Edward Tufte Wants You to See Better"

I was looking for an interview for a post on the way we cover health issues, when I came across this interview with Edward Tufte. I haven't had a chance to check it out yet so this isn't really a recommendation but, given the depth of my to do list, I decided it would be better to pass it along now.

Friday, January 18, 2013

Just another (incredibly bizarre) data point

I dislike posts and op-eds that use the latest big news story as an excuse to mount a favorite hobby horse, but I can't pass this one up.

One of the recurring themes here has been the decline in journalistic standards, particularly regarding accuracy and fact-checking. This brings us to the incredibly strange case of Manti Te'o and the imaginary dead girlfriend. It was one of the year's most widely covered stories. Everybody from Sports Illustrated to CBS to the New York Times ("He has personified hope after more than a decade of mediocrity. He has lived the university’s core values at a place where that matters, said Athletic Director Jack Swarbrick.") had carried moving accounts of Te'o's story, but it wasn't until a few days ago that the sports site Deadspin actually dug into the details.

Notre Dame's Manti Te'o, the stories said, played this season under a terrible burden. A Mormon linebacker who led his Catholic school's football program back to glory, Te'o was whipsawed between personal tragedies along the way. In the span of six hours in September, as Sports Illustrated told it, Te'o learned first of the death of his grandmother, Annette Santiago, and then of the death of his girlfriend, Lennay Kekua.

Kekua, 22 years old, had been in a serious car accident in California, and then had been diagnosed with leukemia. SI's Pete Thamel described how Te'o would phone her in her hospital room and stay on the line with her as he slept through the night. "Her relatives told him that at her lowest points, as she fought to emerge from a coma, her breathing rate would increase at the sound of his voice," Thamel wrote.

Upon receiving the news of the two deaths, Te'o went out and led the Fighting Irish to a 20-3 upset of Michigan State, racking up 12 tackles. It was heartbreaking and inspirational. Te'o would appear on ESPN's College GameDay to talk about the letters Kekua had written him during her illness. He would send a heartfelt letter to the parents of a sick child, discussing his experience with disease and grief. The South Bend Tribune wrote an article describing the young couple's fairytale meeting—she, a Stanford student; he, a Notre Dame star—after a football game outside Palo Alto.

Did you enjoy the uplifting story, the tale of a man who responded to adversity by becoming one of the top players of the game? If so, stop reading. Manti Te'o did lose his grandmother this past fall. Annette Santiago died on Sept. 11, 2012, at the age of 72, according to Social Security Administration records in Nexis. But there is no SSA record there of the death of Lennay Marie Kekua, that day or any other. Her passing, recounted so many times in the national media, produces no obituary or funeral announcement in Nexis, and no mention in the Stanford student newspaper.

Nor is there any report of a severe auto accident involving a Lennay Kekua. Background checks turn up nothing. The Stanford registrar's office has no record that a Lennay Kekua ever enrolled. There is no record of her birth in the news. Outside of a few Twitter and Instagram accounts, there's no online evidence that Lennay Kekua ever existed.

The photographs identified as Kekua—in online tributes and on TV news reports—are pictures from the social-media accounts of a 22-year-old California woman who is not named Lennay Kekua. She is not a Stanford graduate; she has not been in a severe car accident; and she does not have leukemia. And she has never met Manti Te'o.

It has since come out that some journalists had checked some facts and noticed something was wrong but not wrong enough to keep them from running this incredibly dramatic but completely untrue story.

For a reporter looking for a touching human interest story, this was too good to be true and many of the nation's biggest and (sadly) best news organizations ran it without bothering to determine if it was true. That's troubling.

But unfortunately no longer all that surprising.

Thursday, January 17, 2013

Ray Fisman has a new book you might want to check out

Though we've disagreed strongly in the past, Ray Fisman is a smart guy with some insightful things to say about a problem Joseph and I have been thinking a lot about: how interests go out of alignment and how we can engineer big, complex organizations to keep that from happening.

Fisman has a new book out on the subject: “The Org: The Underlying Logic of the Office.” Here's a quote from an enthusiastic review in the New York Times.

This suggests a good rule of thumb to determine when a private company will outperform the public sector: if the task is clear-cut and it’s possible to define concrete goals and reward those who meet them, the private sector will probably do better. “If I can write a perfect contract in which I pay for a concrete observable outcome, can rule out cream-skimming and can ensure the measure is not gamed, there is no reason that the private sector can’t do it better,” Professor Fisman said.

Safety Nets and Canada

Dean Dad has an interesting observation:

I was reminded of that a few days ago, in a discussion with a Canadian colleague. We have similar senses of humor, so we got to talking about The Kids In The Hall, SCTV, and national styles of humor. (For my money, “Brain Candy” is a neglected classic of dark, dark, dark comedy.) She offered the theory that Canada punches above its weight culturally because its social safety net -- health care most conspicuously -- makes it possible for people to take chances on creative careers. As a result, they get Holly Cole, and we’re left with Adam Sandler.

That was then expanded on in the comments:

While I disagree with the specific point about Canada punching above its weight culturally (quick name a great Canadian film that's not "Strange Brew"), I do think that a robust safety net does make entrepreneurial risk taking more likely because people can afford to take the risk of starting a business without having to worry about losing health insurance or other benefits.
I used to have a state government job where this dynamic was apparent: the secretaries in the agency were fairly low paid, but had very good benefits. 3/4 of the secretaries in my office were married to husbands who had their own small contracting or (vaguely) construction related business. They made much more than their wives made, but had no independent health benefits of their own.

I think that this is a neglected conversation. The ability to take risks is not just driven by rewards but also by the costs of failure. If you make the rewards extreme and failure punishing then you create incentives for cheating and "doing anything to win".

This effect shows up in a number of areas -- imagine you are a high school teacher diagnosed with a major illness. In the real world, COBRA is unaffordable and unemployment is over eight percent. You are teaching less well due to health issues. One can see a lot of pressure to find a way -- any way -- to keep test scores above the retention threshold.

You can also see this with small businesses. The reforms of bankruptcy law (making it harder to go bankrupt) and the cost of health care for those without insurance make starting up a small firm really risky. It makes a lot more sense to stay in your sub-optimal office job, with the result that you have less innovation and dynamism in the economy.

These effects are just as predictable as free markets are and it can make a lot of sense to invest in ways to pool or mitigate the risk associated with being an innovator.

Wednesday, January 16, 2013

Stop me if you've heard this one before

Michael Shermer has an interesting post over at Scientific American, but there's one paragraph I'm not too happy about.

Cognitive dissonance may also be at work in the compartmentalization of beliefs. In the 2010 article “When in Doubt, Shout!” in Psychological Science, Northwestern University researchers David Gal and Derek Rucker found that when subjects' closely held beliefs were shaken, they “engaged in more advocacy of their beliefs ... than did people whose confidence was not undermined.” Further, they concluded that enthusiastic evangelists of a belief may in fact be “boiling over with doubt,” and thus their persistent proselytizing may be a signal that the belief warrants skepticism.

I'm way out of my field on this one. My knowledge of psychology is limited to an undergrad intro course and a copy of Cialdini's Influence, but I'm pretty sure that researchers have been finding this sort of thing since the mid-Fifties when Leon Festinger wrote about doomsday cultists proselytizing more the day after the world failed to end and confirming this and related aspects of cognitive dissonance ever since in a large group of studies. (Discover did a better job with the context, though that is comparing a paragraph to a whole post.)

I don't mean to suggest Gal and Rucker were not doing important, original work. I'm sure they were. My beef here is with Shermer and the lack of context. A reader coming on this paragraph cold would be left with the impression that this was a new idea rather than the latest brick in a decades long wall.

I can understand the appeal of the cutting edge. The new stuff is sexier. It gets people's attention. The trouble is, those cutting edge studies often collapse under scrutiny. Some can't be replicated. Others prove to be not that important.

Confirmation, on the other hand, is not sexy. It doesn't drive traffic. It's harder to fit into a paragraph. In a way, though, it's more interesting because it has a high likelihood of being true and fills in the gaps in big, important questions. The interaction between the ideas is usually the interesting part.

Of course, this may be awfully picky given that we're talking about a single paragraph, but this is a recurring issue. New developments are frequently reported in a vacuum, and the result is often a badly misled reader. In these situations a few lines of context can go a long way.

Misleading graphics

This infographic has been coming under criticism from the usual suspects. The group that got hit the worst was the married couple with two children making $650,000 per year. According to the Census Bureau, in 2009 the median income for a household was $49,777. So there is a tax increase of 3.3% on couples making 13 times the median US household income.

Needless to say, this really doesn't reflect the likely impact of these tax law changes on the typical American (it is just too easy to ask hard questions about the representativeness of a single mother who makes $260K/year).

[note -- label typo corrected]

More thoughts on Education

A couple more thoughts on the whole education reform movement, both hoisted from comments.

First, there was a comment by Stuart Buck:

As a matter of basic social science, what should concern one is not the absolute level of a state's performance now but the counterfactual (what would its performance otherwise be).

This is absolutely correct. However, what we are really missing is a time frame for improvement as well as an expected magnitude of improvement. So if we look at 1999, the top rated state in StudentsFirst (La) had a Grade 8 reading score of 252 (compared to an average of 261). In 2011 the score was 255 (improved 3 points) versus a national average of 264 (which also improved three points). Between 2009 and 2011 both La and the national average also improved by the same amount. DC is even more interesting. In 2007 (when Rhee began her reforms) Grade 8 reading was 241 (versus 261 nationwide). In 2011 it was 242 (versus 264 nationwide).

So we would have to explain the unexpected drop in performance we expected in Louisiana that simply did not happen or the reason why DC lagged even further 4 years into a reform program. Even longitudinally, first in the nation seems odd given a lower baseline (thus more room for improvement due to lower hanging fruit and maybe even regression to the mean) and an absolutely dead standard increase.
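The comparison above is really about the gap to the national average rather than the raw score, so a quick tabulation may help. This is a minimal sketch that uses only the Grade 8 reading figures quoted in this post; the dictionary layout and the function name `gap_change` are my own invention for illustration.

```python
# Grade 8 reading scores as quoted above: (year, state score, national average).
naep = {
    "Louisiana": {"before": (1999, 252, 261), "after": (2011, 255, 264)},
    "DC": {"before": (2007, 241, 261), "after": (2011, 242, 264)},
}


def gap_change(record):
    """Return (gap before, gap after, change in gap) relative to the nation.

    A negative gap means the state scored below the national average; a
    negative change means it fell further behind over the period.
    """
    _, state_before, nation_before = record["before"]
    _, state_after, nation_after = record["after"]
    gap_before = state_before - nation_before
    gap_after = state_after - nation_after
    return gap_before, gap_after, gap_after - gap_before


if __name__ == "__main__":
    for name, record in naep.items():
        before, after, change = gap_change(record)
        print(f"{name}: gap {before:+d} -> {after:+d} (change {change:+d})")
```

On these numbers Louisiana stays 9 points behind the national average over the whole period, and DC goes from 20 points behind to 22 points behind. This is only a descriptive comparison against the national trend, not an estimate of the counterfactual Stuart Buck is asking about.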

Second, at lawyers, guns and money a commenter said:

That being said, using it for evaluative purposes is misguided and unfair to educators. I proctor the test, and I see a large number of students who don’t take the test seriously at all. They just click through to get it over with. Our student population has taken the test in the grips of a horrible flu outbreak. Those kids who were actually in school at the time were sick, getting sick, or struggling to get over being sick. When you have to spray down the computers with Lysol after every class comes through, you really have to question the validity of the results obtained. Technical difficulties that require restarting the computer and/or test can also have a suppressive effect on students’ scores.

As the tech coordinator in a school, the commenter seems to be in a reasonable position to make such an evaluation. That raises the question of "high stakes for whom"? I am actually a fan of looking at SAT scores. Why? Because not only is the test well respected but the test makers have a financial incentive to make sure the test does what they say it does (so it can continue as a national standard). The students have an incentive to do well on this test because high scores open doors for them. So when a teacher is evaluated on SAT performance, I am pretty comfortable saying that the other actors are likely to have aligned incentives on giving an unbiased estimate.

Finally, the thing that really seems to be mixed up in the Rhee report is the difference between efficiency (cost savings) and quality (performance). By analogy, consider military pensions. They exist, in large part, so that we can retain top performers in the armed forces. If anything, the defined benefit pension improves quality by keeping soldiers with 15 years of valuable experience in the military. The problem with pensions only arises if the military gets bad at weeding out incompetent performers (which, so far as I can tell, is not currently a major problem). It is good to keep experienced people around while they are still effective but it is expensive. So the empirical question is: does it cost more than it is worth?

The same issue arises with the class size metric. I have been in large and small classes with an excellent teacher. I learned a lot more in the small class because the teacher could focus more attention on each student. Is it better to have large classes (like StudentsFirst claims)? Well, only if you have identified top performers and can assure yourself that you are compensating for class size with teacher quality. This is a hard claim to support. On the other hand, almost no luxury is as expensive as small classes. Notice how universities have reacted to this pressure by putting hundreds of students into a single classroom. So whether the cost is worth the improvement in quality is a legitimate question.

So the issues here are twofold. One, the data on performance do not seem to map easily onto the counter-intuitive rankings of StudentsFirst. Two, the type of high stakes test that seems to be a key feature of the education reform movement has some work to do in properly aligning incentives.

Tuesday, January 15, 2013

Hat tip to Yglesias

Mark and I have been hard on Matt Yglesias lately. But this post was a really clever idea. He took Kevin Drum's idea that lead was responsible for a rise in violent crime and asked "if this explanation is correct then what else must be true?". Since impulse control is also linked to high school graduation, he plotted high school graduation (which has been unexpectedly rising) and saw the same pattern as with violent crime.

Now nothing rules out a confounder that affects both of these outcomes. I worry about simple explanations for complex phenomena. But it was a clever idea to try and falsify the hypothesis by looking at things that should be related to violent crime. And, if there was going to be a candidate for such a broad level of toxicity, a substance implicated in brain damage going back to classical Rome isn't a bad choice.

Felix Salmon on personal finance

Felix Salmon has beaten me to the punch here but I do think that this statement needs to be properly understood for what it means:

It surely comforts modern parents who have spent fortunes educating their children to know that these children are spending money on pork belly and not, for instance, cocaine. But what solace can it offer to realize that $300 a week put into an S. & P. 500 Index fund over the past five years would have provided an annual rate of return of 10.34 percent and grown to $100,354 today? Even saving $300 a week at a 6 percent rate of return would have yielded about $91,000, Mark X. Chemtob, a financial adviser at Ameriprise, said, adding that in both cases, the sums would qualify for a down payment on a starter apartment in New York.

So if a person invested for five years, and got a return of 10.34 percent, they would have a lot of money. So what happened 5 years ago (2007)? Here is Wikipedia:

The Dow Jones Industrial Average, Nasdaq Composite and S&P 500 all experienced declines of greater than 20% from their peaks in late 2007.

So if you had perfect market timing then you could have invested directly after a crash (as opposed to during it) taking advantage of the recent market crash. Unless, of course, you were the 23 year old in the article who is likely in school and not making $300/week of investable income.
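For anyone who wants to check the arithmetic in the quoted passage, here is a minimal sketch of the calculation. The compounding convention is my assumption (end-of-week deposits, weekly compounding at one fifty-second of the annual rate); the article does not say how its figures were computed, and `future_value` is just an illustrative helper.

```python
def future_value(weekly_deposit, annual_rate, years, periods_per_year=52):
    """Future value of equal deposits made at the end of each period.

    Assumes the annual rate is split evenly across periods and compounded
    each period (an ordinary annuity). Other conventions give slightly
    different totals.
    """
    rate = annual_rate / periods_per_year
    periods = years * periods_per_year
    if rate == 0:
        return weekly_deposit * periods
    return weekly_deposit * ((1 + rate) ** periods - 1) / rate


if __name__ == "__main__":
    for annual_rate in (0.0, 0.06, 0.1034):
        total = future_value(300, annual_rate, 5)
        print(f"{annual_rate:.2%} a year -> ${total:,.0f}")
```

Under these assumptions the 6 percent case lands close to the quoted $91,000 and the 10.34 percent case lands in the neighborhood of the quoted $100,354. The zero-return line is the useful baseline: $78,000 of the total is simply the deposits themselves, so the headline number depends far more on being able to save $300 every week than on the rate of return.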

The other side of this coin is that it is very hard to be 23 years old, just graduated from school, making real money for the first time in your life and not enjoy some of it. After all, perpetual deferred gratification is never being able to enjoy the rewards of your career. Nor is it clear that somebody in their first year out of college should be buying a Manhattan apartment (a highly leveraged investment) until they find out if they are going to be successful in New York.

Nor can you drop the cost to zero. I would find it hard to eat in New York City for less than $75/week. Remember, we are talking about a city where space is at a premium and everything bought in the city is expensive (including kitchen facilities). So eating 21 meals at about $3.50 apiece is actually pretty tough, even if you have good skills for cooking from scratch. And, even more interesting, the person in this example is taking on extra work to fund her leisure time (as opposed to, for example, debt).

So I agree -- a very misleading example.

Monday, January 14, 2013

Technological stagnation bandwagon

Looks like this idea may actually be getting fashionable (from the Economist):

So it may come as a surprise that some in Silicon Valley think the place is stagnant, and that the rate of innovation has been slackening for decades. Peter Thiel, a founder of PayPal, an internet payment company, and the first outside investor in Facebook, a social network, says that innovation in America is “somewhere between dire straits and dead”. Engineers in all sorts of areas share similar feelings of disappointment. And a small but growing group of economists reckon the economic impact of the innovations of today may pale in comparison with those of the past.

Now we just have to figure out what to do about it.

Saturday, January 12, 2013

Matt Yglesias -- Defending the indefensible

[I've been off line most of the week so instead of participating in the discussion on the Students First report card, I am, with apologies for the length, putting down my reaction in one big, ugly lump]

At some point in the past year, it became impossible to mount a serious defense of Paul Ryan. There had always been cracks in the facade -- numbers that didn't add up, unlikely claims, extremist quotes -- but most of these could be ignored and those that couldn't were invariably excused on the grounds that Ryan was sincere, he was a serious budget guy and he was getting us to discuss important policy questions.

Eventually though, the discrepancies started to accumulate, and by the time we got the specifics (or non-specifics) of the Ryan budget and the close scrutiny of the campaign, the standard excuses simply weren't sufficient. This left a large number of journalists with a difficult choice: distance themselves from a politician they had invested great emotional and reputational capital in; or invest still more in increasingly strained defenses. The most memorable example of the first was William Saletan's amusing break-up letter. The most embarrassing example of the second probably comes from James Stewart.

In many ways, Michelle Rhee has occupied a Ryan-like niche in education. Both started out as camera-friendly media darlings with highly marketable bipartisan appeal and reputations as serious problem-solvers. In both cases there were, from the beginning, troubling details that undercut these reputations but at the time these details never got much traction. As with Ryan and fiscal responsibility, criticizing Rhee was often read as indifference towards the education gap.

But, as they did with Ryan, nagging questions started to accumulate. There were incidents that seemed to show Rhee abusing her authority. There were questions about cheating under her watch. There was increasingly pointed anti-teacher rhetoric. There was the aggressive pursuit of self-advancement. At each stage, more of Rhee's liberal supporters started getting uncomfortable.

For many, such as the New Republic's Seyward Darby, the tipping point came when Rhee partnered with Florida's Rick Scott. Before Scott, Rhee's liberal supporters had taken the position that she was tough on teachers, but reluctantly, and only because it was necessary in order to improve education. With Scott these outcomes were reversed. He was willing to pursue a "reform" agenda if it hurt a faction he saw as hostile.

The alliance with Scott and similar figures alienated some supporters on the left, but it still allowed the possibility that Rhee was acting in good faith. With the release of the Students First state report card, even that is gone. There is not even a pretense that this is about anything other than promoting Rhee and her allies. The Washington Post had a good summary.

In Rhee’s grading system, the D.C. school system that is implementing the reforms she instituted got a higher grade than the states of Maryland and Virginia, which consistently are at or near the top of lists of high-performing states. Maryland got a D-plus. Virginia got a D-minus. The District? The urban system with the highest achievement gap in the country? It got a C-plus.

The states that got the highest score handed out — a B minus — were Florida and Louisiana. No surprise there.

Florida’s reform efforts were spearheaded more than a decade ago by then-Gov. Jeb Bush, who was the national leader in these kinds of reforms. The school accountability system that Bush set up, the Florida Comprehensive Assessment Test, is scandal-ridden, but he still travels the country promoting his test-based reform model.

Louisiana is the state where Republican Gov. Bobby Jindal instituted a statewide voucher program that gave public money to scores of Christian schools that teach Young Earth Creationism, the belief that the Earth and the universe were created by God no more than 10,000 years ago. Kids learn that dinosaurs co-existed with humans. That’s the state that got Rhee’s top grade.

A quick digression on some good indicators of when a metric has been cooked:

1. It reaches an unexpected, even unbelievable conclusion in the favor of the person designing the metric;

2. It leaves out important variables;

3. And leaves in other variables only tangentially related to the central question;

4. It uses an odd, difficult to justify weighting scheme, making certain factors dominant for no apparent reason.

This report is not only cooked till it's charred; it also flies in the face of Rhee's own rhetoric on tests and accountability. It is, in a word, indefensible, but just as Ryan had James Stewart, Rhee has Matt Yglesias.

Michelle Rhee is a controversial figure, and anything her advocacy organization, Students First, does is going to attract a lot of derision. But having had the chance to play around with their "report card" on state policy, I think there's a lot to like here.

The best thing about it, really, is just that they did it. Importantly it's a report card assessing the state of education policy in different places, not outcomes. ... Only two states score above C+ on their ratings—Louisiana and Florida—and student learning outcomes in those states are far from the best in the nation. If Louisiana starts making a lot of progress in closing the gap with, say, Maryland, then that'll be powerful evidence for the Students First approach. But if it doesn't, then you get the reverse.
...
In policy terms, the most interesting thing about the Students First report is probably its treatment of charter schools. ... The Students First perspective more wisely dings states that make it too hard to open charters but also dings states (like, say, Arizona) that do much too little to hold charter schools accountable for performance.

You should probably read the whole thing (it's less than 300 words) but this gets at the gist. The entire piece is pretty much just a pander and two short, flawed arguments.

Let's start with the "powerful evidence" argument. Yglesias here treats the report card not as a measure of school quality (he doesn't have much choice, since the score is actually inversely related to school quality) but as a measure of where states fall on a policy spectrum. On that reading, we can treat the score as an independent variable and evaluate the policies by comparing score with improvement.

It's worth noting that Rhee's site introduces the report card as follows: "We hope this helps reveal more about what states are doing to improve the nation’s public education system so that it serves all students well and puts each and every one of them on a path toward success." Here and elsewhere, Rhee clearly means that the states with better scores are doing a better job. This doesn't align very well with Yglesias's argument.

More to the point, though, the argument doesn't hold up even in isolation. The idea of providing a useful indicator would only make sense if we scored the schools at the beginning of implementation of the policies. Instead we have a collection of initiatives with varying start dates, most a few years old, some dating back to Jeb Bush. Perhaps as bad, Yglesias leaves the time frame open (always a bad idea) in a situation where a shake-up in the achievement rankings for any reason will tend to favor states at the top of the Students First list. (Louisiana can't really go that far down.) Any kind of causal inference drawn from a change in one of these top-scored states would be meaningless.
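The "can't really go that far down" point can be made concrete with a toy simulation. This is only a sketch under a deliberately extreme assumption: the achievement rankings are reshuffled completely at random, with no policy effect at all. The starting ranks (48th and 3rd of 50) and the function name are hypothetical and chosen only for illustration; they are not actual state rankings.

```python
import random

def mean_rank_change_after_shuffle(n_states=50, start_rank=48, trials=10_000, seed=0):
    """Average number of places gained when a state's new rank is drawn at random.

    Rank 1 is best, so a positive result means the state moved up.
    The shuffle is unrelated to policy by construction (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        new_rank = rng.randint(1, n_states)  # random shake-up, no policy effect
        total += start_rank - new_rank
    return total / trials

# Hypothetical starting positions: one state near the bottom, one near the top.
bottom = mean_rank_change_after_shuffle(start_rank=48)
top = mean_rank_change_after_shuffle(start_rank=3)
print(f"state starting 48th gains {bottom:.1f} places on average")
print(f"state starting 3rd gains {top:.1f} places on average")
```

Even with no causal effect anywhere in the model, the state that starts near the bottom "improves" on average and the state that starts near the top "declines", which is why a later change in rank tells us little on its own.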

The only other specific Yglesias can come up with is that the report supposedly requires states to hold charter schools accountable for performance. Putting aside the obvious "accountable for performance" irony, this claim is a bit difficult to accept at face value given the related question of holding private institutions that receive state money accountable. Remember, Louisiana is Rhee's top-ranked state despite a voucher system notorious for its lack of accountability:

The school willing to accept the most voucher students -- 314 -- is New Living Word in Ruston, which has a top-ranked basketball team but no library. Students spend most of the day watching TVs in bare-bones classrooms. Each lesson consists of an instructional DVD that intersperses Biblical verses with subjects such as chemistry or composition.

The Upperroom Bible Church Academy in New Orleans, a bunker-like building with no windows or playground, also has plenty of slots open. It seeks to bring in 214 voucher students, worth up to $1.8 million in state funding.

At Eternity Christian Academy in Westlake, pastor-turned-principal Marie Carrier hopes to secure extra space to enroll 135 voucher students, though she now has room for just a few dozen. Her first- through eighth-grade students sit in cubicles for much of the day and move at their own pace through Christian workbooks, such as a beginning science text that explains "what God made" on each of the six days of creation. They are not exposed to the theory of evolution.

"We try to stay away from all those things that might confuse our children," Carrier said.

Other schools approved for state-funded vouchers use social studies texts warning that liberals threaten global prosperity; Bible-based math books that don't cover modern concepts such as set theory; and biology texts built around refuting evolution.

That's it. Out of the "lot to like" here, Yglesias can only come up with a flawed we-can-see-how-we're-doing argument and a highly suspect claim about accountability. Less than three hundred words total and he's clearly scraping bottom to put those together.

This whole affair is a case study in how bad ideas lodge themselves in the discourse through journalistic convergence and superficiality, the fetishizing of balance, and the inability of otherwise smart, responsible people to admit (perhaps even to themselves) that they've been proven wrong.

update: link added.

Thursday, January 10, 2013

Sometimes I need to read more clearly

I had noted the presence of defined benefit versus defined contribution pension plans in the evaluation of state education systems. This was obviously hard to fathom, as the way you handle retired teachers seems to have limited predictive ability for the quality of education. I had not realized it was an anchor category:

Amazingly, the methodology being used by Rhee’s grifters gives states a “4” (the highest score) if they have defined contribution pensions and a “0” if they have defined benefit pensions. In other words, states get higher rankings for their education systems if they make their pension benefits less attractive! Even more amazingly, pension “reform” is an “anchor” category, meaning it gets three times the weight of some of the other categories that might actually have a clear positive relationship with improving a state’s educational system.

So even if you thought there was some small effect here (older teachers hanging on for a couple of extra years to improve their pensions who could easily have found a job elsewhere?), it's hard to imagine that this is a key metric that should be given special weight.
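To see how much a triple-weighted category can move an overall grade, here is a rough sketch. Only the 4-versus-0 pension scoring and the 3x anchor weight come from the quoted passage. The other three categories, their scores, and the simple weighted-average formula are assumptions made for illustration, not the actual Students First methodology.

```python
def weighted_score(scores, weights):
    """Weighted average of category scores (each on a 0-4 scale)."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Hypothetical categories: only pension_type and its 3x weight are from the quote.
anchor_weights = {"pension_type": 3, "category_a": 1, "category_b": 1, "category_c": 1}
equal_weights = {"pension_type": 1, "category_a": 1, "category_b": 1, "category_c": 1}

# Two made-up states, identical except for the kind of pension plan they offer.
defined_contribution = {"pension_type": 4, "category_a": 2, "category_b": 2, "category_c": 2}
defined_benefit = {"pension_type": 0, "category_a": 2, "category_b": 2, "category_c": 2}

print(weighted_score(defined_contribution, anchor_weights))  # 3.0
print(weighted_score(defined_benefit, anchor_weights))       # 1.0
print(weighted_score(defined_contribution, equal_weights))   # 2.5
print(weighted_score(defined_benefit, equal_weights))        # 1.5
```

In this toy setup the two states differ only in pension type, yet the anchor weighting puts them two full points apart on a four-point scale, double the gap they would have under equal weights.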

This sort of revelation really does point out the problem with models: bad data in, bad results out.