West Coast Stat Views (on Observational Epidemiology and more)

Friday, January 18, 2013

Just another (incredibly bizarre) data point

I dislike posts and op-eds that use the latest big news story as an excuse to mount a favorite hobby horse, but I can't pass this one up.

One of the recurring themes here has been the decline in journalistic standards, particularly regarding accuracy and fact-checking. This brings us to the incredibly strange case of Manti Te'o and the imaginary dead girlfriend. It was one of the year's most widely covered stories. Everybody from Sports Illustrated to CBS to the New York Times ("He has personified hope after more than a decade of mediocrity. He has lived the university’s core values at a place where that matters, said Athletic Director Jack Swarbrick.") had carried moving accounts of Te'o's story, but it wasn't until a few days ago that the sports site Deadspin actually dug into the details.

Notre Dame's Manti Te'o, the stories said, played this season under a terrible burden. A Mormon linebacker who led his Catholic school's football program back to glory, Te'o was whipsawed between personal tragedies along the way. In the span of six hours in September, as Sports Illustrated told it, Te'o learned first of the death of his grandmother, Annette Santiago, and then of the death of his girlfriend, Lennay Kekua.

Kekua, 22 years old, had been in a serious car accident in California, and then had been diagnosed with leukemia. SI's Pete Thamel described how Te'o would phone her in her hospital room and stay on the line with her as he slept through the night. "Her relatives told him that at her lowest points, as she fought to emerge from a coma, her breathing rate would increase at the sound of his voice," Thamel wrote.

Upon receiving the news of the two deaths, Te'o went out and led the Fighting Irish to a 20-3 upset of Michigan State, racking up 12 tackles. It was heartbreaking and inspirational. Te'o would appear on ESPN's College GameDay to talk about the letters Kekua had written him during her illness. He would send a heartfelt letter to the parents of a sick child, discussing his experience with disease and grief. The South Bend Tribune wrote an article describing the young couple's fairytale meeting—she, a Stanford student; he, a Notre Dame star—after a football game outside Palo Alto.

Did you enjoy the uplifiting story, the tale of a man who responded to adversity by becoming one of the top players of the game? If so, stop reading. Manti Te'o did lose his grandmother this past fall. Annette Santiago died on Sept. 11, 2012, at the age of 72, according to Social Security Administration records in Nexis. But there is no SSA record there of the death of Lennay Marie Kekua, that day or any other. Her passing, recounted so many times in the national media, produces no obituary or funeral announcement in Nexis, and no mention in the Stanford student newspaper.

Nor is there any report of a severe auto accident involving a Lennay Kekua. Background checks turn up nothing. The Stanford registrar's office has no record that a Lennay Kekua ever enrolled. There is no record of her birth in the news. Outside of a few Twitter and Instagram accounts, there's no online evidence that Lennay Kekua ever existed.

The photographs identified as Kekua—in online tributes and on TV news reports—are pictures from the social-media accounts of a 22-year-old California woman who is not named Lennay Kekua. She is not a Stanford graduate; she has not been in a severe car accident; and she does not have leukemia. And she has never met Manti Te'o.

It has since come out that some journalists had checked some facts and noticed something was wrong, but not wrong enough to keep them from running this incredibly dramatic but completely untrue story.

For a reporter looking for a touching human interest story, this was too good to be true and many of the nation's biggest and (sadly) best news organizations ran it without bothering to determine if it was true. That's troubling.

But unfortunately no longer all that surprising.

Thursday, January 17, 2013

Ray Fisman has a new book you might want to check out

Though we've disagreed strongly in the past, Ray Fisman is a smart guy with some insightful things to say about a problem Joseph and I have been thinking a lot about: how interests go out of alignment and how we can engineer big, complex organizations to keep that from happening.

Fisman has a new book out on the subject: “The Org: The Underlying Logic of the Office.” Here's a quote from an enthusiastic review in the New York Times.

This suggests a good rule of thumb to determine when a private company will outperform the public sector: if the task is clear-cut and it’s possible to define concrete goals and reward those who meet them, the private sector will probably do better. “If I can write a perfect contract in which I pay for a concrete observable outcome, can rule out cream-skimming and can ensure the measure is not gamed, there is no reason that the private sector can’t do it better,” Professor Fisman said.

Safety Nets and Canada

Dean Dad has an interesting observation:

I was reminded of that a few days ago, in a discussion with a Canadian colleague. We have similar senses of humor, so we got to talking about The Kids In The Hall, SCTV, and national styles of humor. (For my money, “Brain Candy” is a neglected classic of dark, dark, dark comedy.) She offered the theory that Canada punches above its weight culturally because its social safety net -- health care most conspicuously -- makes it possible for people to take chances on creative careers. As a result, they get Holly Cole, and we’re left with Adam Sandler.

That was then expanded on in the comments:

While I disagree with the specific point about Canada punching above its weight culturally (quick name a great Canadian film that's not "Strange Brew"), I do think that a robust safety net does make entrepreneurial risk taking more likely because people can afford to take the risk of starting a business without having to worry about losing health insurance or other benefits.
I used to have a state government job where this dynamic was apparent: the secretaries in the agency were fairly low paid, but had very good benefits. 3/4 of the secretaries in my officer were married to husbands who had their own small contracting or (vaguely) construction related business. They made much more than their wives made, but had no independent health benefits of their own

I think that this is a neglected conversation. The ability to take risks is not just driven by rewards but also by the costs of failure. If you make the rewards extreme and failure punishing then you create incentives for cheating and "doing anything to win".

This effect shows up in a number of areas -- imagine you are a high school teacher diagnosed with a major illness. In the real world, COBRA is unaffordable and unemployment is over eight percent. You are teaching less well due to health issues. One can see a lot of pressure to find a way -- any way -- to keep test scores above the retention threshold.

You can also see this with small businesses. The reforms of bankruptcy law (making it harder to go bankrupt) and the cost of health care for those without insurance make starting up a small firm really risky. It makes a lot more sense to stay in your sub-optimal office job, with the result that you have less innovation and dynamism in the economy.

These effects are just as predictable as free markets are and it can make a lot of sense to invest in ways to pool or mitigate the risk associated with being an innovator.

Wednesday, January 16, 2013

Stop me if you've heard this one before

Michael Shermer has an interesting post over at Scientific American, but there's one paragraph I'm not too happy about.

Cognitive dissonance may also be at work in the compartmentalization of beliefs. In the 2010 article “When in Doubt, Shout!” in Psychological Science, Northwestern University researchers David Gal and Derek Rucker found that when subjects' closely held beliefs were shaken, they “engaged in more advocacy of their beliefs ... than did people whose confidence was not undermined.” Further, they concluded that enthusiastic evangelists of a belief may in fact be “boiling over with doubt,” and thus their persistent proselytizing may be a signal that the belief warrants skepticism.

I'm way out of my field on this one. My knowledge of psychology is limited to an undergrad intro course and a copy of Cialdini's Influence, but I'm pretty sure that researchers have been finding this sort of thing since the mid-Fifties, when Leon Festinger wrote about doomsday cultists proselytizing more the day after the world failed to end, and that they have been confirming this and related aspects of cognitive dissonance ever since in a large body of studies. (Discover did a better job with the context, though that is comparing a paragraph to a whole post.)

I don't mean to suggest Gal and Rucker were not doing important, original work. I'm sure they were. My beef here is with Shermer and the lack of context. A reader coming on this paragraph cold would be left with the impression that this was a new idea rather than the latest brick in a decades-long wall.

I can understand the appeal of the cutting edge. The new stuff is sexier. It gets people's attention. The trouble is, those cutting edge studies often collapse under scrutiny. Some can't be replicated. Others prove to be not that important.

Confirmation, on the other hand, is not sexy. It doesn't drive traffic. It's harder to fit into a paragraph. In a way, though, it's more interesting because it has a high likelihood of being true and fills in the gaps in big, important questions. The interaction between the ideas is usually the interesting part.

Of course, this may be awfully picky given that we're talking about a single paragraph, but this is a recurring issue. New developments are frequently reported in a vacuum, and the result is often a badly misled reader. In these situations a few lines of context can go a long way.

Misleading graphics

This infographic has been coming under criticism from the usual suspects. The group that got hit the worst was the married couple with two children making $650,000 per year. According to the Census Bureau, in 2009 the median income for a household was $49,777. So there is a tax increase of 3.3% on couples making 13 times the median US household income.

Needless to say, this really doesn't reflect the likely impact of these tax law changes on the typical American (it is just too easy to ask hard questions about the representativeness of a single mother who makes $260K/year).

[note -- label typo corrected]

More thoughts on Education

A couple more thoughts on the whole education reform movement, both hoisted from comments.

First, there was a comment by Stuart Buck:

As a matter of basic social science, what should concern one is not the absolute level of a state's performance now but the counterfactual (what would its performance otherwise be).

This is absolutely correct. However, what we are really missing is a time frame for improvement as well as an expected magnitude of improvement. So if we look at 1999, the top rated state in StudentsFirst (La) had a Grade 8 reading score of 252 (compared to an average of 261). In 2011 the score was 255 (improved 3 points) versus a national average of 264 (which also improved three points). Between 2009 and 2011 both La and the national average also improved by the same amount. DC is even more interesting. In 2007 (when Rhee began her reforms) Grade 8 reading was 241 (versus 261 nationwide). In 2011 it was 242 (versus 264 nationwide).

So we would have to posit a drop in performance that would otherwise have occurred in Louisiana but simply did not happen, or explain why DC lagged even further four years into a reform program. Even longitudinally, first in the nation seems odd given a lower baseline (thus more room for improvement due to lower hanging fruit and maybe even regression to the mean) and a gain that exactly matched the national one.
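The comparison above is easy to check by hand, but a small sketch makes the logic explicit: what matters is the change in the gap to the national average, not the raw gain. The scores are the ones quoted above; the function name and data layout are just my own illustration.

```python
# Grade 8 reading scores as quoted in this post: year -> (state score, national average)
scores = {
    "Louisiana": {1999: (252, 261), 2011: (255, 264)},
    "DC": {2007: (241, 261), 2011: (242, 264)},
}

def gap_change(series):
    """Return (state gain, national gain, change in the gap to the national average)."""
    first, last = min(series), max(series)
    state_start, national_start = series[first]
    state_end, national_end = series[last]
    state_gain = state_end - state_start
    national_gain = national_end - national_start
    # A positive change means the state fell further behind the nation.
    change = (national_end - state_end) - (national_start - state_start)
    return state_gain, national_gain, change

for name, series in scores.items():
    print(name, gap_change(series))
```

Louisiana's gap does not move at all over the period, and DC's widens by two points, which is the pattern described above.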

Second, at lawyers, guns and money a commenter said:

That being said, using it for evaluative purposes is misguided and unfair to educators. I proctor the test, and I see a large number of students who don’t take the test seriously at all. They just click through to get it over with. Our student population has taken the test in the grips of a horrible flu outbreak. Those kids who were actually in school at the time were sick, getting sick, or struggling to get over being sick. When you have to spray down the computers with Lysol after every class comes through, you really have to question the validity of the results obtained. Technical difficulties that require restarting the computer and/or test can also have a suppressive effect on students’ scores.

A tech coordinator in a school seems reasonably well positioned to make such an evaluation. That raises the question of "high stakes for whom?" I am actually a fan of looking at SAT scores. Why? Because not only is the test well respected but the test makers have a financial incentive to make sure the test does what they say it does (so it can continue as a national standard). The students have an incentive to do well on this test because high scores open doors for them. So when a teacher is evaluated on SAT performance, I am pretty comfortable saying that the other actors are likely to have aligned incentives on giving an unbiased estimate.

Finally, the thing that really seems to be mixed up in the Rhee report is the difference between efficiency (cost savings) and quality (performance). By analogy, consider military pensions. They exist, in large part, so that we can retain top performers in the armed forces. If anything, the defined benefit pension improves quality by keeping soldiers with 15 years of valuable experience in the military. The problem with pensions only arises if the military gets bad at weeding out incompetent performers (which, so far as I can tell, is not currently a major problem). It is good to keep experienced people around while they are still effective, but it is expensive. So the empirical question is: does it cost more than it is worth?

The same issue arises with the class size metric. I have been in large and small classes with an excellent teacher. I learned a lot more in the small class because the teacher could focus more attention on each student. Is it better to have large classes (like StudentsFirst claims)? Well, only if you have identified top performers and can assure yourself that you are compensating for class quality with teacher quality. This is a hard claim to support. On the other hand, almost no luxury is as expensive as small classes. Notice how universities have reacted to this pressure by putting hundreds of students into a single classroom. So whether the cost is worth the improvement in quality is a legitimate question.

So the issues here are twofold. One, the data on performance do not seem to map easily onto the counter-intuitive rankings of StudentsFirst. Two, the type of high stakes test that seems to be a key feature of the education reform movement has some work to do in properly aligning incentives.

Tuesday, January 15, 2013

Hat tip to Yglesias

Mark and I have been hard on Matt Yglesias lately. But this post was a really clever idea. He took Kevin Drum's idea that lead was responsible for a rise in violent crime and asked "if this explanation is correct then what else must be true?" Since impulse control is also linked to high school graduation, he plotted high school graduation (which has been unexpectedly rising) and saw the same pattern as with violent crime.

Now nothing rules out a confounder that affects both of these outcomes. I worry about simple explanations for complex phenomena. But it was a clever idea to try and falsify the hypothesis by looking at things that should be related to violent crime. And, if there was going to be a candidate for such a broad level of toxicity, a substance implicated in brain damage going back to classical Rome isn't a bad choice.

Felix Salmon on personal finance

Felix Salmon has beaten me to the punch here but I do think that this statement needs to be properly understood for what it means:

It surely comforts modern parents who have spent fortunes educating their children to know that these children are spending money on pork belly and not, for instance, cocaine. But what solace can it offer to realize that $300 a week put into an S. & P. 500 Index fund over the past five years would have provided an annual rate of return of 10.34 percent and grown to $100,354 today? Even saving $300 a week at a 6 percent rate of return would have yielded about $91,000, Mark X. Chemtob, a financial adviser at Ameriprise, said, adding that in both cases, the sums would qualify for a down payment on a starter apartment in New York.

So if a person invested for five years and got a return of 10.34 percent, they would have a lot of money. So what happened 5 years ago (2007)? Here is Wikipedia:

The Dow Jones Industrial Average, Nasdaq Composite and S&P 500 all experienced declines of greater than 20% from their peaks in late 2007.

So if you had perfect market timing, you could have invested directly after the recent market crash (as opposed to during it) and taken advantage of it. Unless, of course, you were the 23-year-old in the article, who was likely in school and not making $300/week of investable income.
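For what it's worth, the headline numbers in the quote can be roughly reproduced with a standard future-value calculation. This is only a sketch: I am assuming end-of-week deposits and weekly compounding at a constant rate, which is my guess at the convention and not something the article states, so the output lands near the quoted figures without matching them exactly.

```python
def future_value(weekly_deposit, annual_rate, years, weeks_per_year=52):
    """Future value of end-of-week deposits at a constant rate, compounded weekly.

    The compounding convention is an assumption for illustration only.
    """
    weekly_rate = annual_rate / weeks_per_year
    n_deposits = years * weeks_per_year
    return weekly_deposit * ((1 + weekly_rate) ** n_deposits - 1) / weekly_rate

contributed = 300 * 52 * 5  # total paid in over five years
for rate in (0.1034, 0.06):
    value = future_value(300, rate, 5)
    print(f"{rate:.2%}: {value:,.0f} on {contributed:,} contributed")
```

The constant rate is exactly the assumption at issue: the formula says nothing about someone whose deposits began before or during a decline of more than 20%.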

The other side of this coin is that it is very hard to be 23 years old, just graduated from school, making real money for the first time in your life and not enjoy some of it. After all, perpetual deferred gratification means never being able to enjoy the rewards of your career. Nor is it clear that somebody in their first year out of college should be buying a Manhattan apartment (a highly leveraged investment) until they find out if they are going to be successful in New York.

Nor can you drop the cost to zero. I would find it hard to eat in New York City for less than $75/week. Remember, we are talking about a city where space is at a premium and everything bought in the city is expensive (including kitchen facilities). So eating 21 meals at about $3 apiece is actually pretty tough, even if you have good skills for cooking from scratch. And, even more interesting, the person in this example is taking on extra work to fund her leisure time (as opposed to, for example, debt).

So I agree -- a very misleading example.

Monday, January 14, 2013

Technological stagnation bandwagon

Looks like this idea may actually be getting fashionable (from the Economist):

So it may come as a surprise that some in Silicon Valley think the place is stagnant, and that the rate of innovation has been slackening for decades. Peter Thiel, a founder of PayPal, an internet payment company, and the first outside investor in Facebook, a social network, says that innovation in America is “somewhere between dire straits and dead”. Engineers in all sorts of areas share similar feelings of disappointment. And a small but growing group of economists reckon the economic impact of the innovations of today may pale in comparison with those of the past.

Now we just have to figure out what to do about it.

Saturday, January 12, 2013

Matt Yglesias -- Defending the indefensible

[I've been off line most of the week so instead of participating in the discussion on the Students First report card, I am, with apologies for the length, putting down my reaction in one big, ugly lump]

At some point in the past year, it became impossible to mount a serious defense of Paul Ryan. There had always been cracks in the facade -- numbers that didn't add up, unlikely claims, extremist quotes -- but most of these could be ignored, and those that couldn't were invariably excused on the grounds that Ryan was sincere, he was a serious budget guy and he was getting us to discuss important policy questions.

Eventually though, the discrepancies started to accumulate, and by the time we got the specifics (or non-specifics) of the Ryan budget and the close scrutiny of the campaign, the standard excuses simply weren't sufficient. This left a large number of journalists with a difficult choice: distance themselves from a politician they had invested great emotional and reputational capital in; or invest still more in increasingly strained defenses. The most memorable example of the first was William Saletan's amusing break-up letter. The most embarrassing example of the second probably comes from James Stewart.

In many ways, Michelle Rhee has occupied a Ryan-like niche in education. Both started out as camera-friendly media darlings with highly marketable bipartisan appeal and reputations as serious problem-solvers. In both cases there were, from the beginning, troubling details that undercut these reputations, but at the time these details never got much traction. As with Ryan and fiscal responsibility, criticizing Rhee was often read as indifference towards the education gap.

But, as they did with Ryan, nagging questions started to accumulate. There were incidents that seemed to show Rhee abusing her authority. There were questions about cheating under her watch. There was increasingly pointed anti-teacher rhetoric. There was the aggressive pursuit of self-advancement. At each stage, more of Rhee's liberal supporters started getting uncomfortable.

For many, such as the New Republic's Seyward Darby, the tipping point came when Rhee partnered with Florida's Rick Scott. Before Scott, Rhee's liberal supporters had taken the position that she was tough on teachers, but reluctantly, and only because it was necessary in order to improve education. With Scott these priorities were reversed. He was willing to pursue a "reform" agenda if it hurt a faction he saw as hostile.

The alliance with Scott and similar figures alienated some supporters on the left, but it still allowed the possibility that Rhee was acting in good faith. With the release of the Students First state report card, even that is gone. There is not even a pretense that this is about anything other than promoting Rhee and her allies. The Washington Post had a good summary.

In Rhee’s grading system, the D.C. school system that is implementing the reforms she instituted got a higher grade than the states of Maryland and Virginia — which consistently are at or near the top of lists of high-performing states — and Virginia. Maryland got a D-plus. Virginia got a D-minus. The District? The urban system with the highest achievement gap in the country? It got a C-plus.

The states that got the highest score handed out — a B minus — were Florida and Louisiana. No surprise there.

Florida’s reform efforts were spearheaded more than a decade ago by then-Gov. Jeb Bush, who was the national leader in these kinds of reforms. The school accountability system that Bush set up, the Florida Comprehensive Assessment Test, is scandal-ridden, but he still travels the country promoting his test-based reform model.

Louisiana is the state where Republican Gov. Bobby Jindal instituted a statewide voucher program that gave public money to scores of Christian schools that teach Young Earth Creationism, the belief that the Earth and the universe were created by God no more than 10,000 years ago. Kids learn that dinosaurs co-existed with humans. That’s the state that got Rhee’s top grade.

A quick digression on some good indicators of when a metric has been cooked:

1. It reaches an unexpected, even unbelievable conclusion in the favor of the person designing the metric;

2. It leaves out important variables;

3. And leaves in other variables only tangentially related to the central question;

4. It uses an odd, difficult to justify weighting scheme, making certain factors dominant for no apparent reason.

This report is not only cooked till it's charred; it also flies in the face of Rhee's own rhetoric on tests and accountability. It is, in a word, indefensible, but just as Ryan had James Stewart, Rhee has Matt Yglesias.

Michelle Rhee is a controversial figure, and anything her advocacy organization, Students First, does is going to attract a lot of derision. But having had the chance to play around with their "report card" on state policy, I think there's a lot to like here.

The best thing about it, really, is just that they did it. Importantly it's a report card assessing the state of education policy in different places, not outcomes. ... Only two states score above C+ on their ratings—Louisiana and Florida—and student learning outcomes in those states are far from the best in the nation. If Louisiana starts making a lot of progress in closing the gap with, say, Maryland, then that'll be powerful evidence for the Students First approach. But if it doesn't, then you get the reverse.
...
In policy terms, the most interesting thing about the Students First report is probably its treatment of charter schools. ... The Students First perspective more wisely dings states that make it too hard to open charters but also dings states (like, say, Arizona) that do much too little to hold charter schools accountable for performance.

You should probably read the whole thing (it's less than 300 words) but this gets at the gist. The entire piece is pretty much just a pander and two short, flawed arguments.

Let's start with the "powerful evidence" argument. Yglesias here treats the report card as not really being a measure of school quality (he doesn't have much choice since the score is actually inversely related to school quality), rather a measure of where schools fall on a policy spectrum so we can basically treat their score as an independent variable when evaluating these policies by comparing score with improvement.

It's worth noting that Rhee's site introduces the report card as follows: "We hope this helps reveal more about what states are doing to improve the nation’s public education system so that it serves all students well and puts each and every one of them on a path toward success." Here and elsewhere, Rhee clearly means that the states with better scores are doing a better job. This doesn't align very well with Yglesias's argument.

More to the point though, the argument doesn't hold up even in isolation. The idea of providing a useful indicator would only make sense if we scored the schools at the beginning of implementation of the policies. Instead we have a collection of initiatives with varying start dates, most a few years old, some dating back to Jeb Bush. Perhaps as bad, Yglesias leaves the time frame open (always a bad idea) in a situation where a shake-up in the achievement rankings for any reason will tend to favor states at the top of the Students First list. (Louisiana can't really go that far down.) Any kind of causal inference drawn from a change in one of these top scored states would be meaningless.

The only other specific Yglesias can come up with is that the report supposedly requires states to hold charter schools accountable for performance. Putting aside the obvious "accountable for performance" irony, this claim is a bit difficult to accept at face value given the related question of holding private institutions that receive state money accountable. Remember, Louisiana is Rhee's top ranked state despite a voucher system notorious for its lack of accountability:

The school willing to accept the most voucher students -- 314 -- is New Living Word in Ruston, which has a top-ranked basketball team but no library. Students spend most of the day watching TVs in bare-bones classrooms. Each lesson consists of an instructional DVD that intersperses Biblical verses with subjects such chemistry or composition.

The Upperroom Bible Church Academy in New Orleans, a bunker-like building with no windows or playground, also has plenty of slots open. It seeks to bring in 214 voucher students, worth up to $1.8 million in state funding.

At Eternity Christian Academy in Westlake, pastor-turned-principal Marie Carrier hopes to secure extra space to enroll 135 voucher students, though she now has room for just a few dozen. Her first- through eighth-grade students sit in cubicles for much of the day and move at their own pace through Christian workbooks, such as a beginning science text that explains "what God made" on each of the six days of creation. They are not exposed to the theory of evolution.

"We try to stay away from all those things that might confuse our children," Carrier said.

Other schools approved for state-funded vouchers use social studies texts warning that liberals threaten global prosperity; Bible-based math books that don't cover modern concepts such as set theory; and biology texts built around refuting evolution.

That's it. Out of the "lot to like" here, Yglesias can only come up with a flawed we-can-see-how-we're-doing argument and a highly suspect claim about accountability. Less than three hundred words total and he's clearly scraping bottom to put those together.

This whole affair is a case study in how bad ideas lodge themselves in the discourse through journalistic convergence and superficiality, the fetishizing of balance, and the inability of otherwise smart, responsible people to admit (perhaps even to themselves) that they've been proven wrong.

update: link added.

Thursday, January 10, 2013

Sometimes I need to read more clearly

I had noted the presence of defined benefit versus defined contribution pension plans in the evaluation of state education systems. This was obviously hard to fathom, as the way you handle retired teachers seems to have limited predictive ability for the quality of education. I had not realized it was an anchor category:

Amazingly, the methodology being used by Rhee’s grifters gives states a “4” (the highest score) if they have defined contribution pensions and a “0” if they have defined benefit pensions. In other words, states get higher rankings for their education systems if they make their pension benefits less attractive! Even more amazingly, pension “reform” is an “anchor” category, meaning it gets three times the weight of some of the other categories that might actually have a clear positive relationship with improving a state’s educational system.

So even if you thought there was some small effect here (older teachers hanging on for a couple of extra years to improve their pensions who could easily have found a job elsewhere?), it's hard to imagine that this is a key metric that should be given special weight.

This sort of revelation really does point out the problem with models: bad data in, bad results out.
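To see how much a triple-weighted category can matter, here is a toy weighted average. Everything in it is hypothetical: the two states, the category names other than pensions, the scores, and the averaging formula are made up for illustration and are not taken from the actual report card. Only the 0-to-4 scale and the 3x anchor weight come from the quote above.

```python
def weighted_score(scores, weights):
    """Weighted average of category scores on a 0-4 scale (assumed formula)."""
    total_weight = sum(weights[category] for category in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight

# Hypothetical states: A does well on two categories but keeps defined benefit
# pensions (scored 0); B does poorly elsewhere but has defined contribution (scored 4).
state_a = {"pensions": 0, "evaluation": 4, "charters": 3}
state_b = {"pensions": 4, "evaluation": 1, "charters": 1}

equal_weights = {"pensions": 1, "evaluation": 1, "charters": 1}
anchor_weights = {"pensions": 3, "evaluation": 1, "charters": 1}  # pensions count 3x

for label, weights in (("equal", equal_weights), ("anchor", anchor_weights)):
    a = weighted_score(state_a, weights)
    b = weighted_score(state_b, weights)
    print(f"{label}: A={a:.2f} B={b:.2f}")
```

With equal weights the hypothetical state A comes out ahead; with pensions as an anchor, the ordering flips even though nothing about either state has changed.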

Causal Inference

Andrew Gelman comments on an article claiming that countries with very high or very low genetic diversity have weaker economic performance than countries with middling levels of diversity. His take-away is quite good:

High-profile social science research aims for proof, not for understanding—and that’s a problem. The incentives favor bold thinking and innovative analysis, and that part is great. But the incentives also favor silly causal claims. In many social sciences, it’s not enough to notice an interesting pattern and explore it (as we did in our Red State Blue State book). Instead, you’re supposed to make a strong causal claim even in a context where it makes little sense.

But I also think it omits one piece that is crucial for causal claims: what does a counterfactual look like? This happens a lot with complex phenomena in both medicine and social science. Just look at the question of whether or not to adjust for variables like blood pressure and cholesterol when estimating the effect of obesity on mortality:

It's possible that most of the thin people who die are meth addicts or have cancer, but even a study which threw out the folks who died within three years of entry into the study found that once you accounted for physical activity*, "underweight" BMIs were correlated with excess mortality risk, while "overweight" BMIs were not. And arguing that the study fails to control for things like blood pressure, blood sugar, and cholesterol seems like fairly weak sauce; those are the very mechanisms by which obesity is supposed to kill us.

So what would it mean to make a person thinner and not influence the mediating factors through which the disease operates? It would be a thin person with a much higher risk of mortality, I suspect. It's the same as imagining an antihypertensive medication conditioned on blood pressure -- one would suspect that the causal effect of the drug on the participant would be different if it failed in its primary function.
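To make the adjustment problem concrete, here is a toy simulation (a sketch, not an analysis of any real data). Every number in it is invented, and it assumes the most extreme case: obesity raises risk only through blood pressure. The names `obese`, `blood_pressure`, and `risk` are hypothetical stand-ins. Under those assumptions, adjusting for the mediator makes a real effect look like nothing.

```python
import random
from statistics import mean

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """What remains of ys after removing its linear fit on xs."""
    b = slope(xs, ys)
    a = mean(ys) - b * mean(xs)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# ASSUMPTION: a made-up world where obesity affects risk ONLY via blood pressure.
obese = [rng.randint(0, 1) for _ in range(n)]
blood_pressure = [o + rng.gauss(0, 1) for o in obese]
risk = [bp + rng.gauss(0, 1) for bp in blood_pressure]

# Total effect: the unadjusted difference in risk between the two groups.
total_effect = slope(obese, risk)

# "Adjusted" effect: partial blood pressure out of both variables, then
# regress residual on residual. This gives the same coefficient as
# including blood pressure as a covariate in the regression.
adjusted_effect = slope(
    residuals(blood_pressure, obese),
    residuals(blood_pressure, risk),
)

print(f"total effect:    {total_effect:.2f}")
print(f"adjusted effect: {adjusted_effect:.2f}")
```

In this toy world the total effect comes out near 1 and the adjusted effect near 0, even though obesity is, by construction, harmful. The adjusted number answers a different question: what happens to risk if weight changes but blood pressure somehow does not.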

In the same sense, the question of how to change genetic diversity without influencing a lot of other variables is a tricky one. What would it mean for a country whose genetic composition was unrelated to migration to change its level of diversity without changing other factors? What is the mechanism by which we think this operates? Mechanisms are not very important for randomized trials because the design eliminates confounding. But for a non-randomized study, this is a very important piece.

And if we argue that this is just a proxy variable (which seems to be the route that Andrew is taking in his discussion) then the hard causal claims are unnecessary. Even worse, they may well obscure factors on which we could imagine basing a strong counterfactual. Exploring data like this is an extremely interesting exercise, but I agree that I wish we could admit, when we see an interesting pattern, that we may not know why this pattern exists.

Professors and stress

This reply to the classic Forbes article by Susan Adams is worth reading. My favorite piece:

Write a grant application, get three anonymous reviewer critiques. Submit research results for publication in a peer-reviewed journal, get anonymous reviewer critiques. Submit your tenure portfolio or post-tenure portfolio to a college promotion and tenure committee, get anonymous reviews. While one may know the general composition of grant review and promotion and tenure committees, you don’t know precisely who is gunning for you. Anonymity is sometimes useful but more often allows petty vendettas to occur that are independent of the work at hand.

It is amazing how true this can be and how hard it is to try and modify your approach based on feedback when the next set of anonymous reviewers could be completely different.

Tuesday, January 8, 2013

Metrics on education

Matt Yglesias has a post up about the StudentsFirst report claiming that outcomes are complicated to measure. While this point is, in the abstract, true, it is informative to see what the highest ranked state (Louisiana) does in the grade 8 reading test results that he shows. He gives a number of groups: Everyone, African American, Latino, Low income, and White. Louisiana is in the below average group for all groups except Latinos (where they are average). It is also worth noting that Latinos make up about 3% of Louisiana's population, whereas African Americans make up around 34% of the population. So they get average results only in a very tiny minority population.

Now there are lots of reasons why a school system might be doing the best that it can, and student results cannot be divorced from a lot of complex social phenomena. However, when the top ranked state is below average on most student outcomes and above average for no populations, that should be concerning.

Now maybe the reforms have been too recent to have results. But if reforms have very long lag times then we have another problem -- how do we properly evaluate the quality of reforms if they take a decade to show up in the test results?

UPDATE: See here for the actual correlation coefficients. Consider:

Looking more rigorously at the results, the correlation coefficient on the rankings in the StudentsFirst report card with state rankings on reading scores is -0.20. (The correlation coefficient is a measure of the similarity of two sets of numbers, ranging from -1.0, completely dissimilar, to +1.0, perfect similarity.) That’s not a large number, but the negative sign means that the correlation is in the wrong direction: the higher the StudentsFirst score, the lower the NAEP reading score. The correlation on math is even worse, -0.25.

It's not a good sign when the outcomes data is in the wrong direction.
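For readers who want to see what a negative correlation between two sets of rankings looks like, here is a minimal sketch. The five "states" and both rankings are made up for illustration; they are not the StudentsFirst or NAEP numbers, and nothing here reproduces the -0.20 or -0.25 figures quoted above. When there are no ties, the ordinary Pearson formula applied to ranks is Spearman's rank correlation.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# HYPOTHETICAL rankings for five invented states (1 = best).
policy_rank = [1, 2, 3, 4, 5]    # rank on a policy report card
reading_rank = [4, 5, 2, 3, 1]   # rank on a reading test

r = pearson(policy_rank, reading_rank)
print(round(r, 2))  # prints -0.8
```

A negative value means the states ranked best on the report card tend to be ranked worst on the test, which is the "wrong direction" the quoted passage describes.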

EDIT 2: Missing link inserted

Charter school tricks: an ongoing saga

This is remarkable:

Boston’s Commonwealth charter schools have significantly weak “promoting power,” that is, the number of seniors is routinely below 60 percent of the freshmen enrolled four years earlier. looking at it another way, for every five freshmen enrolled in Boston’s charter high schools in the fall of 2008 there were only two seniors: Senior enrollment was 42 percent of freshmen enrollment. in contrast, for every five freshmen enrolled in the Boston Public Schools that fall there were four seniors: Senior enrollment was 81 percent of freshmen enrollment.

High graduation rates seem to be misleading if the weaker students are simply being pushed out and back into the public system (or even worse not in the system at all). An honest conversation about choice requires that we be aware of the ways that private institutions are different than public ones. I know people who have had their kids kicked out of a daycare because it wasn't working out and because a private institution can do what it wants with customers. The ability to remove disruptive students is certainly a nice benefit, but does the likely arms race really work out for the children involved?
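The "promoting power" figure in the quoted passage is just a ratio: seniors enrolled now divided by freshmen enrolled four years earlier. A small sketch using only the "for every five freshmen" round numbers from the quote (the function name is my own, and the report's exact figures are 42 and 81 percent, not these rounded ones):

```python
def promoting_power(seniors, freshmen_four_years_earlier):
    """Senior enrollment as a fraction of the freshman class four years before."""
    return seniors / freshmen_four_years_earlier

# Round numbers from the quote: two seniors per five freshmen (charters)
# versus four seniors per five freshmen (Boston Public Schools).
charter = promoting_power(2, 5)
district = promoting_power(4, 5)

print(f"{charter:.0%} vs {district:.0%}")  # prints 40% vs 80%
```

Note what the ratio does not tell us: where the missing students went. It cannot distinguish students who transferred back to the public system from students who left school entirely.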