West Coast Stat Views (on Observational Epidemiology and more)

Thursday, September 9, 2010

Changing statistical languages

Even when a current programming language has drawbacks, it can be hard to change to a more optimal language due to the investment in the current language. Here is a comment from Julien Cornebise:

But R is here and in everyday use, and the matter is more of making it worth using, to its full potential. I have no special attachment to R, but any breakthrough language that would not be entirely compatible with the massive library contributed over the years would be doomed to fail to pick-up the everyday statistician—and we’re talking here about far-fetched long-term moves. Sanitary breakthrough, but harder to make happen when such an anchor is here.

R is a pretty amazing language and, as a long-term SAS user, I must admit that I am delighted by the graphics and the cool packages. Of course, the dark side of such a rich library is needing to learn about the reliability and limitations of all of the different packages.

H/t: Andrew Gelman

Strange Bedfellows

(where your intrepid blogger sides with Robert Samuelson, gives Ray Fisman a break and wonders what in the hell happened to Jonathan Chait)

You really have to read this Jonathan Chait column on Robert Samuelson, and I don't mean that in a good way. You have to read this to see for yourself how low the educational reform movement can drag even a writer as gifted as Chait.

There's too much here to cover in one post (I could do a page just on Chait's weird reaction to Samuelson's looks, a topic that I had never given any thought to up until now). I may take another pass at another section later but for now I'm going to limit myself to this particularly egregious bit:

How does Samuelson explain the existence of new charter schools that produce dramatically higher results among these lazy, no-good teenagers? He insists, "no one has yet discovered transformative changes in curriculum or pedagogy, especially for inner-city schools, that are (in business lingo) 'scalable.'" This is utterly false. The most prominent example is the Kipp schools, which have shown revolutionary improvements among poor, inner-city students and have rapidly expanded.

It is strange to see Chait take the pro-privatization side of the debate, stranger still to see him accuse critics of charter schools of having an anti-government bias*, but what pushes this into Rod Serling territory is the spectacle of having Chait, one of the most gifted bullshit detectors of the Twenty-first Century, rolling out the same sort of flawed argument that he has made a career out of dismantling.

In order to be viable, a reform has to improve on the existing system by a large enough margin to justify its implementation costs, but if you accept the metrics used by the reform movement, then you will have to conclude that charter schools do worse than public schools more often than they do better.**

So we have a major push for privatized services which, after about two decades of testing, have been shown to under-perform their traditional government-run alternatives. Rather than show why this statistic is misleading, Chait pulls out vague, anecdotal evidence of a single outlier. Now, given the variability of the data, we would expect the top schools (or even chains) to do pretty well. That alone rebuts Chait's point, but it gets worse. Self-selection, peer effects and selective attrition*** all artificially inflate KIPP's results. When you take these factors into account, it's hard to make a compelling statistical case that even the best charter schools are outperforming public schools (though the second footnote still applies).
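The point about variability can be sketched with a quick simulation. This is only an illustration under made-up assumptions: the function name simulate_best_school, the 200 schools, the 60 tested students per school and the standard-normal noise are all hypothetical, not taken from any charter school data. Every simulated school has exactly the same true effect (zero), and the sketch simply reports how good the best one looks.

```python
import random
import statistics


def simulate_best_school(n_schools=200, students_per_school=60,
                         true_effect=0.0, seed=0):
    """Return (best observed school mean, average of all school means).

    Hypothetical setup: every school has the same true effect, and student
    scores differ only by standard-normal noise (units are student-level SDs).
    """
    rng = random.Random(seed)
    school_means = []
    for _ in range(n_schools):
        scores = [rng.gauss(true_effect, 1.0)
                  for _ in range(students_per_school)]
        school_means.append(statistics.fmean(scores))
    return max(school_means), statistics.fmean(school_means)


best, overall = simulate_best_school()
print(f"best school mean: {best:.2f} SD")
print(f"average across all schools: {overall:.2f} SD")
```

In this toy setup the best school stands well above the average even though no school is actually better than any other, which is why a single standout cannot settle the question by itself.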

At the risk of over-emphasizing, this is Jonathan -- freaking -- Chait we're talking about, a writer known for his truly exceptional gift for constructing logical arguments and, more importantly, spotting the fallacies in the arguments of others. Under normal conditions, Chait would never fall for a badly presented argument-by-anomaly, let alone make one, just as, under normal circumstances, a confrontation between Samuelson and Chait would result in little pieces of the former being scraped off of the walls of the Washington Post.

But Chait loses this confrontation decisively. From his ad hominem opening to his factually challenged close he fails to score a single point. And this is far from the only example of this odd reform-specific impairment affecting otherwise accomplished writers. OE has spilled endless pixels on the reform-related lapses, both statistical and rhetorical, of smart, serious, dedicated people like Chait, Seyward Darby and, of course, Ray Fisman (just do a keyword search). None of these people would normally produce the kind of work we've cataloged here. None of them would normally ignore the defection of one of the founding members of the reform movement. None of these people would normally feel comfortable dismissing without comment contradictory findings from EPI, Donald Rubin and the RAND Corporation.

David Warsh has aptly made the following comparison:

Remember the recipe for a policy disaster? Start with a handful of policy intellectuals confronting a stubborn problem, in love with a Big Idea. Fold in a bunch of ambitious Ivy League kids who don’t speak the local language. Churn up enthusiasm for the program in the gullible national press – and get ready for a decade of really bad news. Take a look at David Halberstam’s Vietnam classic The Best and the Brightest, if you need to refresh your memory. Or just think back on the run-up to the war in Iraq.

but along with Halberstam, it might be time to brush off our copies of Cialdini's Influence.

From a data standpoint, the past few years have been rough on the reform movement. Charter schools have been shown to be more likely to under-perform than to outperform. Joel Klein's spectacular record turned out to be the product of creative accounting (New York City schools have actually done much worse than the rest of the state). Findings contradicting the fundamental tenets of the movement accumulated. Major figures in research (Rubin) and education (Ravitch) have publicly questioned the viability of proposed reforms.

As Cialdini lays out in great detail, when you challenge people's deeply held beliefs with convincing evidence, you usually get one of two responses. Sometimes you will actually manage to win them over. More often, though, they will dig in, embrace their beliefs more firmly and find new ways to justify them.

I think it's safe to say we don't have response number one.

* Almost all of the major tenets of the modern reform movement can be traced back to the Reagan era and were closely associated with the initiatives described in Frank's The Wrecking Crew.

** Ironically, if you consider the intellectual framework of the reform movement to be flawed and overly simplistic, you can actually make a much better case for charter schools.

*** From Wikipedia: "In addition, some KIPP schools show high attrition, especially for those students entering the schools with the lowest test scores. A 2008 study by SRI International found that although KIPP fifth-grade students who enter with below-average scores significantly outperform peers in public schools by the end of year one, "... 60 percent of students who entered fifth grade at four Bay Area KIPP schools in 2003-04 left before completing eighth grade."[7] The report also discusses student mobility due to changing economic situations for student's families, but does not directly link this factor into student attrition. Six of California's nine KIPP schools, researched in 2007, showed similar attrition patterns.[citation needed] Figures for schools in other states are not always as readily available."
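To see why selective attrition matters, here is a rough sketch. All of the numbers are hypothetical assumptions chosen for illustration (the departure probabilities of 0.6 and 0.2 and the function name attrition_demo are mine, not figures from the SRI study). No student's score changes in this simulation; the only thing that happens is that students who entered with lower scores are more likely to leave.

```python
import random
import statistics


def attrition_demo(n_students=1000, leave_prob_low=0.6,
                   leave_prob_high=0.2, seed=1):
    """Return (mean entering score, mean score of stayers, number of stayers).

    Hypothetical setup: entering scores are standard normal, and students
    below the entering median leave with a higher probability than the rest.
    Scores are never modified, so any change in the mean is pure selection.
    """
    rng = random.Random(seed)
    entering = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    median = statistics.median(entering)
    stayers = []
    for score in entering:
        p_leave = leave_prob_low if score < median else leave_prob_high
        if rng.random() >= p_leave:
            stayers.append(score)
    return statistics.fmean(entering), statistics.fmean(stayers), len(stayers)


enter_mean, stay_mean, n_stay = attrition_demo()
print(f"entering cohort mean: {enter_mean:.2f}")
print(f"remaining cohort mean: {stay_mean:.2f} ({n_stay} students left in cohort)")
```

The remaining cohort looks stronger than the entering one without anyone having learned anything, so a comparison that ignores who left will overstate a school's effect.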

Wednesday, September 8, 2010

Non-compete agreements

Felix Salmon is discussing the lawsuit over Mark Hurd accepting a job at Oracle. The crux of the argument seems to be concerns that Mr. Hurd might reveal confidential information as part of his new job. Fortunately, for him, the state of California takes a very dim view of non-compete agreements. From commenter Vania, in the comments to Mr. Salmon's post:

The covenant not to compete, as written, is simply unenforceable under California Business & Professions Code Section 16600:

“16600. Except as provided in this chapter, every contract by which anyone is restrained from engaging in a lawful profession, trade, or business of any kind is to that extent void.”

Now I have a generally dim view of non-compete agreements, so I think that this law is good sense. While I don't have any issue with a CEO having more restrictive rules, given the level of compensation that they are given, I am dubious that these agreements are given only at the CEO level. I know that I was under one, once, in a very junior position at a firm.

The key issue is that I am unsure of how easily one can freely consent to such an agreement in the midst of a dismissal. The person has just had their life turned upside down and likely lost a crucial income stream. The company has had time to prepare the exit package and carefully optimize it for their interests. They have had lawyers look it over and had HR vet the relevant policies. I am not convinced that these structural differences in information can be overcome, nor do I really see "sign this or get no severance" as being a real choice for people who have had no ability to assess their options.

At an even more fundamental level, it is unclear to me why these sorts of agreements don't violate our norms of a free market. How can there be a bigger barrier to free economic activity than a pledge for workers not to sell their skills? We are already worried about issues with implied compensation; does this not count as a hidden cost that is not declared up front?

At best one might argue for contract law and the ability of people/organizations to enter into agreements. But the sanctity of contract law seems to be under attack when it favors the worker (consider tenure and defined benefit pension plans). Why do the same arguments about net social good not apply here?

Tuesday, September 7, 2010

Vintage of Educational Reform

I just wanted to make a point in the education debate in passing. It is well worth re-reading Mark's post on his experience with these same reforms:

The trouble is that almost none of the people using the term 'reform' are actually suggesting any reforms. Most of the proposals that have been put forward are simply continuations or extensions of the same failed policies and questionable theories that have been coming out of schools of education for years, if not decades.

In 1994, RAND did a policy brief on the use of standardized scores in education. Look at the conclusions (from 16 years ago):

Research has not been able to pinpoint the effects of the noneducational influences. Nevertheless, people have misused test-score data in the debate to give education a "bad rap." Koretz lists three broad, overlapping kinds of misuse that should be avoided in honest, future debate:

1. Simplistic interpretations of performance trends: These trends should not be taken at face value, ignoring the various factors that influence them: for example, demographic changes in test takers or inflation of scores caused by test-based accountability.

2. Unsupported "evaluations" of schooling: Simple aggregate scores are not a sufficient basis for evaluating education--unless they provide enough information to rule out noneducational influences on performance. Most test-score databases do not offer that kind of information.

3. A reductionist view of education: Koretz notes that it may be "trite" but it is true that education is a "complex mix of successes and failures . . . what works in one context or for one group of students may fail for another." Unfortunately, that truism is often ignored. For example, in the early 1980s, when people were reasonably concerned about falling aggregate test scores, they asked for wholesale changes in policies, without first asking which policies most needed changing or which students or schools most needed new policies.

Maybe we should think twice before declaring that drops or stagnation in scores are necessarily evidence that education is failing? As for the performance gains in charter schools, Mark has been after that issue from the beginning of this discussion here at OE.

It's worth keeping this in mind when evaluating calls for massive reforms on the basis of standardized test scores alone.

Monday, September 6, 2010

Interesting thought

From Grognardia:

Indeed, if there's one "problem" with reading a book like Galactic Patrol nowadays it's that, as the wellspring of so much that came after, it appears trite and unoriginal when, in point of historical fact, everything else that followed it is what's trite and unoriginal. The "Lensmen" series is big and bold and, while I'd never argue that it's scientific speculations hold much water (though, to be fair, many of its ideas were based on the science of its day), it's nevertheless a fun read. There can be no doubt why it exerted such a profound influence on the imaginations of later authors in the genre.

I think that this is a very insightful observation, and one worth keeping in mind when reading key historical works. Good ideas spawn a lot of imitators and some of these will be better (in terms of some aspects) than the original works. Just think of how many of the imitators of Lord of the Rings are better at pacing the plot! But who among them has such a rich and unique vision of an alternate world?

But these are the works that change entire genres of fiction.

Education in Canada

Another interesting point from Worthwhile Canadian Initiative:

The demand for French immersion education in Vancouver so far outstrips the supply that the school board allocates places by lottery.

But why? Is it because French is a useful employment skill? Because learning to speak French makes you a better person? Or is it because parents know intuitively what economists can show econometrically: peer effects matter. Being with high achieving peers raises a student's own achievement level.

Consider this point quoted in that article:

If students with special needs were equally distributed among all classes, each teacher would on average have 3.4 students with special needs. However, in schools that have early immersion programs, the average in core English programs is about 5.7 students.

I think that it is quite possible that these results could be applied to charter schools with long waiting lists in the United States. Why does this matter?

Because the use of lotteries could otherwise be considered to be a form of randomization. But it seems odd that being taught in a second language would result in better educational outcomes, per se. Which suggests that Frances Woolley has a point that peer effects really do matter and we should consider this when evaluating outcomes in US charter schools.

Update: Of course, I forgot to mention that Mark brought up this exact point at the beginning of our foray into education.

Sunday, September 5, 2010

Statistical significance (a never-ending series)

Andrew Gelman has a post on a mis-definition of the p-value. I want to focus on another aspect of the quote:

Despite the myriad rules and procedures of science, some research findings are pure flukes. Perhaps you're testing a new drug, and by chance alone, a large number of people spontaneously get better. The better your study is conducted, the lower the chance that your result was a fluke - but still, there is always a certain probability that it was.

Statistical significance testing gives you an idea of what this probability is.

This is not only an incorrect definition of the p-value but it also appears to be ignoring the possibility of bias and/or confounding. Even in a randomized drug trial (and drug trials are explicitly being used as an example), it is possible to induce selection bias due to non-random loss to follow-up in any non-trivial study. After all, many drugs are such that the participants can guess their exposure status (all analgesics have this unfortunate property) and this can lead to a differential study completion rate among some sub-groups. For some outcomes (all-cause mortality), complete ascertainment can be done using an intention-to-treat approach to analysis. But that typically induces a uniform bias towards the null.

I am always uncomfortable with how these strong and unverifiable assumptions are glossed over in popular accounts of pharmacoepidemiology.
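A small simulation can make the loss to follow-up concern concrete. This is a sketch under assumptions I have invented for the purpose: the drug has no effect at all, outcomes are standard normal, and the dropout probabilities (0.4, 0.1 and 0.25) and the function names are hypothetical. The test is a simple large-sample z-test on completers only. When dropout is unrelated to outcome, about 5% of null trials come out "significant"; when treated participants with poor outcomes drop out more often, the rate is far higher even though the p-value arithmetic is unchanged.

```python
import math
import random
import statistics


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def two_sided_p(sample_a, sample_b):
    """Two-sided p-value from a large-sample z-test on the difference in means."""
    mean_a, var_a = mean_and_variance(sample_a)
    mean_b, var_b = mean_and_variance(sample_b)
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    z = (mean_a - mean_b) / se
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


def rejection_rate(differential_dropout, n_trials=1000, n_per_arm=200, seed=2):
    """Share of simulated null trials (true effect = 0) with p < 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        treated, control = [], []
        for _ in range(n_per_arm):
            y = rng.gauss(0.0, 1.0)
            if differential_dropout:
                # Hypothetical: treated participants doing poorly quit more often.
                p_drop = 0.4 if y < 0 else 0.1
            else:
                p_drop = 0.25
            if rng.random() >= p_drop:
                treated.append(y)
        for _ in range(n_per_arm):
            y = rng.gauss(0.0, 1.0)
            if rng.random() >= 0.25:  # control dropout unrelated to outcome
                control.append(y)
        if two_sided_p(treated, control) < 0.05:
            rejections += 1
    return rejections / n_trials


print("random dropout:      ", rejection_rate(differential_dropout=False))
print("differential dropout:", rejection_rate(differential_dropout=True))
```

The p-value is calculated assuming both that the null is true and that the comparison is unbiased. It says nothing about whether the second assumption holds, which is the part popular accounts tend to skip.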

Saturday, September 4, 2010

Another troubling study

From the EPI paper:

Because of the range of influences on student learning, many studies have confirmed that estimates of teacher effectiveness are highly unstable. One study examining two consecutive years of data showed, for example, that across five large urban districts, among teachers who were ranked in the bottom 20% of effectiveness in the first year, fewer than a third were in that bottom group the next year, and another third moved all the way up to the top 40%. There was similar movement for teachers who were highly ranked in the first year. Among those who were ranked in the top 20% in the first year, only a third were similarly ranked a year later, while a comparable proportion had moved to the bottom 40%.

What's really amazing here is that the authors of the fire-the-bottom-80-percent paper actually cite other work by Timothy Sass and yet manage to overlook this.
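For a sense of how noisy estimates produce this kind of churn, here is a sketch under assumed numbers. Each simulated teacher has a fixed true effect that never changes, and each year's estimate adds independent noise sized so that the year-to-year correlation of the estimates is 0.3. That correlation, the 5,000 teachers and the function name quintile_transitions are hypothetical choices of mine, not values from the EPI paper or the study it cites.

```python
import random


def quintile_transitions(n_teachers=5000, year_to_year_corr=0.3, seed=3):
    """Return (share of year-1 bottom 20% still in bottom 20% in year 2,
               share of year-1 bottom 20% that land in the top 40% in year 2).

    Hypothetical model: estimate = fixed true effect + independent yearly noise.
    With true-effect variance 1, noise variance (1/r - 1) gives correlation r.
    """
    rng = random.Random(seed)
    noise_sd = (1.0 / year_to_year_corr - 1.0) ** 0.5
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    def percentile_ranks(estimates):
        # Rank 0.0 is the lowest estimate; ranks approach 1.0 at the top.
        order = sorted(range(n_teachers), key=lambda i: estimates[i])
        ranks = [0.0] * n_teachers
        for position, teacher in enumerate(order):
            ranks[teacher] = position / n_teachers
        return ranks

    rank1, rank2 = percentile_ranks(year1), percentile_ranks(year2)
    bottom_year1 = [i for i in range(n_teachers) if rank1[i] < 0.2]
    stayed_bottom = sum(rank2[i] < 0.2 for i in bottom_year1) / len(bottom_year1)
    moved_to_top40 = sum(rank2[i] >= 0.6 for i in bottom_year1) / len(bottom_year1)
    return stayed_bottom, moved_to_top40


stayed, jumped = quintile_transitions()
print(f"bottom 20% in year 1, still bottom 20% in year 2: {stayed:.0%}")
print(f"bottom 20% in year 1, top 40% in year 2: {jumped:.0%}")
```

Nobody's true effectiveness moves in this model, yet a large share of the "worst" teachers in one year are not in the bottom group the next. Firing decisions based on a single year's ranking would, in a world like this one, remove many teachers who are not actually weak.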

EPI Briefing Paper -- Problems with the Use of Student Test Scores to Evaluate Teachers

In terms of education reform, this is probably the biggest story to come over the wires in a long time:

While there are good reasons for concern about the current system of teacher evaluation, there are also good reasons to be concerned about claims that measuring teachers’ effectiveness largely by student test scores will lead to improved student achievement. If new laws or policies specifically require that teachers be fired if their students’ test scores do not rise by a certain amount, then more teachers might well be terminated than is now the case. But there is not strong evidence to indicate either that the departing teachers would actually be the weakest teachers, or that the departing teachers would be replaced by more effective ones. There is also little or no evidence for the claim that teachers will be more motivated to improve student learning if teachers are evaluated or monetarily rewarded for student test score gains.

Read the paper here, then take a look at Kenneth J. Bernstein's detailed analysis and Joseph's brief explanation of why we should listen to Donald Rubin.

Friday, September 3, 2010

The principal effect -- a footnote to the last post

When it comes to education reform, you can't just refer to the elephant in the room. It's pretty much elephants everywhere you look. There is hardly an aspect of the discussion where reformers don't have to ignore some obvious concern or objection.

The elephant of the moment is the effect that principals and other administrators have on the quality of schools. Anyone who has taught K through 12 can attest to the tremendous difference between teaching in a well-run and a badly-run school. In a well-run school, even the most experienced teacher will find it easier to manage classes, cover material, and keep students focused. All of those things help keep test scores up, as does the lower rate of burnout. For new teachers, the difference is even more dramatic.

On top of administrator quality, there is also the question of compatibility. In addition to facing all the normal managerial issues, teacher and principal have to have compatible educational philosophies.

As we've mentioned more than once on this site, educational data is a thicket of confounding and aliasing issues. That thicket is particularly dense when you start looking at teachers and principals and, given the concerns we have about the research measuring the impact of teachers on test scores, I very much doubt we will ever know where the teacher effect stops and the principal effect starts.

Addiction

I think that it is easy to understate how hard it can be to eliminate an addiction, even one where we know about the potential harms (e.g., smoking). I found this passage really interesting:

One time I had just enough money to put in the electric meter or buy a packet of cigarettes. There I was sat in the dark smoking like a chump trying to comfort myself with the fact that it was not crack.

These sorts of tales really make me ponder whether we should focus more on "harm reduction" and whether elimination may be a quixotic pursuit.

Thursday, September 2, 2010

Oh, Canada -- another interesting omission in "Clean Out Your Desk"

We're back with our ongoing coverage of Ray Fisman's recent article in Slate which ran with the provocative tagline "Is firing (a lot of) teachers the only way to improve public schools?" (notice that he didn't say "a way" or "the best way").

If you tuned in late, here's what you need to know:

Dr. Fisman starts by discussing a presidential commission report from the early Eighties that said the damage done by our poor educational system was comparable to an act of war. This somewhat apocalyptic language has since become a staple of the reform movement. It grabs the attention, justifies big, expensive, untried steps and sets up a false dichotomy between action and inaction.

The proceedings are then handed over to Joel Klein. Klein builds on the verge-of-disaster theme by invoking the United States' low ranking on the Organization for Economic Co-operation and Development's PISA tests. I've commented at some length on the implications of citing PISA while completely ignoring the better-established and well-respected TIMSS even when the discussion shifted to elementary schools where the TIMSS scores would seem to be far more relevant. (The term cherry-picking did come up.)

For now, though, let's grant Chancellor Klein and Dr. Fisman the benefit of the doubt. Let's say we accept the premise that OECD's PISA rankings are such a good and reliable measure of the state of a nation's schools that we don't even need to look at other metrics. We'll even stipulate for the sake of argument that a bad PISA ranking is sufficient grounds for radical measures. With all of these conditions in place, take a close look at the next part of Dr. Fisman's article:

What could turn things around? At a recent event that I organized at the Columbia Business School, Klein opened with his harsh assessment of the situation, and researchers offered some stark options for getting American education back on track. We could find drastically better ways of training teachers or improve our hiring practices so we're bringing aboard better teachers in the first place. Barring these improvements, the only option left is firing low-performing teachers—who have traditionally had lifetime tenure—en masse.

The emphasis on better teachers—through training, selection, or dismissal—comes from the very consistent finding that improving faculty is one of the best, most reliable ways to improve schools. If the person standing at the front of the classroom has raised the test scores of students he's taught before, he's likely to do so again.

But how do you get good teachers in the classroom? Unfortunately, it turns out that most evidence points toward great instructors being born, not made. National board certification may help a bit, a master's degree in education not at all. It's also difficult to pick out the best teachers based on a résumé or even a sample lesson. It takes a year or so before evaluators (and even teachers themselves) know who is really good at getting kids to learn, and few qualifications are at all correlated with teaching ability. Candidates with degrees from prestigious colleges—the type where Teach for America does much of its recruiting—do a bit better, but not much.

Here's the gist of Dr. Fisman's premise:

1. According to PISA (the test that trumps all other tests) the state of U.S. education is dire;

2. We need to improve the quality of our teachers "through training, selection, or dismissal";

3. So far, no one has found a way to make training or selection work.

If we want education to do well we might just have to start firing teachers en masse, and by "do well," we mean outscore other countries, which raises the question, "How do other countries find all of those natural teachers?"

Of course, comparing educational systems of different countries can be tricky but we should at least be able to look at Canada. It's a fairly large industrialized country. Not that different economically. Very similar culturally with a comparable K through 12 educational system that has to deal with English as a second language (huge immigrant population), relies on roughly the same type of teacher training/certification that we use and continues to pull teachers in with promises of good job security.

In terms of this discussion, the biggest difference between the two countries could well be Canada's somewhat reactionary approach to reform (for example, only one province, Alberta, allows public charter schools). With such limited school choice and no real attempt to clean out the deadwood from behind the podium, the Canadian educational system looks a lot like the American system before the reform movement.

And how is Canada doing on the PISA math test?

From Measuring Up: Canadian Results of the OECD PISA Study:

One way to summarize student performance and to compare the relative standing of countries is by examining their average test scores. However, simply ranking countries based on their average scores can be misleading because there is a margin of error associated with each score. As discussed in Chapter 1, when interpreting average performances, only those differences between countries that are statistically significant should be taken into account. Table 2.1 shows the countries that performed significantly better than or the same as Canada in reading and mathematics. The averages of the students in all of the remaining countries were significantly below those of Canada. Overall, Canadian students performed well. Among the countries that participated in PISA 2006, only Korea, Finland and Hong Kong-China performed better than Canada in reading and mathematics. Additionally Chinese Taipei performed better than Canada in mathematics.

That puts them in the top ten (in science they were in the top three). Now let's review the United States' performance (quoting Dr. Fisman):

Despite nearly doubling per capita spending on education over the past few decades, American 15-year olds fared dismally in standardized math tests given in 2000, placing 18th out of 27 member countries in the Organization for Economic Co-operation and Development. Six years later, the U.S. had slipped to 25th out of 30.

How do we reconcile these facts with Dr. Fisman's argument? As far as I can see, there are only four possibilities (if I've missed some please click the comment button and let me know):

1. Though PISA is a useful test, international PISA ranking may not be a sufficient measure of a country's school system;

2. Teacher quality is not a major driver of national educational performance;*

3. Teachers are made, not born, i.e., it is possible to train people to be good teachers;

4. Canada just got lucky and beat the odds hundreds of thousands of times.

If this were a PISA question, I hope no one would pick number four.

* This really is a topic for another post, but I would expect the administrator effect to overwhelm the teacher effect. Perhaps Dr. Fisman is going to follow up with a Slate article on firing administrators who produce lackluster test performance.

Small Schools

I often disagree with Alex Tabarrok on education; I think we are both arguing for a better world but we have somewhat different ideas as to the best approach. But his article on small schools is really worth reading. Heck, every epidemiology student should read the article to remind themselves of the hazards of trying to interpret the ranks in a population without also interpreting the level of variance.

Very well done.
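The general hazard of reading ranks without looking at variance can be sketched in a few lines. The setup is entirely hypothetical (200 schools of 50 students, 200 schools of 500 students, identical true performance everywhere, and a function name of my own choosing), so it illustrates the statistical point rather than reproducing anything from the article.

```python
import random
import statistics


def share_small_in_extremes(n_each=200, small_size=50, large_size=500,
                            top_k=20, seed=4):
    """Return (share of small schools among the top_k school means,
               share of small schools among the bottom_k school means).

    Hypothetical setup: all students are drawn from the same distribution,
    so school size affects only the variance of the school mean.
    """
    rng = random.Random(seed)
    schools = []
    for size in [small_size] * n_each + [large_size] * n_each:
        scores = [rng.gauss(0.0, 1.0) for _ in range(size)]
        schools.append((statistics.fmean(scores), size))
    schools.sort()  # ascending by mean score
    bottom, top = schools[:top_k], schools[-top_k:]

    def share_small(group):
        return sum(size == small_size for _, size in group) / len(group)

    return share_small(top), share_small(bottom)


top_share, bottom_share = share_small_in_extremes()
print(f"small schools among the top 20: {top_share:.0%}")
print(f"small schools among the bottom 20: {bottom_share:.0%}")
```

Small schools crowd both ends of the ranking in this simulation. Looking only at the top of the list would suggest that small schools are better, while looking at the bottom would suggest the opposite. Both impressions come from the larger sampling variance of a small school's average.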

Wednesday, September 1, 2010

Rubin on Educational Testing

From a Daily Kos post about the use of Value-Added Assessment methodologies:

In 2004, Donald Rubin opined

We do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions.

Now I am not familiar with the actual research, but I am likely to take Donald Rubin seriously. Not only is he one of the founders of causal inference, multiple imputation, and propensity scores, but he has a long history of tackling extremely difficult epidemiological problems. For a humbling experience (for those of us in biomedicine) his CV is here.

I dislike appeals to authority, in general, but claims that researchers skeptical about the value of these testing methods are misinformed seem to be poorly grounded. I don’t want to say Rubin is right about everything but I do think we should take his concerns seriously.

[as a side note, he was also the PhD supervisor of Andrew Gelman, whose blog is worth following]

Econned

It is the start of the school year and time to read something non-epidemiological or statistical. So, being me, I decided to read Yves Smith's new book Econned. I'll let you know what I think but reading the introduction this morning suggests that the book is off to a strong start. The best quote so far:

Theories that fly in the face of reality often need to excise inconvenient phenomena, and mainstream economics is no exception.

This quote reminds me of Karl Popper's thinking; one often learns more from what does not fit a theory than from what does (i.e., falsification). This principle is hard to follow in very complex fields (like economics and epidemiology) where you are guaranteed to have at least some mismatches and disconfirming evidence for everything. But it is good to cultivate a sense of humility about our models!