Thursday, August 19, 2010

Ray Fisman and the Tierney Ratio

The Tierney Ratio (sometimes called the Tierney Test because people love alliteration) is a measure of journalistic mediocrity named for its frequent subject, John Tierney. You find the Tierney Ratio of an article by counting the number of words it takes to address all of the significant problems in the article, then dividing that by the article's word count.
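
Written out, the ratio is just a quotient. Below is a minimal Python sketch. The function name and the word counts are hypothetical illustrations, not measurements of any real article:

```python
def tierney_ratio(rebuttal_word_counts, article_word_count):
    """Words needed to address every significant problem, divided by the
    article's own word count."""
    if article_word_count <= 0:
        raise ValueError("article_word_count must be positive")
    return sum(rebuttal_word_counts) / article_word_count

# Hypothetical: a 1,000-word column with three problems that take
# 400, 350 and 500 words respectively to address.
print(tierney_ratio([400, 350, 500], 1000))  # 1.25
```

A ratio above one means the corrections run longer than the piece being corrected.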

As you might expect, Tierney Ratios vary greatly from author to author. The sorely-missed Olivia Judson maintained a TR of virtually zero while writing for the New York Times, whereas John Tierney, a science writer with no appreciable background in or aptitude for science, routinely had observed TRs in excess of one or two. (It is possible that Judson was kept in the Op-Ed rather than Science section out of concern that she would unfairly lower the latter's average.)

The value of the Tierney Ratio is somewhat limited by its serious data censoring problem (analogous to this well-known example). Faced with articles and essays of sufficiently low quality, researchers are almost always forced to leave significant mistakes, distortions and fallacies unaddressed.
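
The censoring only works in one direction. Every problem left unaddressed drops out of the numerator, so an observed TR can only understate the true one. Here is a sketch under a hypothetical assumption: the reviewer works through the problems in order and stops once a fixed word budget would be exceeded. The budget and the rebuttal lengths are made up:

```python
def observed_tierney_ratio(rebuttal_word_counts, article_word_count, word_budget):
    """Observed TR when problems are addressed in order until the next one
    would exceed a fixed word budget (hypothetical reviewer behaviour)."""
    used = 0
    for words in rebuttal_word_counts:
        if used + words > word_budget:
            break  # this problem and every later one go unaddressed (censored)
        used += words
    return used / article_word_count

problems = [400, 350, 500, 900]   # hypothetical rebuttal lengths, in words
true_tr = sum(problems) / 1000    # 2.15
seen_tr = observed_tierney_ratio(problems, 1000, word_budget=1000)  # 0.75
```

The worse the article, the more gets cut off, so the gap between `seen_tr` and `true_tr` grows exactly where the measure matters most.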

Which brings us back to Ray Fisman's recent column in Slate, which reaches an almost Hellmanesque level of inaccuracy. Getting a true TR on something like this is an extraordinarily tedious job, so readers who aren't into hardcore education wonkery might want to skip the next few posts. You'll know it's safe to come back when we start posting Daily Show clips again.

Data censoring and Tootsie Pops

There is reason to suspect undercounting.

Wednesday, August 18, 2010

Desperately Seeking Suzanne (Null)

In Life in Hell, Matt Groening once asked if there was anything scarier than an open-mic poetry night. As a general rule, I have the same reaction to comment sections. There are exceptions (Andrew Gelman's site comes to mind), but most of the time your chances of happening upon an intelligent and insightful conversation are better when you sit down between two strange drunks in an unfamiliar bar.

So you can understand why I initially skipped over the comments to Ray Fisman's recent post (if comments are usually less intelligent and well-written than the articles they accompany, just imagine the Stygian depths these would have to sink to in order to follow Fisman). Fortunately Joseph did brave the bottom of the web page and discovered that the comments here were actually better than the piece that inspired them.

The best of that very good crop were the entries by Suzanne Null, who is (I believe) an education professor in the Northeast. [Update: strike that last part. It looks like Suzanne is a fellow Westerner.] In this series of comments she takes down Fisman brick by brick:
Didn't Fisman's teachers ever teach him to conduct some research and check the validity of his sources (there is better and more recent information than the 1997 research he cited) before publishing something? Virtually all of the information in this article has already been debunked. See Ravitch's copiously researched "The Death and Life of the Great American School System" (particularly Chapter 9) and practically everything by Stanford's Linda Darling-Hammond. For example:

1) There is a great deal of evidence that better training helps teachers improve instruction (see research by Darling-Hammond and on the "Research" section of the www.nwp.org site). Teachers are "made" (not born) through training, professionally supportive school environments, and supportive communities. Experience makes a difference in teacher effectiveness (Ravitch 190), and one of the biggest problems with the teaching profession is its high rate of attrition; many teachers leave the profession by their fifth year.

2) Despite what this article says about identifying "bad" teachers, we haven't yet found a reliable way to identify who the "bad" teachers are. Test scores are one-dimensional and subject to numerous validity and reliability issues (Ravitch 152-154). In addition, despite the claims made in this article, test scores can vary significantly by teacher from year to year because there is so much variation among the students in the teachers' classes (Ravitch 185-186). A teacher whom the tests identify as "high performing" one year might appear to be "low performing" the next.


3) The article insinuated that schools can "close the gap" simply by hiring the top quintile of teachers. This research comes from Gordon, Kane, & Staiger (2006), Hanushek & Rivkin (2004), and Sanders (2000), all cited by Ravitch (183-184). This has also been debunked because the learning gains cited in these articles don't persist over time (Jacob, Lefgren, & Sims 2008) and because of the general unreliability of the tests, particularly when used for the purpose of evaluating teachers, which was not the primary goal when most of these tests were designed.

4) The effectiveness of Teach for America (TFA) has been inconclusive (Ravitch 188-191). For example, an extensive study by Darling-Hammond found that TFA teachers "had a negative or non-significant effect on student achievement" (2005, cited by Ravitch 189). Thus "degrees from prestigious colleges" are also NOT a predictor of effective teaching. In any case, it is delusional to believe that the entire country can sustain the constant turnover of teachers that has characterized TFA (particularly given schools' current budgets for teacher pay) or that this level of turnover would be desirable for our students (Ravitch 190).

5) The research on the "cumulative effects" of attending NYC charter schools has been proven to be invalid. Charter schools are not all successful: some post higher test scores than comparable "public" schools and others post lower scores. When they have higher scores, it is usually because they take the students who chose to enroll or enter the lottery system. These students and their families tend to be more engaged with their education in general, and thus tend to perform better, no matter what the teacher does. Charter schools also take fewer students with special needs, such as Special Education students or second-language learners. When researchers have adjusted for these differences in their data pools, they have found no significant differences between charter schools and public schools (Ravitch 140-143).

...

What's particularly worrisome and insidious about the author's arguments is that they will further harm students within our school system. If the "thought experiment" of abandoning teacher selection based on qualifications and teacher training is ever carried out in favor of allowing anyone to try to teach so that the "data" can winnow out the top 20%, it will mean that our students will bear the brunt of training and selecting teachers. They will be subjected to a revolving door of completely untrained teachers, and they will lose educational time and opportunities as they experience the steep learning curve that is present for teachers in their first two years. Our students deserve trained and experienced teachers; they don't deserve to be the guinea pigs that have to test out anyone who walks in off the street.
...

If we really want to overcome "mediocrity" in schools, we should focus on retaining the best teachers, giving them the professional freedom and support to do their jobs, and incentives for high performance, not just on tests, but on other measures of teacher success. Since I began teaching in 2000, I've watched many of the hardest working, most committed, and most motivated teachers leave. Those who stay are a few of the truly exceptional ones, and many of the ones who are happy to administer lectures and scan-tron tests and then go home. Teachers have few opportunities for professional advancement (aside from maybe becoming a principal), and few incentives to go "over and above" in their jobs. If we really want to improve education, our school SYSTEMS' tendency to support the mediocre and discourage (or even fire) the best is what will need to change. This change will require better working conditions, better support, more resources, smaller classes, and even better pay incentives for our hardest working and best-performing teachers.
...

I would add that the whole "martyr" or indolent "loser" dichotomy presented in the media's portrayals of teachers allows our society to evade responsibility for actually improving schools. If the best teachers are great because they CARE so much about their students and are willing to sacrifice so much, and if as the article says they are "born great," then they won't require smaller classes, better materials, more manageable work responsibilities, or higher pay. And if the worst teachers are indolent, then more money isn't going to help them anyway. The entire construction allows our culture to continue to alternately lionize and blame teachers while doing nothing that would actually help support teachers in their endeavors to help students learn.
...

Actually, the one strategy that's been proven to raise test scores is to winnow out the low-scoring students. This can be accomplished by re-drawing school attendance boundaries, creating "choice" or charter schools (which of course don't have the "resources" for Special Education students, second-language learners, or students with behavioral issues), or by "encouraging" the low-performing students to drop out or leave. The schools that have done this have been able to tout the "excellence" of their school management and teacher training, and their principals and superintendents have often gotten promotions and large pay raises.

So maybe our schools should all try that.
...
Just to clarify, this last suggestion was facetious. If our only way to "improve" our schools is to stop serving all of our students, is that a form of "success" that's worth having?
...

EB, I've particularly heard stories about nepotism and favoritism from teachers in rural schools, so I know it happens. But many teachers' major fear about "performance based" pay is that it will be subject to the same dynamics. Even if teachers are evaluated solely on "data" such as test scores (which isn't a good idea for other reasons), it is very easy for principals to stack the deck against teachers they don't like by giving them the lower-performing students, making them change grades or subjects, giving them unfavorable schedules, etc. I've already heard from some teachers I know in a rural area that principals will "drive a teacher out" by, say, transferring them from fourth grade to first, only to then blame the teacher when test scores dip because the teacher hasn't had time to accumulate the practice and materials for the new grade level.

Many teachers are supportive of standards, accountability, and even incentive pay, but they want to be evaluated in fair and equitable ways.
Suzanne, if you've got a place you're posting on a regular basis, let us know and we'll add you to our blogroll. What you have to say deserves the widest possible audience.

Tuesday, August 17, 2010

A Partial N-space of Education Reform -- another preblogged footnote

If we are going to have an intelligent conversation about education (which at this point would be a refreshing change of pace), we have to start by thinking about the n-space. There are multiple dimensions that have to be considered here. As long as the debate fails to acknowledge them or approaches them in a sloppy way, the analyses will continue to be fatally flawed.

We could look at this on the level of classes or individual students, but in this case it probably makes the most sense to think of each school as representing a point in this multidimensional space. We assume these points are more or less fixed with respect to some of these dimensions (grade level, population density [rural/suburban/urban], demographics, region, etc.) but we like to believe that we can change the position of these points with respect to other dimensions (retention, discipline, standardized test scores, etc.).

Why is it so important to think in terms of this multidimensional space? Because there are few meaningful statements that are valid across these various axes. When Doug Staiger and Jonah Rockoff (here via Ray Fisman) made radical suggestions about teacher hiring policies, they based them on a study of arguably the two least representative school districts in the country. Even if the rest of the study were sound (rather than being a train wreck, but more on that later), the findings would be worthless for most of the country.
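
One crude check to run before carrying a finding from one set of schools to another is to ask whether the target school falls inside the range the study actually sampled, one dimension at a time. Below is a minimal sketch. The dimension names and every number are hypothetical, and a per-axis range check is only a rough stand-in for proper overlap diagnostics:

```python
from typing import Dict, List

School = Dict[str, float]  # one point in the n-space: dimension name -> value

def out_of_support(sample: List[School], target: School) -> List[str]:
    """Dimensions on which `target` lies outside the min-max range observed in
    `sample`, i.e. where applying the study's finding means extrapolating."""
    flagged = []
    for dim, value in target.items():
        observed = [school[dim] for school in sample]
        if value < min(observed) or value > max(observed):
            flagged.append(dim)
    return flagged

# Hypothetical study sample drawn only from large urban districts.
study = [
    {"pop_density": 27000.0, "pct_low_income": 75.0, "enrollment": 1200.0},
    {"pop_density": 8000.0, "pct_low_income": 60.0, "enrollment": 900.0},
]
rural_school = {"pop_density": 40.0, "pct_low_income": 45.0, "enrollment": 150.0}
print(out_of_support(study, rural_school))
```

Falling inside every one-dimensional range is necessary but not sufficient. A school can sit within each range and still land in an empty corner of the joint space.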

Worse yet, when you make a substantial change in educational policy, there is a wide range of relationships between the effects you see along different dimensions, including possible inverse relationships between retention and other measures of school performance (the fastest and most reliable way for a school to improve its performance is to get rid of the students it can't handle).
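
The parenthetical is plain arithmetic on averages. If the lowest scorers leave, the school's mean rises even though no remaining student's score changes. Here is a toy illustration with made-up scores:

```python
from statistics import mean

# Hypothetical test scores for one small school; no individual score changes below.
scores = [35, 48, 55, 62, 70, 74, 81, 88]

before = mean(scores)
after = mean(sorted(scores)[2:])   # the two lowest-scoring students leave

print(f"mean before: {before:.1f}, mean after: {after:.1f}")
```

The mean goes from roughly 64 to roughly 72, and nobody learned anything.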

Even with the most careful of reasoning, the most clearly stated questions and the most closely examined assumptions, this kind of complex, multidimensional system can react to new conditions in dramatic, counterintuitive ways. If you approach it with the kind of sloppy thinking that has dominated the education debate, you are asking fate to do some very bad things.


[thanks to Wikipedia for the hypercube]

A dark, swirling, mammoth wall of wrong

Late last night (or more accurately early this morning) I had the TV on as background noise as I debugged some text mining code. The late show was airing Hidalgo and I happened to tune in shortly before the sandstorm scene.

Today, as I read this post by Ray Fisman, I had the sensation of being engulfed, much as the unlikely riders were, in an enormous, violent, impenetrable cloud of bad arguments, flawed reasoning, shoddy research and statistical errors.

I'll try to make some sense of this tomorrow but in the meantime, check out Joseph's comments here.

Monday, August 16, 2010

When people don't understand regression equations

This article seems to be one of the worst misunderstandings of regression to be posted in a while. Let us consider the heart of the argument:

When they ran the numbers, the answer their computer spat out had them reviewing their work looking for programming errors. The optimal rate of firing produced by the simulation simply seemed too high: Maximizing teacher performance required that 80 percent of new teachers be fired after two years' probation.

After checking and rechecking their analyses, Staiger and Rockoff came to understand why a thick stack of pink slips are needed to improve schools. There are enormous costs to having mediocre teachers burdening the school system, and once they get their union cards, we're stuck with them for decades. The benefits of keeping only the superstars is enormous, such that it's better to risk accidentally losing some of the good ones than to have deadwood sticking around forever.


The regression equation assumes that all else stays equal. Presuming that there are 3 million teaching jobs in the United States (there were 3.1 million in 1999), that would require filling 1.2 million vacancies per year. It's hard to get a good number for the total number of college graduates per year, but in 2004 there were 2.6 million freshmen; so even assuming a 100% graduation rate, nearly 50% of college graduates (1.2 / 2.6 ≈ 46%) would have to spend two years teaching (before being fired). Remember, in the long run this is sampling without replacement, as we don't rehire people who have already been fired in previous years.
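
The back-of-the-envelope numbers can be reproduced in a few lines. The sketch spells out one assumption, which is our reading of how the 1.2 million figure arises: the whole 3-million-teacher workforce is treated as cycling through two-year probation, so half the positions come up each year and 80 percent of those teachers are let go. That is a simplification, because teachers who survive probation would hold their jobs for years. Treat the output as a rough sense of scale:

```python
TEACHING_JOBS = 3_000_000       # approximate U.S. teaching positions (3.1 million in 1999)
FRESHMEN_PER_YEAR = 2_600_000   # 2004 freshman class, used as a proxy for graduates
PROBATION_YEARS = 2
FIRING_RATE = 0.80

# Assumption: every position cycles through probation, so 1/PROBATION_YEARS of
# the workforce finishes probation each year and FIRING_RATE of them are let go.
finishing_probation = TEACHING_JOBS / PROBATION_YEARS         # 1.5 million
vacancies_per_year = FIRING_RATE * finishing_probation        # 1.2 million
share_of_graduates = vacancies_per_year / FRESHMEN_PER_YEAR   # ~0.46, assuming 100% graduation

print(f"{vacancies_per_year:,.0f} vacancies/year, {share_of_graduates:.0%} of a graduating class")
```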

Two comments come to mind. One, you have to have a powerful incentive to make the majority of college students do this. Either a social expectation (as in a teacher draft) that encourages potential teachers to give two years of service or some sort of extremely lucrative remuneration scheme would have to be developed.

Two, can we really believe that a cross section of 50% of college graduates would have better teaching ability than the median teacher currently does?

Furthermore, a school with a constant staff flux may have different characteristics than the current system. Teachers may be more willing to quit mid-year for another opportunity. Every year nearly 50% of teachers are learning the basics of school operations, administration and the material being taught. How do we get teachers to invest in long-term outcomes, and how do we handle mentoring new teachers given how few established teachers there are?

Which makes the decision to focus on this particular practical difficulty almost surreal:

And, of course, another issue is politics. It's hard to reconcile an 80 percent dismissal rate with the existence of teachers' unions: Pushback from unions and the government leaders who rely on their support have largely managed to prevent any breach of teacher job security thus far.


I think the bigger concern is to look at how we would overcome structural staffing issues. Or to wonder if the 80% of temporary teachers could possibly be superior to the teachers they replace. Seriously, I think that the existence of unions is far down on the list of concerns here.

Heck, if we reject a "draft-based system" and presume that "social shaming" is unlikely to work, one might wonder if there was a way to invest the additional resources that we'd need to put into salary to make HALF of college graduates delay their career plans to teach in K-12 school systems.

All of this rests on a simulation study, and the simulation does not account for the degradation in the teaching pool as the rate of rejection rises. By holding the job pool constant (i.e. holding the quality of the marginal recruits constant as you decrease the retention rate), the authors have made one of the classic mistakes of regression analysis.
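
To see why the constant-pool assumption matters, here is a toy model. It is not Staiger and Rockoff's model, which also weighs the cost of inexperience. Recruit quality is Normal(mean, 1), quality is assumed to be observed perfectly during probation, and the bottom share of each cohort is fired. The "degrading" variant is purely hypothetical: recruit quality falls linearly in the number of hires needed per permanent slot, with a made-up slope `penalty`:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # recruit quality measured in standard-deviation units

def retained_quality(fire_rate, pool_mean=0.0):
    """Mean quality of the teachers kept when the bottom `fire_rate` share of a
    Normal(pool_mean, 1) cohort is fired (mean of a truncated normal)."""
    if not 0.0 < fire_rate < 1.0:
        raise ValueError("fire_rate must be strictly between 0 and 1")
    z = STD_NORMAL.inv_cdf(fire_rate)  # cut-off, measured relative to pool_mean
    return pool_mean + STD_NORMAL.pdf(z) / (1.0 - fire_rate)

def pool_mean_if_degrading(fire_rate, penalty=0.5):
    """HYPOTHETICAL: recruit quality drops as hires per permanent slot
    (1 / retention rate) rise. `penalty` is an assumed slope, not an estimate."""
    hires_per_slot = 1.0 / (1.0 - fire_rate)
    return -penalty * (hires_per_slot - 1.0)

for f in (0.2, 0.5, 0.8):
    fixed = retained_quality(f)
    degrading = retained_quality(f, pool_mean_if_degrading(f))
    print(f"fire {f:.0%}: fixed pool {fixed:+.2f}, degrading pool {degrading:+.2f}")
```

With the pool held fixed, the kept teachers get steadily better as the firing rate climbs, which is what pushes such a model toward very high firing rates. Once recruit quality is allowed to fall with hiring volume, the 80 percent policy ends up below the 50 percent one in this toy setup. The exact crossover depends entirely on the assumed `penalty`.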

Sunday, August 15, 2010

Light Posting

It is conference season (two conferences in the next two weeks) and I have no fewer than five talks to give. So I may be more silent than usual between now and September (especially as I may not have reliable internet access).

Apologies in advance.

Saturday, August 14, 2010

Bad collaborators

I have said it before and I will say it again: good collaborators are the best gift an early career scientist can have. That makes cases like this one all the more stark!

It's pretty clear to me that nobody will come out of this particular mess happy. The main researcher will have a hard time publishing their paper. The collaborator has lost an important paper. Confusion has been created in the scientific literature. Nobody comes out ahead.

So far I've been very lucky on this front. Here is hoping that this continues.

Friday, August 13, 2010

Bad teachers, thought experiments and anecdotal data

Statisticians often have to come up with a first draft of metrics, filters, winnowing processes, etc. without having a sample of the data they'll be using. One approach to the problem is to take some anecdotal cases and ask ourselves how the system we've proposed would handle them. Would it have trouble classifying them, leaving them in some 'other' box, or, worse yet, would it misclassify them, putting something that's clearly bad into the good or even excellent category?
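
The procedure amounts to a small test harness. You run the proposed rule over cases whose labels you trust from first-hand knowledge, then list the ones it cannot place or places wrongly. Here is a sketch. The classification rule, the feature names and the cases are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    expected: str     # label we trust from first-hand knowledge
    features: dict    # whatever inputs the proposed metric consumes

def check_against_anecdotes(classify, cases):
    """Run a proposed classifier over hand-labelled anecdotal cases and report
    the ones it leaves in the 'other' box or puts in the wrong category."""
    problems = []
    for case in cases:
        got = classify(case.features)
        if got == "other":
            problems.append((case.name, "unclassified"))
        elif got != case.expected:
            problems.append((case.name, f"misclassified as {got}, expected {case.expected}"))
    return problems

# HYPOTHETICAL rule: a teacher is "good" if they never send students to the office.
def quiet_room_rule(features):
    if features.get("referrals") is None:
        return "other"
    return "good" if features["referrals"] == 0 else "bad"

cases = [
    Case("quiet, disengaged classroom", "bad", {"referrals": 0}),
    Case("strong new teacher, no record yet", "good", {}),
]
print(check_against_anecdotes(quiet_room_rule, cases))
```

A rule that passes every anecdote is not thereby validated. A rule that fails an obvious one has told you something before any real data arrives.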

Here's a thought experiment. Many years ago, when teaching at a medium-sized suburban school, I had a classroom across the hall from a football coach who taught history. For the record, some of the best teachers and administrators I have ever dealt with came from coaching. They were gifted motivators who brought to the classroom the same belief in excellence and "giving 110%" that they brought to the field or the court.

This was not one of those coaches.

Not only did he make no effort to motivate his students; I'm not sure he interacted with them in any way. His desk was set up at the back of the room, not a bad arrangement for a study hall but it effectively precluded addressing the class or answering questions or leading a discussion. As far as I could tell, the issue never came up. Students spent their hour filling out worksheets that he had Xeroxed out of a workbook. He spent the hour grading them.

I have never seen a more mind-numbing, soul-crushing approach to education, but that didn't stop the principal from holding up this teacher as a role model for the rest of us. His classes were quiet, he never sent a student to the principal's office, and though the students' grasp of the material seldom extended beyond the rote level, that was sufficient for pretty good standardized-test scores (at least for knowledge-based rather than process-based courses).

This was almost two decades ago. Significant chunks of the current reform movement were already in place but No Child Left Behind was still years away. The teacher in question retired the year before I entered graduate school, but assuming he was still around, how well would he do under the proposed teacher evaluation system?

Presumably, most teacher evaluation metrics will largely be based on some combination of three factors: student test scores; classroom management; and supervisor evaluations. Our worksheet-dispensing educator would normally do well on the first and would max out the other two. I said 'normally' because (as mentioned before) these metrics are easy to game and the principal could easily arrange things to bump the test scores for his favorite teacher while screwing over a trouble-making teacher he would like to get rid of (someone like me, for instance).
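
To make the point concrete, here is a sketch of a weighted composite of the three factors. The weights, the 0-100 scale and the ratings are hypothetical; no particular district's formula is implied:

```python
# HYPOTHETICAL weights for the three factors; real systems would differ.
WEIGHTS = {"test_scores": 0.5, "classroom_management": 0.25, "supervisor_rating": 0.25}

def composite_score(ratings):
    """Weighted average of the component ratings, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[k] * ratings[k] for k in WEIGHTS)

# Pretty good test scores, a silent room, and a principal who holds him up as a model.
worksheet_teacher = {"test_scores": 75, "classroom_management": 100, "supervisor_rating": 100}
print(composite_score(worksheet_teacher))  # 87.5
```

None of the three inputs records whether students learned anything beyond rote recall, so no choice of weights can penalize its absence.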

Even if we assume that the principal didn't play favorites (and that's not an assumption I would have made with this administrator), this teacher would unquestionably be looking at generous bonuses. The question is, is this how we want to define excellence in education?

Chart of the day

Without getting into tax policy (which is way out of my area of expertise), I think this chart (from Ezra Klein) does a good job of putting the debate into perspective.

FemaleScienceProfessor on Tenure

This article is well worth the read. My favorite part:

Would a system of renewable contracts really allow professors to break out of the "publish or perish" mania? Methinks it might have even the opposite effect. If there were no tenure, the rat race would never end. And, since academia is apparently equivalent to a customer service industry, consider what renewable contracts for advisers would do to their graduate students and postdocs, not to mention the research infrastructure that we build in part from grants and in part from our institutions, and use to train our advisees.


The more one thinks about the whole tenure issue, the clearer it becomes that things are not as simple as "removing tenure would improve the academy". I had not even considered the issue that professors would be training their competitors in a rotating contract system, which would definitely make the sort of long term investment strategy that we currently have hard to incent.

FSP has a more sympathetic view of Cathy Trower's piece; I'll grant that the ideas at the end of the Trower piece are an improvement over the part I like to quote. I'm not anti-reform but I would prefer that reform not consist entirely of massive changes to employment contracts introduced from above.

But it's a very well thought out piece and definitely worth a read.

Epidemiology Data

In principle, I am highly supportive of the free release of data. But the issue is very tricky in epidemiology for two reasons.

One, data can take years (or even decades) to collect. Making the data from a study with many objectives instantly publicly available would make it hard for the original research team to be properly credited for the work. There are solutions that might work, but so long as the primary method of credit for grants and data collection is via the papers produced, this will be tricky.

Two, epidemiological data often has a lot of tricky analysis issues. It's quite plausible that an outside group could take shortcuts with the analysis, use an overly simple approach, and get a publication out more quickly. That is good for neither the main team (which now has to rush) nor the reading public (which faces a higher risk of scientific errors).

So the principle is good but the implementation is much harder than it looks. It really is an area waiting for a good idea.

Thursday, August 12, 2010

Mentioned in passing

I've been overusing the "Save and Quit" option on Firefox of late, holding onto things that seemed to merit blog posts I didn't have time for. So in the interest of good desktop hygiene, here's a quick summary of some items I'd like to say more about later:

If you can manage it, there's a good reason to sleep in tomorrow.

Here's a list of the worst paying college degrees. Guess who lands at #2?

The LA Times blog has an interesting post on the relationship of toy sales and film-making.

I just learned that the LA Times lost one of its best writers. It's still a better paper than that other Times but the lead is shrinking.

America's finest fake news source remains our best source of real news analysis:

The Daily Show With Jon Stewart: "Deductible Me"

Educational Reform and resources

This was an interesting suggestion on educational reform proposed by Dana Goldstein (h/t Matt Yglesias). She comments:

Rather, I'm imagining something like what the best public, private, and charter schools are already doing: a mix of additional instructional time and mealtimes with small group break-out activities like reading clubs, sports, board games, supervised computer time, library browsing time, and art and music lessons.

As a practical matter, to make this happen schools need extra labor: more hours from teachers, as well as specialized, perhaps part-time instructors in the arts and athletics.


Now, I can't guarantee that this is a good idea. However, in a world of two working parents, a longer school day could be welfare-enhancing, and it's not impossible that it would have the effects on childhood obesity that Ms Goldstein suggests.

But there is one feature of this plan that I think really makes sense to consider: Ms Goldstein is discussing increasing the resources directed at schools (via meal subsidies, extra staff and additional funding) in order to improve outcomes. Could it fail? No question. But it differs from a lot of education discussions by not trying to link a reduction of resources (e.g. removing tenure as a form of compensation) to improved outcomes. Instead, it argues that putting more resources into schools could result in a net public good.

That is a much better starting point for discussion (i.e. is this the best use of scarce public resources?) than the argument that removing resources will improve outcomes (so we can pay less and have a better school system).

In terms of the obesity argument, I suspect that much of this will hinge on the ability of the school to control eating and activity patterns. If students have snacks and/or school meals are not healthy then this seems less likely to work. In the same sense, putting together an activity program that succeeds in getting students to become active is not necessarily trivial.

But it certainly is worth an open discussion.

Replication

I have the opposite problem from the one Candid Engineer has with "fishy results". I have long advocated the central role of replication in science. This is especially important in epidemiology, where experiments are (by their nature) rare, so most inference has to come from observational research.

But how do you make a paper that has a near perfect replication seem interesting?

I mean, it's good for science, but it rather deadens the discussion section to have not all that much new to add beyond "that association is also observed in different populations".

Sigh!