West Coast Stat Views (on Observational Epidemiology and more)

Friday, August 2, 2013

"An astonishing act of statistical chutzpah" -- details on the Tony Bennett scandal

Jordan Ellenberg at Slate and Anne Hyslop at the New America Foundation have posts up explaining exactly how the Indiana Department of Education managed to change the grade of an influential donor's charter school from a C to an A. It appears to be your basic DATA COOKING 101. After seeing results you don't like, go back, try different weightings and look for excuses to drop bad scores, then apply these changes selectively.

The 'selectively' part is particularly important and hasn't gotten the attention it deserves:

Two Indianapolis Public Schools might never have been taken over by the state if then-Superintendent of Public Instruction Tony Bennett had offered the district the same flexibility he granted a year later to the Christel House Academy charter school.

The issue was similar in both cases. Christel House had recently added ninth and 10th grades, and IPS’ Howe and Arlington had added middle school grades. The students who filled those seats posted poor enough scores to drag down the schools’ overall ratings.

In the case of Christel House, emails unearthed by The Associated Press show Bennett’s staff sprung into action in 2012 when it appeared scores from the recently added grades could sink the highly regarded school’s rating from an A to a C. Ultimately, the high school scores were excluded and the school’s grade remained an A.

But in 2011, after IPS’ then-Superintendent Eugene White demanded Bennett consider the test scores of high school students separately from those of middle school students so the high schools could avoid state takeover, Bennett was unmoved.

As for the specifics, Hyslop goes into more detail (and has a wonderfully apt movie quote to make her point), but Ellenberg probably does the better job summing it up:

Here’s where Bennett’s team found the loophole big enough to drive a charter school through. A normal person would do exactly what Chief Accountability Officer Jon Gubera did—give Christel the weighted average of its elementary/middle school score, according to the rules for elementary/middle schools, and its high school score, according to the rules for high schools.

But Bennett had a better idea. Christel was, technically speaking, not a high school, so the statutory formula for the high school grades didn’t apply. But it also didn’t have all four high school measures, so, he argued, the rules for combined schools didn’t apply either. There were just 13 schools in the state that had both middle school and high school grades but no seniors. For these schools, Bennett reasoned, the Indiana education poobahs should have a free hand to set the grades however they pleased. You can guess what happened next: Bennett ruled that the ninth- and 10th-graders in these schools didn’t count at all. So it was that the offending algebra grades vanished in a puff of bureaucratic smoke...

This was an act of astonishing statistical chutzpah. Suppose the syllabus for my math class said that the final grade would be determined by averaging the homework grade and the exam grade, and that the exam grade was itself the average of the grades on the three tests I gave. Now imagine a student gets a B on the homework, gets a D-minus on the first two tests, and misses the third. She then comes to me and says, “Professor, your syllabus says the exam component of the grade is the average of my grade on the three tests—but I only took two tests, so that line of the syllabus doesn’t apply to my special case, and the only fair thing is to drop the entire exam component and give me a B for the course.”
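To make Ellenberg's arithmetic concrete, here is a minimal sketch of his hypothetical syllabus. The 4.0-scale grade points and the rule that a missed test counts as zero are my assumptions for illustration; neither comes from the article.

```python
# Hypothetical 4.0-scale grade points (an assumption, not from the article).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D-": 0.7, "F": 0.0}


def course_grade(homework, tests, n_tests=3):
    """Average the homework grade with the exam component.

    The exam component is the average over all n_tests scheduled tests,
    so a missed test is assumed to count as zero.
    """
    exam = sum(GRADE_POINTS[t] for t in tests) / n_tests
    return (GRADE_POINTS[homework] + exam) / 2


def course_grade_dropping_exams(homework):
    """The student's proposal: discard the exam component entirely."""
    return GRADE_POINTS[homework]


# A B on homework, two D-minuses, and one missed test.
by_the_syllabus = course_grade("B", ["D-", "D-"])
students_version = course_grade_dropping_exams("B")
print(round(by_the_syllabus, 2), students_version)
```

Under these assumptions the syllabus gives roughly 1.7 grade points, while the "my case is special" reading gives 3.0. Dropping the inconvenient component is what moves the grade, not any new information about the student.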

Ellenberg then made an observation that echoed some of my earlier points about how the mindset of the reform movement can enable these ethical lapses:

The saddest part is that I’m guessing Bennett sincerely felt he was doing the right thing. In his mind, he knew Christel was a great school, so if the scores said otherwise, the scores had to be wrong. In this respect, ironically, he ends up echoing his policy opponents, adopting the position that a mechanistic testing and scoring procedure can’t be allowed to override firsthand knowledge about teachers and schools.

The Op-ed no one wanted to print

When it comes to the education beat, this guy has one hell of a resume.

John Merrow began his career as an education reporter with National Public Radio in 1974, with the weekly series, “Options in Education.” In 1984, Merrow branched out into public television. He served as host of The Merrow Report, an award-winning documentary series, and currently is the Education Correspondent for PBS NewsHour. Merrow’s work has taken him from community colleges to kindergarten classrooms, from the front lines of teacher protests to policy debates on Capitol Hill. His varied reporting has continually been on the forefront of education journalism.

Of course, all of these qualifications don't necessarily make a person right, but they generally do make a person publishable. That's what makes this story curious, because Merrow can't seem to get anyone to publish the following op-ed. This is the second time in a row that he's found newspapers and magazines reluctant to publish reporting on Michelle Rhee.

As mentioned before, Rhee has gone from feted to indefensible so quickly that it's difficult for most journalists and pundits to cover her current activities without looking comically gullible for swallowing her previous rhetoric. (And journalists don't like looking gullible.)

CAVEAT EMPTOR: MICHELLE RHEE’S EDUCATION REFORM CAMPAIGN

"Today, too many of America’s children are not getting the quality education they need and deserve. StudentsFirst is helping to change that with common sense reforms that help make sure all students have great schools and great teachers." (StudentsFirst press release, emphasis added)

Michelle Rhee created StudentsFirst after leaving her post as Chancellor of Washington, DC’s Public Schools in the fall of 2010. She announced her intentions on “Oprah” that December: to fix America’s schools by enrolling one million members and raising one billion dollars.[2]

Easily America’s most visible education activist, she has been crisscrossing the country lobbying for change and donating money to candidates whose policies she supports. StudentsFirst claims to have helped pass 110 ‘student-centered policies’ in 18 states.

Because Ms. Rhee is trying to persuade the rest of the country to do as she did in Washington, it’s worth asking what her ‘common sense reforms’ accomplished when she had free rein to do as she wished.

She was definitely in charge. Her boss, a popular new mayor, told his Cabinet that trying to block his Chancellor was a firing offense. The business community, a public fed up with school failure, and the editorial pages of The Washington Post were enthusiastic supporters. Moreover, she had virtually no opposition: the local school board had been abolished when the Mayor took over, and the teachers union, reeling from its own financial scandals, had an untested rookie president. She knew how lucky she was.

"I’m living what I think education reformers and parents throughout this country have long hoped for, which is, somebody will just come in and do the things that they felt was in the best interest of children and everything else be damned. (Interview, fall 2007)"

She lived that dream for 40 months. She opened schools on time, added social workers, beefed up art, music and physical education, and dramatically expanded preschool programs. The latter may represent her greatest success, because children who began their schooling in the expanded preschool program tend to do well on the system’s standardized test in later years.

Ms. Rhee made her school principals sign written guarantees of test score increases. It was “Produce or Else” for teachers too. In her new system, up to 50% of a teacher’s rating was based on test scores, allowing her to fire teachers who didn’t measure up, regardless of tenure. To date, nearly 600 teachers have been fired, most because of poor performance ratings. She also cut freely elsewhere–closing more than two-dozen schools and firing 15% of her central office staff and 90 principals.

When Ms. Rhee departed in October 2010, her deputy, Kaya Henderson, took over. She has stayed the course for the most part, although test scores now make up–at most–35% of a teacher’s rating score.

Some of the bloom came off the rose in March 2011 when USA Today reported on a rash of ‘wrong-to-right’ erasures on standardized tests and the Chancellor’s reluctance to investigate. With subsequent tightened test security, Rhee’s dramatic test scores gains have all but disappeared. Consider Aiton Elementary: The year before Ms. Rhee arrived, 18% of Aiton students scored proficient in math and 31% in reading. Scores soared to nearly 60% on her watch, but by 2012 both reading and math scores had plunged more than 40 percentile points.

But it’s not just the test scores that have gone down. Six years after Michelle Rhee rode into town, the public schools seem to be worse off by almost every conceivable measure.

For teachers, DCPS has become a revolving door. Half of all newly hired teachers (both rookies and experienced teachers) leave within two years; by contrast, the national average is understood to be between three and five years. Veterans haven’t stuck around either. After just two years of Rhee’s reforms, 33% of all teachers on the payroll departed; after 4 years, 52% left.

It has been a revolving door for principals as well. Ms. Rhee appointed 91 principals in her three years as chancellor, 39 of whom no longer held those jobs in August 2010. Some chose to leave; others, on one-year contracts, were fired for not producing quickly enough. Several schools are reported to have had three principals in three years.

Child psychiatrists have long known that, to succeed, children need stability. Because many of the District’s children face multiple stresses at home and in their neighborhoods, schools are often that rock. However, in Ms. Rhee’s tumultuous reign, thousands of students attended schools where teachers and principals were essentially interchangeable parts, a situation that must have contributed to the instability rather than alleviating it.

Although Ms. Rhee removed about 100 central office personnel in her first year, the central office today is considerably larger, with more administrators per teachers than any of the districts surrounding DC. In fact, the surrounding districts reduced their central office staff, while DC’s grew. The greatest growth in DCPS over the years has been in the number of central office employees making $100,000 or more per year, from 35 when she arrived to 99 at last count.

Per pupil expenditures have gone up sharply, from $13,830 per student to $17,574, an increase of 27%, compared to 10% inflation in the Washington-Baltimore region. So have teacher salaries; DC teachers now earn on average more than their counterparts in nearby districts in Virginia and Maryland.

Enrollment declined on Ms. Rhee’s watch and has continued under Ms. Henderson, as families continue to enroll their children in charter schools or move to the suburbs. The year before she arrived, DCPS had 52,191 students. In school year 2012-13 it enrolled about 45,000, a loss of roughly 13%.

Even students who have remained seem to be voting with their feet, because truancy in DC is a “crisis” situation, and Washington’s high school graduation rate is the lowest in the nation. The truancy epidemic may be the most telling data point of all, because if young people in this economy are not going to school, something is very wrong. They are not skipping school to work–because there are no jobs for unskilled youth.

Ms. Rhee and her admirers point to increases on the National Assessment of Educational Progress, an exam given every two years to a sample of students under the tightest possible security. And while NAEP scores did go up, they rose in roughly the same amount as they had under her two immediate predecessors, and Washington remains at or near the bottom on that national measure.

The most disturbing effect of Ms. Rhee’s reform effort is the widening gap in academic performance between low-income and upper-income students, a meaningful statistic in Washington, where race and income are highly correlated. On the most recent NAEP test (2011) only about 10% of low-income students in grades 4 and 8 scored ‘proficient’ in reading and math. Since 2007, the performance gap has increased by 29 percentile points in 8th grade reading, by 44 in 4th grade reading, by 45 in 8th grade math, and by 72 in 4th grade math. Although these numbers are also influenced by changes in high- and low-income populations, the gaps are so extreme that it seems clear that low-income students, most of them African-American, generally did not fare well during Ms. Rhee’s time in Washington.

English Language Learners in Washington’s schools are also struggling. Title III of ESEA requires progress on three distinct measures: progress, attainment and what ‘No Child Left Behind’ calls ‘adequate yearly progress.’ DC failed on two out of three last year.

DC doesn’t fare well in national comparisons either. Between 2005 and 2011, black 8th graders in large urban districts gained five points in reading, while their DCPS counterparts lost two points, according to a study by the DC Institute of Public Policy released this spring. Between 2005 and 2011 in large, urban districts, Hispanic eighth-graders gained six points in reading (from 243 to 249), black eighth-graders gained five points (from 240 to 245), and white eighth-graders gained three points (from 270 to 273). In District of Columbia Public Schools, however, Hispanic eighth-graders’ scores fell 15 points (from 247 to 232), black eighth-graders’ scores fell two points (from 233 to 231), and white eighth-graders’ scores fell 13 points (from 303 to 290).

The states that have adopted her approach, and others now being lobbied, might want to make their own data-driven decisions.

Thursday, August 1, 2013

A final (?) news wrap-up on the Bennett story

As previously mentioned:

INDIANAPOLIS (AP) — Former Indiana and current Florida schools chief Tony Bennett built his national star by promising to hold “failing” schools accountable. But when it appeared an Indianapolis charter school run by a prominent Republican donor might receive a poor grade, Bennett’s education team frantically overhauled his signature “A-F” school grading system to improve the school’s marks.

You can get Joseph's reaction here and mine here and here.

Despite considerable support from the reform movement, you can add another 'former' to that paragraph.

Bennett said he resigned “because I don’t believe it would be fair to be distracted” by what he characterized as “malicious and unfounded” reports.

Just yesterday, Gov. Scott told Channel 5 in West Palm Beach that Bennett is “doing a great job.”

In what was already a bad news day for Bennett:

In June of 2011, Tony Bennett, then Indiana’s superintendent of public instruction, picked a for-profit education company in Florida to run a group of Indianapolis public schools.

The company, Charter Schools USA, set up operations in Indianapolis soon after the announcement and officially began running Manual High School, T.C. Howe High School and Emma Donnan middle school in the late summer of 2012. Millions of Indiana tax dollars have since flowed to the company, which has received many good reviews for its work in Indianapolis.

But a recent hiring decision by Charter Schools USA is sure to raise eyebrows and questions about conflicts of interest, particularly now that Bennett is embroiled in a massive controversy centering on special treatment given to certain Indiana schools during his tenure.

The decision: Charter Schools USA earlier this year hired Tony Bennett’s wife, Tina, as a regional director based in Florida, where Tony Bennett was hired late last year as commissioner of education. And, so, the bottom line is this: Tina Bennett is now earning a paycheck from the company her husband hand-picked to take over schools in Indiana, a decision that was very good for the company’s financial fortunes.

It’s important to note that Tina Bennett is a longtime educator, a former school administrator and counselor. She is also an advocate of the type of school choice efforts that Charter Schools USA is built on. In Indiana, she faced criticism and sometimes cruel treatment for taking a job with education groups tied to her husband’s former office. But it’s understandable that she would seek work in the education field.

To provide some context (and a bit of schadenfreude) for Bennett's fall, here's a reminder of where Bennett ranked in the reform firmament.

Value added testing without a gold standard outcome

From a Megan Pledger comment on StatChat comes this paper (pdf) on value added testing models for evaluating teachers. The following concerns were brought up:

In the real world of schools, data is frequently missing or corrupt. What if students are missing past test data? What if past data was recorded incorrectly (not rare in schools)? What if students transferred into the school from outside the system?

The modern classroom is more variable than people imagine. What if students are team-taught? How do you apportion credit or blame among various teachers? Do teachers in one class (say mathematics) affect the learning in another (say science)?

Every mathematical model in sociology has to make rules, and they sometimes seem arbitrary. For example, what if students move into a class during the year? (Rule: Include them if they are in class for 150 or more days.) What if we only have a couple years of test data, or possibly more than five years? (Rule: The range three to five years is fixed for all models.) What’s the rationale for these kinds of rules?

Class sizes differ in modern schools, and the nature of the model means there will be more variability for small classes. (Think of a class of one student.) Adjusting for this will necessarily drive teacher effects for small classes toward the mean. How does one adjust sensibly?

While the basic idea underlying value-added models is the same, there are in fact many models. Do different models applied to the same data sets produce the same results? Are value-added models “robust”?

Since models are applied to longitudinal data sequentially, it is essential to ask whether the results are consistent year to year. Are the computed teacher effects comparable over successive years for individual teachers? Are value-added models “consistent”?
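The small-class concern above can be illustrated with a simple shrinkage rule. This is a generic sketch, not the method of the paper or of any particular value-added model; the function name and the constant `k` are hypothetical choices made for illustration.

```python
def shrink_toward_mean(class_mean, n_students, overall_mean, k=20.0):
    """Pull a class's average gain toward the overall mean.

    weight = n / (n + k), so a small class puts little weight on its own
    data. k is a hypothetical tuning constant, not a value from the paper.
    """
    weight = n_students / (n_students + k)
    return overall_mean + weight * (class_mean - overall_mean)


# Two classes with the same raw average gain but very different sizes.
overall = 0.0
small = shrink_toward_mean(class_mean=10.0, n_students=1, overall_mean=overall)
large = shrink_toward_mean(class_mean=10.0, n_students=80, overall_mean=overall)
print(small, large)
```

With these made-up inputs the one-student class ends up close to the overall mean while the 80-student class keeps most of its raw effect. The answer depends heavily on `k`, which is exactly the kind of seemingly arbitrary rule the authors are asking about.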

A lot of these concerns have been independently voiced by Mark P. However, what is especially concerning is the idea that we could iterate through these assumptions to find a school ranking that satisfies some prior. This can be good under some circumstances -- Thomas Lumley gives an example of a model that clearly mixed up rankings of some kind of sports team (this isn't my area of expertise so I apologize that I don't recognize the teams or sport involved). But it does show how difficult these models are, even with the best faith involved. Still, in the case of Dr. Lumley's example there is a universal outcome that has broad agreement (does this team win games) that is being predicted. In education we lack this very clean outcome which is where it gets tricky -- in a sense we are modeling a latent variable (student outcomes).

All of this suggests that we should be cautious about these models and perhaps this would be an appropriate time to put some serious effort into student outcomes ascertainment so that it will be easier to calibrate these statistical models (making the outcome the test score seems clever but merely hides the problem rather than solving it unless we are confident that the score is a very good measure of outcomes).

Wednesday, July 31, 2013

Stories I should probably be writing about

Thoreau has a glowing review of this book on pedagogical fads. It looks interesting though given the cost and the quantity I assume they are writing them out by hand.

The Bennett scandal continues to reverberate both in Indiana and Florida.

Democracy Prep and its founder and superintendent Seth Andrew have been media darlings but I'm seeing some things that trouble me, both about the school and the founder, particularly when you read between the lines.

Kevin Drum addresses our old friend, peer effects.

Take a look at this school designed largely to give master's degrees to people who came through Teach for America and similar programs. As you might expect, I see some connections between this and the previously mentioned issue of grooming TFAers for leadership positions.

Which segues nicely into lapsed TFAer Gary Rubinstein's blog and its informative series of posts on a recent visit to a KIPP school in New York. He also often addresses the previously mentioned TFA cultural issues that concern me and many of his other comments (like this one) track with my experience almost perfectly.

I'm a big fan of Kaiser Fung, but the Moneyball analogy strikes me as a bad framework for an analytic approach to education reform, bad enough to cause real damage. (I see from the queue that Joseph is planning to address some of these issues tomorrow.)

And, to take a break from the education beat, there's this post from Naked Capitalism on a shadow credit reporting system.

p.s. Should have included a link to this Washington Post interview with "the world’s most famous teacher." It nicely lays out the tension between some of our best veteran teachers (in this case one of the major models for KIPP) and the education reform movement.

General versus particular cases

Andrew Gelman did a very interesting article in Slate on how being overly reliant on statistical significance can lead to spurious findings. The authors of the study that he was critiquing replied to his piece. Andrew's thoughts on the response are here.

This led to two thoughts. One, I am completely unimpressed by claims that a paper being in a peer-reviewed journal settles the matter -- peer review is a screen, but even good tests have false positives. All this convinces me of is that the authors were thoughtful in the development of the article, not that they are immune to problems. But this is true of all papers, including mine.

Two, I think that this is a very tough area to take a single example from. The reason is that any one paper could well have followed the highest possible level of rigor, as Jessica Tracy and Alec Beall claim they have done. That doesn't necessarily mean that all studies in the class have followed these practices or that there were not filters that aided or impeded publication that might enhance the risk of a false positive.

For example, I have just finished publishing a paper where I had an unexpected finding that I wanted to replicate (an association had been hypothesized a priori, but its direction was the reverse of the a priori hypothesis). I found a second study to replicate it in, added additional authors, added additional analysis, rewrote the paper to be a careful combination of two different cohorts, and redid the discussion. Guess what, the finding did not replicate. So then I had the special gift of publishing a null paper with a lot of authors and some potentially confusing associations. If I had just given up at that point, the question might have been hanging around until somebody else found the same thing (I often use widely available data in my research) and published it.

So I would be cautious about multiplying the p-values together for a probability of a false positive. Jessica Tracy and Alec Beall:

The chance of obtaining the same significant effect across two independent consecutive studies is .0025 (Murayama, K., Pekrun, R., & Fiedler, K. (in press). Research practices that can prevent an inflation of false-positive rates. Personality and Social Psychology Review.)

I suspect that this would only hold if the testable hypothesis was clearly stated before either study was done. It also presumes independence (it is not always obvious that this will hold as design elements of studies may influence each other) and that there isn't a confounding factor involved (that is causing both the exposure and the outcome).
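To see where the .0025 comes from and how sensitive it is to pre-specification, here is a rough sketch. The `looks_first` and `looks_second` parameters are my own hypothetical device: they stand for the number of independent comparisons a study could have reported, and they assume independence throughout, which is itself one of the assumptions in question.

```python
def chance_both_significant(alpha=0.05, looks_first=1, looks_second=1):
    """Probability that two independent null studies both reach significance.

    looks_* is a hypothetical count of independent comparisons each study
    could have reported. With one pre-specified comparison per study the
    result is simply alpha ** 2.
    """
    p_first = 1 - (1 - alpha) ** looks_first
    p_second = 1 - (1 - alpha) ** looks_second
    return p_first * p_second


# Both hypotheses stated in advance: 0.05 * 0.05.
prespecified = chance_both_significant()
# First study had five candidate comparisons; second tested only the winner.
flexible_first = chance_both_significant(looks_first=5)
print(prespecified, flexible_first)
```

With everything pre-specified the product is about .0025, matching the quoted figure. Allowing five possible comparisons in the first study alone raises it to a bit over .011, more than four times higher, and that is before considering confounding or dependence between the study designs.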

Furthermore, I think as epidemiologists we need to make a decision about whether these studies are making strong causal claims or advancing a prospective association that may lead to a better understanding of a disease state. We often write articles speaking in the latter mode but then lapse into the former when being quoted.

So I guess I am writing a lot to say a couple of things in conclusion.

One, it is very hard to pick a specific example of a general problem when it is possible that any one example might happen to meet the standards required for the depth of inference being made. This is very hard to ascertain within the standards of the literature.

Two, the decisions about what to study and what to publish are also pretty important steps in the process. These things can have a powerful influence on the direction of science in a very hard to detect manner.

So I want to thank Andrew Gelman for starting this conversation and the authors of the paper in question for acting as an example in this tough dialogue.

The other side of the ethical failures of the education reform movement

There's an old denominational joke that ends with the punchline "Just don't let them see you. They think they're the only ones up here."

As mentioned before, the culture of the education reform movement is exceptionally strong and cultural identity plays a major role in the lives of movement reformers, particularly those associated with certain institutions like TFA and KIPP. This isn't necessarily a bad thing. There are a lot of smart people out there working very hard to improve education because of those cultural forces. Unfortunately, these forces can also make the movement prone to blind spots, often including the belief that they're the only ones up here.

Here's a pertinent passage from Gary Rubinstein, himself a lapsed TFAer.

The KIPP high school has a large area in the middle with a lot of tables, almost like a coffee shop. I went out to get lunch at the nearby Fairway and came back and sat at one of those tables to eat. At the table next to me I overheard a discussion between a KIPP administrator and a teacher. Most of the KIPP administrators, like this woman was, are young and white, as are most of the teachers. This teacher was black and seemed to be in her late 40s. The conference was related to some sort of recommendation letter, maybe for some academic program, that the older teacher was writing for one of her former students. I’m not sure who initiated this discussion, but the administrator was explaining that the letter should be re-written. The issue was that this teacher had been a bit too ‘honest’ in the letter and it would hurt the chances of this student getting into the program. Now I’ve written many recommendation letters, and of course you want to put the student in the best possible light, so I’m not saying that the administrator was wrong in suggesting that this teacher change the letter. I’m just writing about this since some of the things said in the discussion were revealing.

Apparently this student had a bad attitude and failed the course. The teacher had written about this so the administrator explained to this teacher that, yes, the student had failed, but that a lot of students fail that course (I think it was Geometry). Also, it was important that the teacher understand that getting a 60 in that course at the KIPP school was like getting a 90 in most other schools since, I guess she felt like she knew, the other neighborhood schools have extreme grade inflation. The conference was resolved with the teacher agreeing to rewrite the letter keeping these things in mind. I found it interesting that a lot of students fail this course since the media would have us believe that after being in KIPP from 5th grade to 11th grade, students there wouldn’t be failing that much. Also, the assumption that the ‘other’ schools have such low expectations that a 90 there is like a 60 at KIPP, I don’t know if she how she can be so confident about that claim.

This anecdote is troubling on any number of levels, not the least of which is the fact that KIPP 60 = Other school 90 is highly debatable (there are a lot of open questions about how to interpret KIPP's numbers but I doubt even the most favorable reading would support the assertion that a D- at KIPP was equivalent to an A- elsewhere), but even if we stipulate to that part, we are still left with all sorts of concerns.

This is, after all, a case of an administrator in a fairly public setting pressuring a teacher to give a student a more favorable evaluation. That's a dangerous line, particularly when you take into account the fact that getting more students accepted into prestigious programs generates good press for KIPP, helps the administrator's career track and may well figure into funding.

There's nothing new about incentives that encourage teachers to lower standards (or about having administrators play the devil on the shoulder), but the reform movement has greatly raised the stakes. More importantly, they've provided a belief system that makes it easier to justify cutting corners and ignoring conflicts of interest. Minor lies are OK in a recommendation letter because your students are held to higher standards; mass dumping of students is OK because the better your school does the more schools will adopt your superior model; cooking the books to make your flagship school look good is OK because there must be something wrong with a metric that makes the school look bad.

Tuesday, July 30, 2013

The looting phase of education reform and the other Tony Bennett

[In response to Joseph's prod]

I realize regular readers must be getting tired of these stories (new readers can see why by searching this blog for "looting"), but it looks like we have to go over this one more time. When it comes to metric-based education reform:

1. There are numerous easy and effective ways of gaming the system;

2. There are huge financial and political incentives for gaming the system;

3. There are powerful advocates across the political spectrum (from David Brooks to Jonathan Chait and Matthew Yglesias) who can be relied upon to provide ample cover for those who game the system.

Under these circumstances, it would be shocking if we weren't seeing extensive cooking and out-and-out fraud. Still, even by the standards we've come to expect, this is really something.

From a truly impressive piece of investigative journalism by Tom LoBianco:

INDIANAPOLIS (AP) — Former Indiana and current Florida schools chief Tony Bennett built his national star by promising to hold “failing” schools accountable. But when it appeared an Indianapolis charter school run by a prominent Republican donor might receive a poor grade, Bennett’s education team frantically overhauled his signature “A-F” school grading system to improve the school’s marks.

Emails obtained by The Associated Press show Bennett and his staff scrambled last fall to ensure influential donor Christel DeHaan’s school received an “A,” despite poor test scores in algebra that initially earned it a “C.”

“They need to understand that anything less than an A for Christel House compromises all of our accountability work,” Bennett wrote in a Sept. 12 email to then-chief of staff Heather Neal, who is now Gov. Mike Pence’s chief lobbyist.

The emails, which also show Bennett discussed with staff the legality of changing just DeHaan’s grade, raise unsettling questions about the validity of a grading system that has broad implications. Indiana uses the A-F grades to determine which schools get taken over by the state and whether students seeking state-funded vouchers to attend private school need to first spend a year in public school. They also help determine how much state funding schools receive.

...

Bennett, who now is reworking Florida’s grading system as that state’s education commissioner, reviewed the emails Monday morning and denied that DeHaan’s school received special treatment. He said discovering that the charter would receive a low grade raised broader concerns with grades for other “combined” schools — those that included multiple grade levels — across the state.

“There was not a secret about this,” he said. “This wasn’t just to give Christel House an A. It was to make sure the system was right to make sure the system was face valid.”

However, the emails clearly show Bennett’s staff was intensely focused on Christel House, whose founder has given more than $2.8 million to Republicans since 1998, including $130,000 to Bennett and thousands more to state legislative leaders.

Bennett estimated that 12 or 13 schools benefited, not just Christel House, but the emails show DeHaan’s charter was the catalyst for any changes.

“The fact that anyone would say I would try to cook the books for Christel House is so wrong. It’s frankly so off base,” Bennett said in a telephone interview Monday evening.

Bennett rocketed to prominence with the help of former Indiana Gov. Mitch Daniels, former Florida Gov. Jeb Bush and a national network of Republican leaders and donors, such as DeHaan. Bennett is a co-founder of Bush’s Chiefs for Change, a group consisting mostly of Republican state school superintendents pushing school vouchers, teacher merit pay and many other policies enacted by Bennett in Indiana.

...

But trouble loomed when Indiana’s then-grading director, Jon Gubera, first alerted Bennett on Sept. 12 that the Christel House Academy had scored less than an A.

“This will be a HUGE problem for us,” Bennett wrote in a Sept. 12, 2012, email to [then-chief of staff Heather] Neal.

Neal fired back a few minutes later, “Oh, crap. We cannot release until this is resolved.”

By Sept. 13, Gubera unveiled it was a 2.9, or a “C.”

A weeklong behind-the-scenes scramble ensued among Bennett, assistant superintendent Dale Chu, Gubera, Neal and other top staff at the Indiana Department of Education. They examined ways to lift Christel House from a “C” to an “A,” including adjusting the presentation of color charts to make a high “B” look like an “A” and changing the grade just for Christel House.

It’s not clear from the emails exactly how Gubera changed the grading formula, but they do show DeHaan’s grade jumping twice.

...

Bennett said Monday he felt no special pressure to deliver an “A” for DeHaan. Instead, he argued, if he had paid more attention to politics he would have won re-election in Indiana.

Yet Bennett wrote to staff twice in four days, directly inquiring about DeHaan’s status. Gubera broke the news after the second note that “terrible” 10th grade algebra results had “dragged down their entire school.”

...

When Bennett requested a status update Sept. 14, his staff alerted him that the new school grade, a 3.50, was painfully close to an “A.” Then-deputy chief of staff Marcie Brown wrote that the state might not be able to “legally” change the cutoff for an “A.”

“We can revise the rule,” Bennett responded.

Over the next week, his top staff worked arduously to get Christel House its “A.” By Sept. 21, Christel House had jumped to a 3.75. Gubera resigned shortly afterward.

This is a big story for a number of reasons.

There's the scale of the thing.

There's the funding aspect; assuming something of a zero-sum arrangement, some schools had to be cheated out of some of the money that was coming to them.

There's the seemingly complete lack of integrity on the part of the Indiana Department of Education. Pressure to change a grading formula is one of the most common ethical challenges educators face. We all know the right thing to do in this situation, but it appears from the emails that no one in power seriously tried to hold the ethical line.

There's Bennett's position in the reform movement. Under his watch, Florida is pushing one of the most extreme reform agendas. Perhaps more troubling, even before the Indiana revelations came out, the Florida Department of Education had already been accused of cooking charter school results since he arrived.

Monday, July 29, 2013

Paging Mark P

Mark Thoma posted this today.

I think the whole idea of charter schools has some merit as a means of educational experimentation. But if this sort of cheating occurs, it makes it impossible to trust the data and that removes most of the benefit of being able to "let a thousand flowers bloom in the hopes that one will be especially amazing". It could be an isolated incident, but even a single case this egregious makes it much harder to trust that education reform is adhering to strict metrics.

[My response is here -- Mark P]

The obliviousness to the insularity

David Weigel is an excellent journalist and I'm a great admirer of his political reporting, but he is also very much part of the culture of the journalistic elite and that means he is not immune to some of the issues we've been discussing:

Sure, we're divorced from the rest of America. Lots of Americans who don't live here manage to have opinions about Washington. You don't see us going into Beast Mode about it. But I see Payne's point, and wish he had more of an argument than this:

Romney’s native Bloomfield Hills is a long way from Detroit, though it may not look that way from the beltway.

Yeah, it's in the suburbs. I defer to Payne, of course, but is it so strange to hear people who live on the outskirts of a city, who root for its sports teams and fly out of its airport, to identify with the city? You won't find many people in Skokie, Illinois or Downey, California distancing themselves from Chicago and Los Angeles. Eminem has been the star of Chrysler ads, asking Americans to stand up for Detroit, but the guy lives in Rochester now, not far from Bloomfield Hills. Anyway, that wasn't my point—emergency managers and top bureaucrats often arrive in struggling cities from somewhere else entirely. Stephen Goldsmith's eventual reward for a successful career in Indianapolis was a job in Mike Bloomberg's city hall, though this ended poorly for reasons orthogonal to policy.

Weigel provides here one of those perfect passages where a writer disputes an argument then unintentionally proves it a few words later. Unless they are discussing something involving governance, even if they actually use the word 'city,' when people say "Los Angeles" they normally are referring to Los Angeles COUNTY.

It's worth noting that the LA section of the iconic "I Love LA" starts:

Rollin' down the Imperial Highway
With a big nasty redhead at my side
Santa Ana winds blowin' hot from the north
And we was born to ride

Imperial connects the east and west sides of the county but it only briefly crosses through the city proper.

[Map: LA County incorporated areas, with the city of Los Angeles highlighted]

LA is a weird patchwork of cities and towns that often confuse even natives ("Is that a town or a neighborhood?"). What the natives do keep track of is counties. If you live in Downey, you're an Angeleno. If you live in Fullerton, you're behind the Orange Curtain.

As mentioned before, LA's distinctive type of sprawl is very different from the concentric urban, suburban, exurban dynamic of a city like Atlanta. As far as I can tell, there is no LA analogue to Alpharetta. The result is that lessons learned in one city often don't generalize to this area. Of course, you can make a similar point about Atlanta and Chicago, a city of any number of unique historic and cultural attributes, or, while we're at it, Detroit. And this brings us back, inevitably, to the dangers of an insular political/journalistic class.

A greatly disproportionate amount of news and policy is shaped by a surprisingly small number of people with similar backgrounds and overlapping social circles living in relatively close vicinity. Under these circumstances, it is almost unavoidable that the natural tendency to see other people's lives as less complex will grow into a group-belief that people who live in the rest of the country have simpler lives with largely interchangeable problems.

And what about Weigel's charge that this cuts both ways? Aren't people in Little Rock, Arkansas as ill-informed and yet as opinionated about DC and NYC as people in those cities are about Little Rock? To put it bluntly, no and no. For the first point, since so much of the press is NYC and DC-centric, much of what would normally be classified as local interest there gets national coverage. Growing up in a small Southern town, I regularly read a number of publications from those cities including the NYT, the Washington Post, the New Yorker and New York Magazine (mainly for the critics -- I was a big fan of Denby and I found Simon generally but interestingly wrong -- but often cover to cover). Of course, this didn't make me all that knowledgeable about these cities but, and this brings me to the second point, it gave me some sense of what I did and didn't know.

The concern here is not just the insularity; it's the obliviousness to the insularity, not being aware of your own provincialism. When a group of high school seniors from a small Delta town sit around discussing what life might be like in a NYC or LA, they are acutely aware of how limited and inapplicable their experience is; they deal in known unknowns.

With the current journalistic and political establishment, though, we have seen accumulating evidence of unknown unknowns, of people like Weigel who are not only drawing conclusions too far outside of their expertise but who are entirely unaware of how thin the ice has gotten. This is a trivial example -- Weigel's grasp of LA's idiosyncrasies is not in and of itself that big of a deal -- but it is worth noting partially because Weigel is one of the best journalists working that beat but mainly because many examples of this phenomenon are not trivial at all.

Weigel was, after all, using Downey to support the idea that cities (implicitly, I think, in "the rest of the country") are similar enough that a statement that holds for the others should hold for Detroit. Detroit, a unique and enormously complex city historically, economically, culturally and politically.

And we have more troubling cases. The New York Times published a major series on urban policy that backed up one of its major arguments by treating Harris County in Texas as analogous to Westchester County in New York. When McDonald's issued a suggested budget for its workers, the criticism from largely upper middle class, Northeastern journalists (who live in a world of easy employment and costly housing) was mostly directed at a fairly realistic rent estimate while the almost impossible requirement that the worker find a second part time job was often ignored. And those are just a couple of examples that happened to wash up on this blog; you can easily find a dozen more in a typical day's papers.

As mentioned earlier, a greatly disproportionate amount of news is shaped by a surprisingly small number of people. They decide what stories are important and how they should be framed. That has always been the case, but in many ways, this uniformity has gotten worse recently and it's led to serious problems with groupthink and a dangerous arrogance. The less they know the more confident they become.

Term of the day -- "Macro Tourists"

I came across this pejorative in this very sharp Barry Ritholtz post then followed a link to get to this definition from a very funny piece by Joe Weisenthal:

"Macro Tourists" is a phrase that was coined by former policy economist and global macro trader Mark Dow to characterize investors who left their comfort zone to make big pronouncements about how macroeconomics really works.

It might be interesting to come up with analogues in other fields. Perhaps Joseph has some examples from epidemiology.

Gold-bugs

Greg Mankiw has a post on investing in gold. I think I actually agree that gold investing has a bad name; it's the people who suggest putting all of your wealth into it that scare me the most. Any single commodity as an asset class is a bad investment. Gold tends to be volatile in price, which isn't ideal either.

But I also think there can be one more good reason to hold gold -- you like collecting it. As collectables go, I would be more confident in a gold coin collection holding value than a baseball card or comic book collection. I could be wrong on this point, especially for specific items. But it's not an insane hobby to have. I am actually more sympathetic to it than I am to junk silver, where the collections tend to not have the aesthetic appeal of gold and the circumstances under which the silver would make a good store of wealth seem more limited.

So it's not an insane idea to invest in gold as a collectable or as a part of a portfolio. It is an odd stance to take as a principal means of savings, though that never seems to make it go completely out of fashion.

Saturday, July 27, 2013

Weekend blogging -- an analytic approach to comedy

[The relevant gag occurs around 1:25 starting with "no scruples"]

Veteran show-runner Ken Levine discusses in depth one of the oldest but most durable classes of jokes, the Comedy Rule-of-Threes. Set up a pattern with two items in a list then deviate from the pattern with the third but not so sharply as to violate the premise.

Comedy writer Bob Ellison was in a late night rewrite once and pitched a joke. The showrunner said, “Too corny, too obvious” and Bob replied, tapping his wristwatch, “Two thirty.”

Comedy is a performance genre of instant feedback. It's also a field that attracts sharp minds. Therefore, it's not surprising that most performers have well-thought-out theories about why things are funny. For someone like me, who has watched waaaay too much TV, it's interesting seeing how the tricks are done.

Friday, July 26, 2013

When news stories (fail to) collide

A thought on Joseph's recent thought on Netflix and on the way journalists often fail to notice the implications of one part of a story on another.

Here's a statement from a press conference earlier this week (which apparently came with enough glitches of its own to cause a two hour delay):

“We’re fundamentally in the membership happiness business as opposed to the TV business,” is the way CEO Reed Hastings described his view in Netflix’s first video conference call for analysts. CNBC’s Julia Boorstin and BTIG analyst Rich Greenfield pitched the questions, on a Google Hangout, synthesizing contributions from analysts. And Chief Content Officer Ted Sarandos didn’t flinch when Greenfield specifically asked about movies, news, and talk shows. There’s “no reason” why Netflix wouldn’t expand into those areas, he says. Hastings added that “HBO and Showtime do sports.”

And here are some excerpts from the story Joseph cites which came out Monday.

LOS GATOS, Calif. (AP) — Netflix’s Internet video subscription service works around the clock, but it’s unusual for more than two dozen of the company’s engineers and top managers to be huddled in a conference room at 10:30 on a midsummer Wednesday evening. This is a special occasion. It’s near the end of a grueling day that will culminate in the premiere of “Orange Is The New Black,” the fourth exclusive Netflix series to be released in five months. The show’s first episode is called “I Wasn’t Ready,” and everyone in the room has been logging long hours to ensure that the title doesn’t apply to the debut.

Netflix Inc. invited The Associated Press to its Los Gatos, Calif., headquarters for an unprecedented glimpse at the technical preparations that go into the release of its original programming. The shows have become the foundation of Netflix’s push to build an Internet counterpart to HBO’s premium cable channel.
...
On this night, the setting has been transformed into Netflix’s version of a war room. The engineers are flanked by seven flat-screen televisions on one side of the room and two giant screens on the other. One big screen is scrolling through Twitter to highlight tweets mentioning “Orange Is The New Black,” an offbeat drama set in a women’s prison. The other screen is listing some of Netflix’s most closely guarded information — the rankings of videos that are attracting the most viewers on an hourly basis.
...
“This will be a successful night if we are here at midnight and it turns out that we really didn’t need to be because there were no problems,” says Yury Izrailevsky, Netflix’s vice president of cloud computing and platform engineering. The mission is to ensure each installment of “Orange Is The New Black” has been properly coded so the series can be watched on any of the 800 Internet-connected devices compatible with Netflix’s service. It’s a complex task because Netflix has to account for viewers who have different Internet connection speeds, various screen sizes and different technologies running the devices. About 120 variations of code have been programmed into “Orange Is The New Black” to prepare it to be streamed on Netflix throughout the U.S and 39 other countries. Another set of engineers had to ensure foreign-language subtitles and dubbing were in place and streaming properly.

Others are still checking to make certain that the English dialogue properly syncs with the video being shown at different Internet connection speeds. Just before another Netflix series, “House of Cards,” debuted in February, engineers detected two minutes of dialogue that was out of sync with video played on iPhones at certain speeds, prompting a mad scramble to fix the problem before the series was released to subscribers.

At the risk of belaboring the obvious, news and talk are highly topical and require quick turnaround. Sports is even worse, being live and requiring your system to handle huge spikes in traffic. It certainly sounds like these genres would require major upgrades for Netflix.

In cases like this it's always useful to think about the McDonald's breakfast example and ask "is this a product that can be sold at a profit using existing resources?" For original series and movies, the answer is probably yes. I'm not sure Hastings and Sarandos are up to it, but I'm confident that it could be done (and I would love to see Netflix do it). For these other genres, the answer is probably no, which means that, unless you have exceptionally deep pockets (perhaps even Amazonian pockets) or an open field, you should proceed with caution.

Netflix thought of the day

Netflix is definitely suffering growing pains with the new content. Generally speaking, my experience with the first two series is that they are quite well done, albeit with some definite weak points. But it will take a lot of work to step this effort up from "expensive advertising" to "serious content generation". Practices like the war room make it seem less likely that the current approach is scalable.