Monday, March 31, 2014

Perhaps we should add "opaque" to the list of journalists' vocabulary questions

Last week, Andrew Gelman criticized Todd Balf for picking words and phrases for their emotional connotation rather than for their actual meaning in his New York Times Magazine article on the changes in the SAT. 'Jeffersonian' was the specific term that Gelman choked on. I'd add 'opaque' to the list though the blame here mainly goes to David Coleman, president of the College Board and quite possibly the most powerful figure in the education reform movement:
For the College Board to be a great institution, [Coleman] thought at the time, it had to own up to its vulnerabilities. ... “It is a problem that it’s opaque to students what’s on the exam."
There's a double irony here. First because Coleman has been a long-standing champion of some very opaque processes, notably including those involving standardized tests, and second because test makers who routinely publish their old tests and who try to keep those tests as consistent as possible from year to year are, by definition, being transparent.

This leads to yet another irony: though the contents of the tests are readily available, almost none of the countless articles on the SAT specifically mention anything on the test. The one exception I can think of is the recent piece by Jennifer Finney Boylan, and it's worth noting that the specific topic she mentioned isn't actually on the test.

Being just a lowly blogger, I am allowed a little leeway with journalistic standards, so I'm going to break with tradition and talk about what's actually on the math section of the SAT.

Before we get to the questions, I want to make a quick point about geometry on the SAT. I've heard people argue that high school geometry is a prerequisite for the SAT. I don't buy that. Taking the course certainly doesn't hurt, but the kind of questions you'll see on the exam are based on very basic geometry concepts which students should have encountered before they got to high school. With one or two extremely intuitive exceptions, all the formulas you need for the test are given in a small box at the top of the first page.

As you are going through these questions, keep in mind that you don't have to score all that high. 75% is a good score. 90% is a great one.


You'll hear a lot about trick questions on the SAT. Most of this comes from the test's deliberate avoidance of straightforward algorithm questions. Algorithm mastery is always merely an intermediary step -- we care about it only because it's often a necessary step in problem solving (and as George Pólya observed, if you understand the problem you can always find someone to do the math) -- but when students are used to being told to factor this and simplify that, being instead asked to solve a problem, even when the algorithms involved are very simple, can seem tricky and even unfair.

There are some other aspects of the test that contribute to the reputation for trickiness:

Questions are written to be read in their entirety. One common form breaks the question into two parts where the first part uses a variable in an equation and the second asks the value of a term based on that variable. It's a simple change but it does a good job distinguishing those who understand the problem from those who are merely doing Pavlovian mathematics where the stimulus is a word or symbol and the response is the corresponding algorithm;


Word problems are also extensively used. Sometimes the two-part form mentioned above is stated as a word problem;


One technique that very probably would strike most people as 'tricky' actually serves to increase the fairness of the test: the use of newly minted notation. In the example below, use of standard function notation would give an unfair advantage to students who had taken more advanced math courses.
One thing that jumps out at us math types is how simple the algebraic concepts used are. The only polynomial factoring you are ever likely to see on the SAT is the difference between two squares.


A basic understanding of the properties of real numbers is required to answer many of the problems.



A good grasp of exponents will also be required for a perfect score.





There will be a few problems in basic statistics and probability:










I've thrown in a few more to make it a more representative sample.










We can and should have lots of discussions about the particulars here -- I'm definitely planning a post on Pavlovian mathematics (simple stimulus/algorithmic response) -- but for now I just want to squeeze in one quick point:

Whatever the SAT's faults may be, opaqueness is not among them. Unlike most of the instruments used in our metric-crazed education system, both this test and the process that generates it are highly transparent. That's a standard that we ought to start extending to other tests as well.

Saturday, March 29, 2014

Weekend blogging -- due to cuts in arts programs, school orchestras have been forced to adopt extreme cost-cutting measures

When you get past the novelty, the musicianship is even more impressive.



The novelty is, of course, what initially drives the clicks but it ages quickly when played badly. (From Spike Jones to the Austin Lounge Lizards, successful comic music acts tend to require solid musicians.)

I'd go further in this case. For me, they move entirely past the joke. Their AC/DC covers are so driving and percussive you soon stop wondering why these cellos are playing heavy metal and start wondering why all heavy metal acts don't use cellos.




A musician friend, who had only seen "Every Teardrop," asked if they always played just one cello. I told him no, someone would lose a finger.







These guys aren't the first classical musicians to display a talent for popular music. Yo-Yo Ma comes to mind and, believe it or not, Liberace started out as both a very promising concert pianist and a first-rate boogie-woogie piano player, but I can't recall any performers who moved so smoothly back and forth.







Friday, March 28, 2014

Fiscal prudence (a never ending saga)

This is an important point about financial planning from Megan McArdle:
When people end up in financial trouble, you often hear tsk-tsking about premium cable and fancy vacations. But if you talk to bankruptcy lawyers and financial counselors, that isn't the normal story you hear. You're more likely to hear about car loans, mortgages, alimony. In other words, it's not the luxury splurges that do you in -- it's the fixed expenses. That's because discretionary luxury expenses can be cut in an emergency, while the fixed payments go on and on until they empty your bank account.
She might be a libertarian in her politics, but she is a lot like a Canadian in terms of fiscal prudence.  Now I agree that there may be larger social issues that are making it harder for people to meet fixed expenses (e.g. wage stagnation) but at an individual level this is a calculation well worth making. 

Thursday, March 27, 2014

On SAT changes, The New York Times gets the effect right but the direction wrong

That was quick.

Almost immediately after posting this piece on the elimination of the SAT's correction for guessing (The SAT and the penalty for NOT guessing), I came across this from Todd Balf in the New York Times Magazine.
Students were docked one-quarter point for every multiple-choice question they got wrong, requiring a time-consuming risk analysis to determine which questions to answer and which to leave blank. 
I went through this in some detail in the previous post but for a second opinion (and a more concise one), here's Wikipedia:
The questions are weighted equally. For each correct answer, one raw point is added. For each incorrect answer one-fourth of a point is deducted. No points are deducted for incorrect math grid-in questions. This ensures that a student's mathematically expected gain from guessing is zero. The final score is derived from the raw score; the precise conversion chart varies between test administrations.

The SAT therefore recommends only making educated guesses, that is, when the test taker can eliminate at least one answer he or she thinks is wrong. Without eliminating any answers one's probability of answering correctly is 20%. Eliminating one wrong answer increases this probability to 25% (and the expected gain to 1/16 of a point); two, a 33.3% probability (1/6 of a point); and three, a 50% probability (3/8 of a point). 
You could go even further. You don't actually have to eliminate a wrong answer to make guessing a good strategy. If you have any information about the relative likelihood of the options, guessing will have positive expected value.
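
For anyone who wants to check the arithmetic, here is a minimal sketch of the expected gain from a guess under the old scoring rule (+1 for a right answer, -1/4 for a wrong one on a five-option question). The function name and the 30% "hunch" figure are my own illustrative choices, not anything from the College Board.

```python
from fractions import Fraction

def expected_gain(p_correct, n_options=5):
    """Expected raw-score change from answering one question under the old rule:
    +1 for a right answer, -1/(n_options - 1) for a wrong one."""
    penalty = Fraction(1, n_options - 1)
    p = Fraction(p_correct)
    return p - (1 - p) * penalty

# Blind guess, then guesses after ruling out one, two, or three options.
for eliminated in range(4):
    p = Fraction(1, 5 - eliminated)
    print(eliminated, p, expected_gain(p))  # gains: 0, 1/16, 1/6, 3/8

# A mild hunch with nothing eliminated (hypothetical): you think one option
# has a 30% chance of being right instead of 20%.
print(expected_gain(Fraction(3, 10)))  # 1/8
```

The last line is the point of the paragraph above: the gain is positive as soon as your chance on the option you pick is better than one in five, whether or not you have formally ruled anything out.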

The result is that, while time management for a test like the SAT can be complicated, the rule for guessing is embarrassingly simple: give your best guess for questions you read; don't waste time guessing on questions that you didn't have time to read.

The risk analysis actually becomes much more complicated when you take away the penalty for guessing. On the ACT (or the new SAT), there is a positive expected value associated with blind guessing and that value is large enough to cause trouble. Under severe time constraints (a fairly common occurrence with these tests), the minute it would take you to attempt a problem, even if you get it right, would be better spent filling in bubbles for questions you haven't read.
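
A rough sketch of that trade-off, under assumptions that are entirely mine: five answer options, and a test taker who can fill in ten bubbles in the minute a single problem would take. Neither number comes from the tests themselves.

```python
from fractions import Fraction

def blind_guess_value(n_questions, n_options=5, penalty=Fraction(0)):
    """Expected raw points from blindly bubbling n_questions unread questions.
    penalty is the deduction per wrong answer (0 when there is no correction)."""
    p = Fraction(1, n_options)
    return n_questions * (p - (1 - p) * penalty)

BUBBLES_PER_MINUTE = 10  # hypothetical pace, for illustration only

# The most a minute spent on one problem can earn is a single point.
best_case_from_attempting = 1

no_correction = blind_guess_value(BUBBLES_PER_MINUTE)
old_sat = blind_guess_value(BUBBLES_PER_MINUTE, penalty=Fraction(1, 4))

print(no_correction, best_case_from_attempting)  # 2 1
print(old_sat)                                   # 0
```

With no correction, the minute of bubbling is worth about two expected points against a best case of one for the attempted problem. With the old correction it is worth nothing on average, which is why the old rule never asked you to make this calculation.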

Putting aside what this does to the validity of the test, trying to decide when to start guessing is a real and needless distraction for test takers. In other words, just to put far too fine a point on it, the claims about the effects of the correction for guessing aren't just wrong; they are the opposite of right. The old system didn't require time-consuming risk analysis but the new one does.

As I said in the previous post, this represents a fairly small aspect of the changes in the SAT (loss of orthogonality being a much bigger concern). Furthermore, the SAT represents a fairly small and perhaps even relatively benign part of the story of David Coleman's education reform initiatives. Nonetheless, this one shouldn't be that difficult to get right, particularly for a publication with the reputation of the New York Times.

Of course, given that this is the second recent high-profile piece from the paper to take an anti-SAT slant, it's possible certain claims weren't vetted as well as others.

Wednesday, March 26, 2014

The SAT and the penalty for NOT guessing

Last week we had a post on why David Coleman's announcement that the SAT would now feature more "real world" problems was bad news, probably leading to worse questions and almost certainly hurting the test's orthogonality with respect to GPA and other transcript-based variables. Now let's take a look at the elimination of the so-called penalty for guessing.

The SAT never had a penalty for guessing, not in the sense that guessing lowered your expected score. What the SAT did have was a correction for guessing. On a multiple-choice test without the correction (which is to say, pretty much all tests except the SAT), blindly guessing on the questions you didn't get a chance to look at will tend to raise your score. Let's say, for example, two students took a five-option test where they knew the answers to the first fifty questions and had no clue what the second fifty were asking (assume they were in Sanskrit). If Student 1 left the Sanskrit questions blank, he or she would get fifty points on the test. If Student 2 answered 'B' to all the Sanskrit questions, he or she would probably get around sixty points.

From an analytic standpoint, that's a big concern. We want to rank the students based on their knowledge of the material but here we have two students with the same mastery of the material but with a ten-point difference in scores. Worse yet, let's say we have a third student who knows a bit of Sanskrit and manages to answer five of those questions, leaving the rest blank thus making fifty-five points. Student 3 knows the material better than Student 2 but Student 2 makes a higher score. That's pretty much the worst possible case scenario for a test.

Now let's say that we subtracted a fraction of a point for each wrong answer -- 1/4 in this case, 1/(number of options - 1) in general -- but not for a blank. Now Student 1 and Student 2 both have fifty points (on average, in Student 2's case) while Student 3 still has fifty-five. The lark's on the wing, the snail's on the thorn, the statistician has rank-ordered the population and all's right with the world.

[Note that these scales are set to balance out for blind guessing. Students making informed guesses ("I know it can't be 'E'") will still come out ahead of those leaving a question blank. This too is as it should be.]
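
Here is a small sketch of the three-student example, with and without the correction. It works with expected scores only (Student 2's actual score would bounce around the average), and the function and variable names are mine.

```python
from fractions import Fraction

def expected_score(known, blind_guesses, n_options=5, corrected=False):
    """Expected score for a student who gets `known` questions right and
    blindly guesses on `blind_guesses` more; blanks score zero either way."""
    p = Fraction(1, n_options)
    penalty = Fraction(1, n_options - 1) if corrected else Fraction(0)
    return known + blind_guesses * (p - (1 - p) * penalty)

# (questions answered from knowledge, questions blindly guessed)
students = {
    "Student 1": (50, 0),   # leaves the Sanskrit half blank
    "Student 2": (50, 50),  # answers 'B' to every Sanskrit question
    "Student 3": (55, 0),   # knows five Sanskrit answers, leaves the rest blank
}

for name, (known, guessed) in students.items():
    raw = expected_score(known, guessed)
    fixed = expected_score(known, guessed, corrected=True)
    print(name, raw, fixed)
# Student 1 50 50
# Student 2 60 50
# Student 3 55 55
```

Without the correction, the ordering puts Student 2 ahead of Student 3; with it, the ordering matches what the students actually know.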

You can't really say that Student 2 has been penalized for guessing since the outcome for guessing is, on average, the same as the outcome for not guessing. It would be more accurate to say that 1 and 3 were originally penalized for NOT guessing.

Compared to some of the other issues we've discussed regarding the SAT, this one is fairly small, but it does illustrate a couple of important points about the test. First, the SAT is a carefully designed test and second, some of the recent changes aren't nearly so well thought out.

Why I am optimistic about 538

As people may or may not know, Nate Silver has launched an independent website.  Some of the people whom I respect the most on the internet (Noah Smith, Paul Krugman, Andrew Gelman) have pointed out some of the teething problems, where the inclusion of either more data or more model information in the article would have been helpful. 

In essence, I think that the website is trying to balance a number of things at the same time:
  1. Use of predictive statistical models
  2. Accessible journalism
  3. Thought provoking/contrarian views
  4. A diverse body of topics
All of these elements can be important, but there can be a steep learning curve as to where the value is to the news consumer.  For example, Andrew Gelman points out in the sports column (as I best understand it -- I know nothing about sports and I am going entirely on the model comments) that he is having trouble figuring out the underlying model which makes interpretation more complicated.   In the comments, there was a request for the correlation matrix, which is deadly reasonable in the statistics field but might not appeal to the median reader.

So why am I optimistic?  Because Nate Silver has tended to be very data driven in his endeavours.  I have a strong prior expectation that the initial offerings are, at least partially, a test to see where he can add value relative to other media services (both on and off of the web).  Under these conditions, a good empirical tester would deliberately try out approaches and opinions that will likely fail.  Because that is the only way to get actual data on what works and to find unexploited niches. 

If people consistently want more statistics, and articles with well-described models (or links to well-described models) do well, then I bet we will see a lot more of them.  Or at least I hope so. 

So I am going to wait for about 90 days and then see what the site looks like.  I could be wrong about this approach -- but I am willing to put my opinion out there and see if the data support it going forward. 

Tuesday, March 25, 2014

A tentative foray into e-publishing

Regulars may have noticed that the blog went a bit fallow in late February and early March, though Joseph (who is disgustingly hardworking) picked up a great deal of the slack. My time was being diverted into putting together a couple of small books of puzzles from the Thirties and Forties and, sometimes more dauntingly, learning the subtleties of Kindle publishing.

The titles are "Classic Puzzles for the Classroom" and "Classic Word Puzzles for the Classroom." Other than some pagination and layout issues (more on that later in the post), the results were fairly close to what I had in mind. I believe they meet Abraham Lincoln's famous standard of literary acceptability: people who like this sort of thing will find in this the sort of thing they like.





Both books are collections of puzzles and games from Golden Age comics, selected from books now in the public domain and arranged in teacher-friendly sections. The target audience is small but the material was a good fit with the ongoing math ed and mathematical recreation threads here and at You Do the Math (which is about to go active again). I'll come back to the actual puzzles in future posts. For now though, here are a few notes on my (very limited) experience with e-books.

I like old comic books to look like old comic books, but not too much. Since I was using publicly available scans of very old magazines, some retouching was necessary but I tried to make it as unobtrusive as possible. I used GIMP for individual touch-ups and ImageMagick for things like rescaling large numbers of pictures. I'm no expert on graphics (more of a video guy) but the learning curve wasn't bad at all.

I had initially planned on doing the books as PDFs but Amazon's instructions said that would cause formatting problems and suggested submitting Word documents instead, so that's what I did. I'm not sure it helped. Based on my experience and what I've read since, Kindle e-books are not a graphics-friendly format and, unfortunately, I was doing a couple of picture books. Formatting and pagination changed from device to device and, in the case of the Kindle preview function, changed while viewing the document on the same device -- as I flipped back and forth through the preview, a picture that started out on page nine might be on page ten when I flipped back. I tried playing with formatting and inserting a break for every new page but I eventually accepted defeat and simply left the page numbers out of the index with an explanatory note.

Recently, I came across a tool called Kindle Comic Creator, which I will try if I do another graphics-heavy e-book.

The rest of the publishing process was remarkably easy. The online form is fairly short and if I hadn't had to keep uploading reformatted drafts the process would probably have taken an hour or two.

I'll open the floor for suggestions now. Does anyone out there have relevant e-publishing experience to share?

Monday, March 24, 2014

This is also true in Epidemiology

Frances Woolley is back with a great post on how junior people focus on the statistical models and not the data set itself.  This is unfortunate as domain-specific knowledge of the data and the expected relations in the data are often the most important contributions.  When I worry about "field-jumping", it is this sort of problem that jumps up:
But all else is not equal. Using probit will not save a regression that combines men and women together into one sample when estimating the impact of having young children on the probability of being employed, and fails to include a gender*children interaction term. (The problem here is that children are associated with a higher probability of being employed for men, and a lower probability of being employed for women. These two effects cancel out in a sample that includes both men and women.)
Here we have a well understood and theoretically clear interaction that could easily be missed if one was not aware of the body of work underpinning it. 
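
To make the cancellation concrete, here is a toy sketch with made-up counts (they are not from Woolley's post or from any real data set). It compares raw employment rates rather than fitting a probit, which is enough to show why the pooled estimate can come out near zero when the effect runs in opposite directions for men and women.

```python
from fractions import Fraction

# Hypothetical counts: (gender, has_young_children) -> (employed, total)
counts = {
    ("men", False): (800, 1000),
    ("men", True): (900, 1000),
    ("women", False): (800, 1000),
    ("women", True): (700, 1000),
}

def rate(groups):
    """Employment rate across the listed (gender, has_children) cells."""
    employed = sum(counts[g][0] for g in groups)
    total = sum(counts[g][1] for g in groups)
    return Fraction(employed, total)

def children_effect(genders):
    """Difference in employment rate: with young children minus without."""
    with_kids = rate([(g, True) for g in genders])
    without = rate([(g, False) for g in genders])
    return with_kids - without

pooled = children_effect(["men", "women"])
men = children_effect(["men"])
women = children_effect(["women"])
print(pooled, men, women)  # 0 1/10 -1/10
```

In these invented numbers the pooled comparison shows no association at all, while each gender on its own shows a ten-point shift. A model without the gender*children interaction is, in effect, estimating the pooled number.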

It's also why I am suspicious of simplistic explanations for why entire fields have missed the obvious confounder/true exposure.  It is possible that this is true, but a command of the literature is needed to really understand why such a blind spot developed.  This is not to say that outsiders never bring in value (the Emperor-has-no-clothes effect really exists), but I am much happier when I see a very detailed command of the data being used, the questions that were asked, the population that was included, ways in which the data collection may have influenced the results, and so forth.

Definitely go and read.

SAT winners and losers

One thing I've noticed about the recent calls to end the SAT is that the test is framed entirely as an obstacle. At no point is there any suggestion that some students might have more educational opportunity because of the test. Obviously that can't be true. There is clearly a zero sum aspect to this. When someone bombs their SAT, Harvard does not reduce its admissions by one.

This pool of those likely to gain is quite large. Having gone to a perfectly good but not outstanding public high school in the middle of the country, I can tell you that the best and often the only feasible way for most students to catch the eye of an elite college recruiter (with the possible exception of athletic accomplishment) is through high SAT scores. It is possible for a valedictorian from a no-name high school to get into an Ivy League school without killer test scores, but they won't be pursued the way students who have broken 700 across the board on the SAT will. For a lot of middle-class students, the SAT and ACT represent their best chances at a really prestigious school, not to mention the scholarships most Americans need to attend those schools.

This suggests an interesting framework for looking at the likely winners and losers under the current SAT system. Let's define winners as those for whom the potential benefits of a very high score are larger than the potential downside from an average or below score and losers as the opposite.

What would these two groups tend to look like? We have already partially answered this question for the winners. They would come from no-name public high schools. They would tend not to live near a major academic center such as the Northeast or Central or Southern California (since proximity increases the chance of networking). They would be middle or lower income (or at least low enough for twenty or thirty thousand of annual tuition to be a significant hardship).

How about the losers by this standard? Remember these are people who would gain relatively little from a very high score. That rules out anyone not fairly well to do (most of the rest of us can really use a full ride scholarship). They would probably attend the kind of elite and very pricey prep schools that are expert at getting their students into top universities. They would have the support network of connections and role models that make the application process go much smoother.

We previously discussed the op-ed by Jennifer Finney Boylan. Boylan clearly saw herself as someone who was more likely to be hurt than helped by the SAT so it would be interesting to see how well her background matches the group above. A quick stop at Wikipedia reveals that, though Boylan has overcome many challenges in her life, academic hardship does not appear to have been one of them. At the time her anecdote took place, she was about to graduate from the Haverford School. Haverford is almost a living cliche of an elite prep school, one of those places that the rich and powerful graduate from and send their children to.

It's true that there are ways that people with money can gain an advantage on the SAT. There are, however, considerably more and more effective ways that people with money and position can gain an advantage in all of the other factors used to rank potential college applicants: grades; school standing; extracurricular activities; recommendations; connections; the daunting application process. Students in Boylan's position have massive advantages. You could make the case that, as a high school student competing for a spot in a prestigious university, the only time Boylan had to compete on a roughly even playing field was when she took the SAT and it is worth noting that she resents it to this day.

That said, I don't want to single Boylan out. My concern here is with the insularity of the elites in our society and with the way that certain media outlets, particularly the New York Times, have come to view the world from their vantage point.

Saturday, March 22, 2014

Weekend blogging -- What kind of urban culture attracts the creative class? (answered in comic strip form)

About a month ago, we had an interesting discussion here and on Andrew Gelman's blog regarding Richard Florida's theories about the creative class and urban culture (see here, here and here). It got me thinking about one of Florida's favorite examples, Austin, Texas. These days when people think of the culture of that town, the first name that generally comes to mind is South by Southwest, but it's important to note that SxSW came after Dell.

If you were to have asked people in 1980 (shortly before the town started becoming a tech center) about Austin's culture, I suspect the answers would have focused on two main topics: the first would be the outlaw country scene (contrary to the song, Waylon and Willie actually hung out in Austin. Nobody hung out in Luckenbach); the second would be the then dominant effect of the massive UT campus on the town.

You can get a pretty good idea of what people thought of that UT/frat-dominated culture from the Academia Waltz, Berkeley Breathed's first cartoon and something of a proto-Bloom County.
















Friday, March 21, 2014

Question of the day

Roger Farmer:

Why is this a big deal? Because 90% of the macro seminars I attend, at conferences and universities around the world,  still assume that the labor market is an auction where anyone can work as many hours as they want at the going wage.  Why do we let our students keep doing this?
A model is a tool for better understanding the world.  While there may be problems where this particular simplification allows complex estimation, when the labor market (e.g. unemployment) is a major target of inference, this simplification seems to remove the most interesting variation (e.g. employment friction and how it makes fast job changes undesirable all around). 

Clearly, if this is the state of the art, these models could be improved (heck, even an employment change penalty function would do wonders). 

Sometimes, the SAT you read about in the news doesn't look much like the actual SAT

[Unless otherwise noted, 'SAT' refers to the SAT Reasoning Test]

There are real concerns about the SAT. The emphasis on vocabulary can and sometimes does create a problem with cultural bias and the test has a history of being misused, as do most psychometrics. Though it is possible to make too much of these abuses, we should remember that, as with the IQ test, people have used, and in a more subtle fashion continue to use, tests like the SAT to make racist arguments.

But while there are valid arguments for changing, deemphasizing, or even eliminating the SAT, these are not the arguments you will see in the anti-SAT editorials in Esquire or the New York Times. Instead, we get attacks on an SAT test that doesn't actually exist (though it quite possibly may after David Coleman is finished with his reforms).

Though it has long tried to live down the fact for various reasons, the SAT was designed to be what its original name suggests, a scholastic aptitude test. It was also designed to be largely orthogonal to GPA and other information found in high school transcripts. (you can find a more detailed discussion of this point here and here)

In order to achieve that orthogonality, the SAT has to be written in such a way that students who have taken more advanced classes do not have an unfair advantage. Partly for that reason, the SAT is perhaps unique among major measures of academic accomplishment in that it has almost no rote memory component other than vocabulary.* (From here on out, I am going to focus primarily on the mathematics section though most of the general comments will apply to the entire test.)

An old professor of mine, Bill Condon, once described the analytic SAT as the toughest ninth grade math test you will ever take. That's an extremely apt way of putting it. All of the mathematical concepts are either common sense or things which a ninth grader should have covered. Almost all of the rules and formulas needed for the test are printed in the front of the booklet.

The trouble with coverage of the SAT and to a slightly lesser extent the ACT is that virtually everyone whom you will find discussing it in the pages of a major newspaper or magazine has intense but old and usually highly unreliable memories of the test. Add to that the generally poor quality of the emotionally-charged education reform debate and the result is an incredibly unproductive discussion.

For a representative example, check out this opinion piece written by Jennifer Finney Boylan for the New York Times, which puts the trauma front and center starting with the first sentence:
I WAS in trouble. The first few analogies were pretty straightforward — along the lines of “leopard is to spotted as zebra is to striped” — but now I was in the tall weeds of nuance. Kangaroo is to marsupial as the giant squid is to — I don’t know, maybe D) cephalopod? I looked up for a second at the back of the head of the girl in front of me. She had done this amazing thing with her hair, sort of like a French braid. I wondered if I could do that with my hair.

I daydreamed for a while, thinking about the architecture of braids. When I remembered that I was wasting precious time deep in the heart of the SAT, I swore quietly to myself. French braids weren’t going to get me into Wesleyan. Although, in the years since I took the test in the mid-’70s, I’ve sometimes wondered if knowing how to braid hair was actually of more practical use to me as an English major than the quadratic equation. But enough of that. Back to the analogies. Loquacious is to mordant as lachrymose is to ... uh ...

This was the moment I saw the terrible thing I had done, the SAT equivalent of the Hindenburg disaster. I’d accidentally skipped a line on my answer sheet, early in that section of the test. Every answer I’d chosen, each of those lines of graphite-filled bubbles, was off by one. I looked at the clock. Time was running out. I could see the Wesleyan campus fading before my eyes.

High school is a trauma-filled time and its humiliations and disappointments can stay fresh for decades as they obviously have here. They do not, however, often lead to objective or accurate analyses. It may well have seemed unfair at the time to be judged on knowledge of relatively obscure words, but given that vocabulary tracks fairly well with reading ability, it doesn't seem unreasonable to ask a future English major to display an understanding of words like 'loquacious,' 'mordant' or even 'cephalopod.' As for the math section, I assume from the definite article that "the quadratic equation" refers to the quadratic formula. If so, that's an interesting choice because that formula does not appear on the SAT.

The math that does appear on the SAT relies on the following:

properties of numbers;

basic algebraic manipulation;

very basic (junior high level) geometry (with relevant formulas printed on the first page);

simple probability;

reading graphs and tables;

logic and problem-solving.

All of these fall into the good-to-know category for the general population and I'd argue the last is especially valuable for English majors (bad logic makes for bad literary criticism and often bad literature).

Boylan then goes on to complain that the SAT relies too much on memorization and to argue for the superiority of high school GPA as an academic metric. As mentioned before, the rote learning component of the SAT is extraordinarily small, far smaller than the corresponding component for almost every test-based grade a student will receive in junior high and high school.

This oddly self-defeating argument "We should drop the SAT because it's too ____; instead, we should rely more on grades/other tests/whatever (which happen to be more _____ than the SAT)" also features prominently in a less personal but much less coherent piece by Esquire.com news editor Ben Collins. Collins' argument consists of a series of largely arbitrary but highly emotional associations (it's not entirely clear why he makes these connections but he certainly feels strongly about them).

The first and possibly strangest of these associations involves Google.
Google, a company that evolved from a search engine into the world’s de facto incubator for great ideas that define our future, does not look at standardized tests when they hire applicants. They don't look at whether or not an applicant went to a Holy Grail of the standardized test lottery -- an Ivy League school -- either.
The wording here is somewhat unclear (this almost sounds like the company redacts the education sections from applicants' resumes), but I know that the big players like Google are very interested in students coming out of top computer science programs like the UC schools and particularly Stanford. Take a look at this SAT breakdown for the school that produced Google:


Score        Percent of Applicants   Admit Rate   Percent of Admitted Class
800          15%                     10%          25%
700–799      45%                     7%           54%
600–699      28%                     4%           19%
Below 600    12%                     1%           2%




Keep in mind that these numbers include humanities majors. The graduates that a company like Google is interested in would almost all be in the first two bins. Google doesn't talk much about SATs at least in part because they've largely maxed out the metric.
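As a quick sanity check on that breakdown, here is a small Python sketch. It assumes the three columns are related in the obvious way (a bin's share of the admitted class is its share of applicants times its admit rate, renormalized); that relationship is my assumption, not something the school states. The implied shares land within a point of the published column, and the top two bins come to a bit under four fifths of admits.

```python
# (share of applicants, admit rate, published share of admitted class) per score bin,
# copied from the breakdown above.
table = {
    "800":       (0.15, 0.10, 0.25),
    "700-799":   (0.45, 0.07, 0.54),
    "600-699":   (0.28, 0.04, 0.19),
    "Below 600": (0.12, 0.01, 0.02),
}

def implied_admitted_shares(table):
    """Share of the admitted class implied by applicant share times admit rate."""
    weights = {b: applicants * admit for b, (applicants, admit, _) in table.items()}
    total = sum(weights.values())  # overall admit rate implied by the table
    return {b: w / total for b, w in weights.items()}

implied = implied_admitted_shares(table)
for b, share in implied.items():
    print(f"{b:>9}: implied {share:.1%}, published {table[b][2]:.0%}")
print(f"700 or above: {implied['800'] + implied['700-799']:.1%} of admits")
```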

Collins' piece actually gets worse from there.

All of that is antithetical to the dog-eat-dog, score-high-at-all-costs test-taking culture that America has distilled in its young people. And all of that is exactly why Google is the most futuristic corporation on this planet.

They know that this kind of ingenuity and collaboration — not just knowledge — is what makes a smarter world. It is also what makes better people.

We have ritually and ceaselessly sucked the fun and wonder out of learning in a country that is pushing kids into adulthood aimless, goalless, robotic and depressed as a way to feed a system that we now know does not work.

Then we blame the adults for questioning the intent of that system, even when there is none.

Do not mistake a less-tested America for an Everybody Wins America — an academic extension of those soccer games where nobody keeps score. We need to keep score to stay competitive, to remark on ingenuity and encourage drive, to understand where help is needed and where greatness needs to be challenged further.

But we don’t need to do it in this increasingly antiquated, old-world way, a holdover from when we knew much less about our kids’ biology, how they learn, and how to compel them to be better.

Currently, we have our kids fill in bubbles, and if those kids fill in the bubbles wrong on a forgotten Saturday morning when they are 17, they’re cast to a lower lot for the rest of their lives.

This cannot be the American ideal.

Make no mistake: The next revolution is not another industrial one or another technological one but it will be our first educational one. America can lead the pack if it gets over its hubris, identifies and changes its faults, and unshackles itself from the tyranny of rules and routine that exist only for the sake of themselves.

Why try to play catch up with the old world when the greatest companies in the new economy are already here, in this country, creating new ways to make the world better? These companies are disregarding the rest of the world’s urge to retrofit an exponential stream of new information into a few hundred bubbles on a thin, white sheet of paper.

Only those kinds of companies are forging our future. Why don’t our kids deserve to be taught the same way?

China has better test scores across the board than the U.S. They do not have Apple or Facebook or Microsoft or Google. They do not have our ingenuity. Let’s start appreciating it, rewarding it, fighting for it. Let's start drilling a love of learning into the brains of our kids, in the place where fear and anxiety currently reside.

Where to start...

In some parts of this passage, Collins seems to be talking about some tests other than the SAT such as the PISA exams when addressing China** or VAM-based tests when discussing the effects on learning.  If this had led up to a condemnation of tests in general the conflation might be at least internally consistent, but with the paragraph about the importance of keeping score that possibility goes out the window. We have to limit his criticisms, however odd, to the SAT.

The only real specific Collins offers about why the SAT should be singled out for elimination is that the test is old. That's partially true. The test has constantly evolved, driven by some of the best and most sophisticated analytic techniques in the field, but in terms of the test's format and its role in the education system, we've had the current set-up since 1930.

I'd argue that the country has had a pretty good run of innovation since 1930 and while I wouldn't claim that the SAT was a major driver, it would be difficult to argue that it held us back. Collins seems to agree on the first part but he takes a strange turn from there. As best I can make it out, he's arguing that America is the most innovative country in the world so it's essential that we drop a major, longstanding component of our education system or we'll become like China.

All snark aside, we probably should have a good debate about the way college admissions work and about the (I think misplaced) emphasis we have come to put on getting into the 'right' schools. Unfortunately, writers like Boylan and Collins aren't contributing to that debate; they're just supplying misplaced anger and emotional baggage.


*There's a big question (too big to address here) about the role of vocabulary in the SAT. Ideally there should be no rote learning element here at all. The vocabulary component is supposed to measure things like reading volume and comprehension. Memorizing lists of words is, in a sense, cheating; it's also of questionable effectiveness compared to good, active reading habits.

** From China Daily:

The first annual report on the SAT performance of Chinese students found the average score was 1,213 points out of the total of 2,400, some 296 points lower than US students and 337 points lower than the benchmark set by College Board, the organizer of the test.

The gap is mainly derived from the reading and writing parts of the test. Chinese students scored 170 points less than US students in the reading part, which reveals Chinese students lack training in critical thinking, according to the report.

Chinese students, known to excel in mathematics, earned 547 points out of the total of 800, only 30 points higher than US students. The report attributed the lower-than-expected performance to Chinese students' poor knowledge of English mathematical terms and the test is aimed at a junior level which is easier for US students.

Professional Conduct

Dean Dad has the best view on the Nazareth College decision to rescind a job offer for a philosopher:

I understand the emotional appeal of rejecting someone before she rejects you.  It’s psychologically healthy to outgrow that phase.  Yes, it’s frustrating when a candidate you’re trying to hire comes in with unrealistic requests.  But sometimes grownups have to power through the disappointment.  Here’s a phrase I’ve used in turning down unrealistic requests:

“No, sorry, I can’t do that.”
 It would have likely accomplished the same outcome, without the chilling effect on future negotiations at the college.  Colleges have a great situation for hiring in fields like philosophy and it is likely that a much better fit could easily be found. 

Thursday, March 20, 2014

Problems with modern reform paradigm

If this is correct then I see a potential problem with school (charter or not) use of standardized tests:
Parents don’t want their children’s teachers evaluated on the basis of student standardized test scores because they know it is unfair.
– Encouraged by the Obama administration, states now have teacher and principal evaluation systems that include test scores. Unfortunately, many teachers wind up being evaluated on the scores of students they don’t have or subjects they don’t teach.
Parents want to see their child’s standardized tests after completion.
– They can’t. The tests are proprietary.
Now remember that no instrument is perfect. But under what conditions does it make sense for the tests used to evaluate teachers not to be made public? What if there was an error?

If evaluation is really the goal and we want to make data-driven decisions, then are not the testing instruments themselves an important part of the environment? Nobody would trust me to publish data from an epidemiological study where there were not publicly available instruments and public access data sets.

Just look at the value that making NHANES public has generated.  Why should we not foster the same openness in education that has been so successful in public health? 

When threads collide -- David Coleman vs. Prof. Feynman

In all the coverage and controversy over the recent changes in the SAT, one of the aspects that troubles me the most is the one that seems to bother most people the least (emphasis added):
[David] COLEMAN: The new math section will focus on three things: Problem solving and data analysis, algebra and real world math related to science, technology and engineering fields.
The response from most journalists and pundits to this push for applicability has been either disinterest or mild approval, but if you dig into the underlying statistics and look into the history of similar educational initiatives, it's hard not to come away with the conclusion that this change pretty much has got to be bad (with a better than even chance of terrible).

The almost inevitable bad outcome will be the nearly unavoidable hit taken by orthogonality. As discussed earlier, the value of a variable (such as an SAT score) in a model lies not in how much information it brings to the model but in how much new information it brings given what the other variables in the model have already told us. Models that colleges use to assess students (perhaps with trivial exceptions) include courses taken and grades earned. We want additional variables to that model to be as uncorrelated as possible with those transcript variables. The math section of the SAT does this by basing its questions on logic, problem solving and on basic math classes that everyone should have taken before taking the SAT. Students whose math education stopped at Algebra I should be on a roughly equal footing with students who took AP calculus, as long as they understood and retained what they learned.

Rather than making the SAT a more effective instrument, "real world" problems only serve to undercut its orthogonality. Meaningful applied questions will strongly tend to favor students who have taken relevant courses. It might be possible to avoid this trap, but it would be extremely difficult and there's no apparent reason for making the change other than the vague but positive connotations of the phrase. (It's important to note here that Coleman's background is in management consulting and the ability to work positive-sounding phrases into presentations is very much a core skill in that field.)

Even more worrisome is the potential for the really bad question, bad enough to have the perverse effect of actually causing more problems (in stress and lost time) for those kids who understand the material. Nothing throws a good student off track worse than a truly stupid question.

Even if the test-makers know what they're doing, writing good, situation-appropriate problems using real situations and data is extraordinarily difficult. The vast majority of the time, real life has to be simplified to an unrealistic degree to make it suitable for a brief math problem. The end result is usually just an old problem with new nouns: take a rate problem and substitute "computer programmer" for "ditch digger."

You can make a fairly good case for real world questions based on teaching-across-the-curriculum -- for example, using the Richter scale in a homework problem is a good way of working in some earth science -- but since the purpose of the SAT is to measure, not to instruct, that argument doesn't hold here.

The even bigger concern is what can happen when the authors don't know what they're doing.

From Richard Feynman's "Judging Books by their Covers":
Finally I come to a book that says, "Mathematics is used in science in many ways. We will give you an example from astronomy, which is the science of stars." I turn the page, and it says, "Red stars have a temperature of four thousand degrees, yellow stars have a temperature of five thousand degrees . . ." -- so far, so good. It continues: "Green stars have a temperature of seven thousand degrees, blue stars have a temperature of ten thousand degrees, and violet stars have a temperature of . . . (some big number)." There are no green or violet stars, but the figures for the others are roughly correct. It's vaguely right -- but already, trouble! That's the way everything was: Everything was written by somebody who didn't know what the hell he was talking about, so it was a little bit wrong, always!

Anyway, I'm happy with this book, because it's the first example of applying arithmetic to science. I'm a bit unhappy when I read about the stars' temperatures, but I'm not very unhappy because it's more or less right -- it's just an example of error. Then comes the list of problems. It says, "John and his father go out to look at the stars. John sees two blue stars and a red star. His father sees a green star, a violet star, and two yellow stars. What is the total temperature of the stars seen by John and his father?" -- and I would explode in horror.
Keep in mind, Feynman's example was picked to be amusing but representative ("That's the way everything was...a little bit wrong, always"). The post-Sputnik education reformers of his day were making pretty much the same demands that today's reformers are making. There's no reason to expect a better result this time.

Of course, there are good questions that do use real-world data (you can even find some on the SAT), but in order to write them you need a team that understands both the subtleties of the material and the statistical issues involved in testing it.

The more I hear from David Coleman, whether it concerns the College Board or Common Core, the less confidence I have in his abilities to head these initiatives.

Wednesday, March 19, 2014

Differential growth in life expectancy?

There has been a lot of discussion about Annie Lowrey's article on changes in life expectancy, documenting how most of the recent rise in life expectancy is among Americans of higher socio-economic status.  I did find the question of causality to be less compelling:

It is hard to prove causality with the available information. County-level data is the most detailed available, but it is not perfect. People move, and that is a confounding factor. McDowell’s population has dropped by more than half since the late 1970s, whereas Fairfax’s has roughly doubled. Perhaps more educated and healthier people have been relocating from places like McDowell to places like Fairfax. In that case, life expectancy would not have changed; how Americans arrange themselves geographically would have.

“These things are not nearly as clear as they seem, or as clear as epidemiologists seem to think,” said Angus Deaton, an economist at Princeton.
It is possible that there is a process of re-arrangement going on.  But that still doesn't make charts like the second one in this Aaron Carroll blog post easier to explain.  If the higher earning recipients of social security live longer than the lower earning recipients, then this association is not simple to explain with a direct appeal to the ecological fallacy. 

This is the sort of case where data is limited but we still need to make decisions.  It is odd that with some decisions we are desperately worried about getting things wrong when it advantages the affluent but we seem quite worried about over-interpreting data when redistribution would be the obvious policy solution. 

“The best thing that happened to the education system in New Orleans was Hurricane Katrina”

Joseph has already commented on one aspect of this Valerie Strauss article on Netflix CEO Reed Hastings, but a different passage caught my eye.
He appears to be presenting a vision of education in the United States where nearly all students are educated in collections of charter schools: “So what we have to do is to work with school districts to grow steadily, and the work ahead is really hard because we’re at 8% of students in California, whereas in New Orleans they’re at 90%, so we have a lot of catchup to do.”
As indicated by the Arne Duncan quote I used as a title, the notion of New Orleans as the educational ideal is strongly established in the reform movement. New Orleans has implemented the major tenets of the reform pedagogy to an extraordinary degree, particularly the rigid, metric-driven, no-excuses attitude. On this much, everyone can pretty much agree.

When we get to effects, however, the picture gets murkier. There has been some improvement in test scores but the 'reforms' coincided with increased spending which would be expected to boost scores. In addition, some of the increase can also be attributed to considerably increased pressure on students to take the tests seriously. Even putting all that aside, the improvements still don't look that impressive when compared to demographically similar schools in other states. Bruce Baker of Rutgers did the heavy lifting.

The bigger story for me, though, is in the details of the now dominant culture of New Orleans schools and in how parents and students have reacted to the new regime. It's apparent that quite a few people are extremely unhappy.

A previous post mentioned students from one New Orleans high school walking out in a mass protest.



This was not an isolated incident.
Sci Academy, the flagship of the Collegiate Academies charter group, is known for high test scores and stringent discipline policies, such as requiring students to walk between lines taped on the floor. School leaders say the two go hand-in-hand: You don't have to walk on the right side of the hallway in college, but the discipline will serve you well.

But students at the group's two new schools, George Washington Carver Collegiate Academy and George Washington Carver Preparatory Academy, walked out the week before Thanksgiving, angry about such rules. On Wednesday (Dec. 18), about 60 students attended a rally. A letter of demands written by some students said kids were being suspended "for every little thing."

Recent state data show there are grounds for that claim. The three Collegiate schools had the city's highest suspension rates in the 2012-13 academic year. A full 69 percent of Carver Collegiate's student body was sent home at least once. Carver Prep suspended 61 percent of its student body. Sci Academy sent home 58 percent, a 9-point increase from the year before.
Anyone with experience with K-12 education can tell you that mass suspension and expulsion may possibly be the simplest and most effective way of improving test scores and making classroom management easier (a particularly pressing issue if you have high teacher turnover and rely heavily on programs like TFA). The problem with the technique is that it takes its greatest toll on the most vulnerable students. To fully grasp the brutality of these methods, you have to look at specific examples, such as this one from a parents' advocate in New Orleans:
The case that still breaks my heart involved a 14-year-old who kept getting demerits because his uniform shirt was too small and came untucked basically every time he moved. His mother was a veteran, well-educated, and had sold real estate but got divorced and ended up losing her job, and became homeless. They were living with friends and really struggling. The school expelled the child because he’d had three suspensions—the last one for selling candy to try to raise enough money to buy a new shoes and a new uniform shirt. I felt that if the mother went and told her story that the school would understand and wouldn’t hold up the expulsion. She didn’t want the school to know how impoverished she was but I convinced her to do it, so she came and told all of these people what she was going through—about her struggles. I thought for sure the board would overturn the expulsion, not just because her story was so compelling, but because there wasn’t actually anything in the school’s discipline book about selling candy. But they upheld it and it broke my heart that this kid was being put out of school because he was poor.
I don't know if this student went to one of the specific schools discussed here, but I can tell you that this is all too often what the process looks like, which is why responsible administrators use it so reluctantly.

Tuesday, March 18, 2014

A rare RPG post

Go and read Greyhawk Grognard on rare spell components.  With creativity you can make powerful spells tough to cast without needing to simply make them cost cash.  Finding each of these components would be an adventure in and of itself.  It makes components fun and interesting instead of just a "pay go" system. 

Has this person ever worked in a large corporation?

I ask in astonishment because I read things like this:
The newest bit of “wisdom” for public education comes to us from Netflix Chief Executive Officer Reed Hastings, who is a big charter school supporter and an investor in the Rocketship Education charter school network. At a meeting of the California Charter Schools Association on March 4, he said in a keynote speech that the problem with public schools is that they are governed by elected local school boards. Charter schools have boards that are not elected and, according to his logic, have “a stable governance” and that’s why “they constantly get better every year.”
See, in the private sector there was this phenomenon called "re-organization" (or "re-org" for short) that seemed to hit every couple of years. Each time there was a massive shift in governance and lines of reporting. If Netflix has managed to avoid these "re-orgs" then I see that as a very positive feature of the company, but it is hardly a guarantee that all private corporations will be able to do the same thing.

It also leads to other tough questions.  The reason that the private sector works well is "creative destruction" as better companies outcompete poorer companies.  Is the charter school movement going to be immune to competition as well? 

And if they are immune to market forces, what are they accountable to?  If we think the answer is a higher level of government, then why do we think it will be more stable and more accountable than the school boards? 

This is not so much a defense of school boards (which I have seriously mixed feelings about) as it is a question of what model do we replace them with?  I am not sure that the command and control style socialist model of the state owned or supported corporation has been the most efficient alternative, has it? 

EDIT: Mark Palko wanted me to mention that Valerie Strauss has been doing good work in this area for some time. Also note the idea of California needing to "catch up" to New Orleans -- it is possible for a former backwater to become dynamic (think Macedonia at the end of the classical Greek era) but this is often not the best bet to make.

NOTE: Mark here. For a bit more context, check out the reform movement gadfly Edushyster's take on the charter chain Hastings was promoting.

Monday, March 17, 2014

Texas versus California

I have been trying to decide if Scott Lemieux covered this too completely, but I decided that there were a couple of useful points in this article.  Especially as relates to my California versus Texas discussion with Mark, where we discuss the relative merits of the two states. For example:
And despite all the gloating by Texas boosters about how the state attracts huge numbers of Americans fleeing California socialism, the numbers don’t bear out this narrative either. In 2012, 62,702 people moved from California to Texas, but 43,005 moved from Texas to California, for a net migration of just 19,697.
This really points out how marginal the population shift is.  It isn't zero, but it is also not a mass population shift driven by the hellish California region.
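To put the two flows side by side, here is a short Python sketch using only the figures in the quote. The ratios are my own framing, not something from the article.

```python
# 2012 state-to-state moves, as quoted above.
ca_to_tx = 62_702
tx_to_ca = 43_005

net = ca_to_tx - tx_to_ca    # net movers toward Texas
gross = ca_to_tx + tx_to_ca  # everyone who moved in either direction

print(f"net migration to Texas: {net:,}")
print(f"net as a share of gross flow: {net / gross:.1%}")
print(f"Texas-to-California movers per California-to-Texas mover: {tx_to_ca / ca_to_tx:.2f}")
```

Roughly two people moved from Texas to California for every three who went the other way, which is hard to square with a one-way exodus.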

Even more telling:

Oh yes, I know what you’ve heard. And it’s true, as the state’s boosters like to brag, that Texas does not have an income tax. But Texas has sales and property taxes that make its overall burden of taxation on low-wage families much heavier than the national average, while the state also taxes the middle class at rates as high or higher than in California. For instance, non-elderly Californians with family income in the middle 20 percent of the income distribution pay combined state and local taxes amounting to 8.2 percent of their income, according to the Institute on Taxation and Economic Policy; by contrast, their counterparts in Texas pay 8.6 percent.

And unlike in California, middle-class families in Texas don’t get the advantage of having rich people share equally in the cost of providing government services. The top 1 percent in Texas have an effective tax rate of just 3.2 percent. That’s roughly two-fifths the rate that’s borne by the middle class, and just a quarter the rate paid by all those low-wage “takers” at the bottom 20 percent of the family income distribution. This Robin-Hood-in-reverse system gives Texas the fifth-most-regressive tax structure in the nation.  
That leads to some really interesting questions about the relation of tax rates to prosperity. If most people in Texas pay more in taxes than their counterparts in California, then maybe this is another data point on the scale of more money for government leading to a stronger and more prosperous state. But these points really don't make the case that Texas is clearly better than California. Now both states have a strong streak of pro-business advocates, and so I think that both could end up as engines of American prosperity. But I think that the future for California is pretty optimistic once the actual facts are broadly considered.

Sunday, March 16, 2014

The second half is more remarkable

Paul Krugman:

But my guess is that in a week or two we will once again hear a supposed wise man saying that we need to raise the retirement age to 67 because of higher life expectancy, unaware that (a) life expectancy hasn’t risen much for half of workers (b) we’ve already raised the retirement age to 67.
I completely understand why part a is misleading -- poorer workers get less social security already, and making them work longer means fewer years of benefits, which tilts the system further toward wealthier contributors, who already get more per month.

But the second part is the piece that I find truly remarkable. I mean how hard could it be for the media to fact check that raising eligibility to 67 is current law? Sure, you might want to protect the law but that is a completely different argument from forgetting that the law changed in 1983. Or you could want to phase it in faster, but that would also be a) a different argument and b) seemingly ill-timed with the changes in the 401(k) system.

So I could imagine some debate about the first part (based around potential short term trends and the fact that we don't have the complete death curve for the population).  But the second is simply . . .  odd. 

Friday, March 14, 2014

This was both entertaining and thought-provoking

It was also a very clear headed explanation of some of the key mythologies of the modern cult of anarcho-capitalism.  I especially liked (edited with *'s for questionable language choices):

But if none of that stuff existed, there would be nothing stopping Jay-Z from taking your farm. In other words, you don't "own" ****. The entire concept of owning anything, be it a hunk of land or a house or a ****ing sandwich, exists purely because other people pay other armed men to protect it. Without society, all of your brave, individual talents and efforts won't buy you a bucket of ****s. So when I say "We're all in this together," I'm not stating a philosophy. I'm stating a fact about the way human life works. No, you never asked for anything to be handed to you. You didn't have to, because billions of humans who lived and died before you had already created a lavish support system where the streets are all but paved with gold. Everyone reading this -- all of us living in a society advanced enough to have Internet access -- was born one inch away from the finish line, plopped here at birth, by other people.


But it is a very straightforward explanation of the concept of interdependence, and the way that we are all connected based on social convention. 

Sometimes the Cracked site is surprisingly thought provoking. 

Orthogonality and the SAT

[Note: 'SAT' refers to the SAT Reasoning Test]

If you spend any time following the SAT debate, you will frequently encounter some variation on the phrase:
All in all, the changes are intended to make SAT scores more accurately mirror the grades a student gets in school.

The thing is, though, there already is something that accurately mirrors the grades a student gets in school. Namely: the grades a student gets in school. A better way of revising the SAT, from what I can see, would be to do away with it once and for all.
Putting aside the questionable assumption that the purpose of a college's selection process is to find students who will get good grades at that college, there is a major statistical fallacy here, and it reflects a common but very dangerous type of oversimplification.

When people talk about something being the "best predictor" they generally are talking about linear correlation. The linearity itself is problematic here – we are generally not that concerned with distinguishing potential A students from B students while we are very concerned with distinguishing potential C students from potential D and F students – but there's a bigger concern: The very idea of a "best" predictor is inappropriate in this context.

In our intensely and increasingly multivariate world, this idea ("if you have one perfectly good predictor, why do you need another?") is rather bizarre and yet surprisingly common. It has been the basis of arguments that I and countless other corporate statisticians have had with executives over the years. The importance of looking at variables in context is surprisingly difficult to convey.

The explanation goes something like this. If we have a one-variable model, we want to find the predictor variable that gives us the most relevant information about the target variable. Normally this means finding the highest correlation between some transformation of the variable in question and some transformation of the target, where the transformation of the target is chosen to highlight the behavior of interest while the transformation of the predictor is chosen to optimize correlation. In our grading example, we might want to change the grading scale from A through F to three bins of A/B, C, and D/F. If we are limited to one predictor in our model, picking the one that optimizes correlation under these conditions makes perfect sense.
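For concreteness, the target transformation in the grading example might look something like the Python sketch below. The function name and the 0/1/2 ordinal codes are hypothetical choices of mine; the only thing taken from the text is the three bins.

```python
# Collapse letter grades into the three bins from the grading example.
GRADE_TO_BIN = {"A": "A/B", "B": "A/B", "C": "C", "D": "D/F", "F": "D/F"}

# Ordinal codes for the bins (an assumption; any increasing coding would do).
BIN_TO_SCORE = {"D/F": 0, "C": 1, "A/B": 2}

def bin_grade(letter):
    """Map a letter grade, with or without a +/- suffix, to its bin label."""
    return GRADE_TO_BIN[letter.strip().upper()[0]]

grades = ["A-", "B+", "C", "D", "F", "a"]
binned = [bin_grade(g) for g in grades]
scores = [BIN_TO_SCORE[b] for b in binned]
print(binned)
print(scores)
```

Under this coding an A and a B are indistinguishable, so a predictor gets no credit for separating them; all the credit goes to separating the C and D/F students from everyone else, which is the behavior of interest.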

Once we decide to add another variable, however, the situation becomes completely different. Now we are concerned with how much information our new variable adds to our existing model. If our new variable is highly correlated with the variable already in the model, it probably won't improve the model significantly. What we would like to see is a new variable that has some relationship with the target but which is, as much as possible, uncorrelated with the variable already in the model.

That's basically what we are talking about when we refer to orthogonality. There's a bit more to it – we are actually interested in new variables that are uncorrelated with functions of the existing predictor variables – but the bottom line is that when we add a variable to a model, we want it to add information that the variables currently in the model haven't already provided.

Let's talk about this in the context of the SAT. Let's say I wanted to build a model predicting college GPA and, in that model, I have already decided to include high school courses taken and their corresponding grades. Assume that there's an academic achievement test that asks questions about trigonometric identities or who killed whom in Macbeth. The results of this test may have a high correlation with future GPA, but they will almost certainly have a high correlation with variables already in the model, thus making this test a questionable candidate for the model. When statisticians talk about orthogonality, this is the sort of thing they have in mind.

The SAT works around this problem by asking questions that are more focused on aptitude and reasoning and which rely on basic knowledge not associated with any courses beyond junior high level. Taking calculus and AP English might help students' SAT scores indirectly by providing practice reading and solving problems, so we won't get perfect orthogonality, but it will certainly do better in this regard than a traditional subject matter exam.

This is another of those posts that sits in the intersection of a couple of major threads. The first concerns the SAT and how we use it. The second concerns orthogonality, both in the specific sense described here and in the general sense of adding information to the system, whether through new data, journalism, analysis or arguments. If, as we are constantly told, we're living in an information-based economy, concepts like orthogonality should be a standard feature of the conversation, not just part of statistical esoterica. 

Thursday, March 13, 2014

Negotiation

This is a really interesting story about a failed academic negotiation. It is pretty clear that nobody has covered themselves in glory here, although the response from the institution seems awfully harsh and a symptom of the sort of extremely tight labor markets that reduce employee choice. One only hopes that the maternity leave condition was orthogonal to the decision to rescind the offer, although I suspect asking for a one-year delay in start date was more likely the culprit.

The comments below are quite interesting as well.

More on inequality

As a follow-up to the last post, consider this point by Chris Dillow:
Of course, this calculation only makes sense if we assume such redistribution could occur without reducing aggregate incomes. But such an assumption is at least plausible. The idea that massive pay for the 1% has improved economic performance is - to say the least - dubious. For example, in the last 20 years - a time of a rising share for the top 1% - real GDP growth has averaged 2.3% a year. That's indistinguishable from the 2.2% seen in the previous 20 years - a period which encompassed two oil shocks, three recessions, poisonous industrial relations, high inflation and macroeconomic mismanagement - and less than we had in the more egalitarian 50s and 60s.
It is not that there are no adverse consequences to redistribution. Nor does it mean that any policy, taken to an extreme, will be as effective as it is on the margin when applied to current conditions. But it is an even more compelling argument that inequality is not, in and of itself, self-evidently a force for economic growth without some additional evidence.

Tuesday, March 11, 2014

Data Intuition

Paul Krugman:
Even more strikingly, however, the level as opposed to the growth rate of French GDP per capita is substantially lower than that of the US.

This is my main concern about Ostry et al. Suppose we think that strong redistributionist policies reduce the level of output — but that it’s a one-time shift, not a permanent depression of growth. Then you could accept their result of a lack of impact on growth while still believing in serious output effects.
I might be able to accept the one-time shift theory of redistribution, where reducing inequality lowers the overall GDP of the economy. But if these effects are dynamic (they change the rate of growth instead of shifting the absolute level), then they should show up in the historical record. After all, there are a number of highly unequal societies -- have they outcompeted the more equal societies repeatedly?
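To make the level-versus-growth distinction concrete, here is a small sketch with made-up numbers. The 2% baseline growth rate, the 10% one-time drop, and the 1.5% slower growth rate are all assumptions chosen purely for illustration, and `gdp_path` is a hypothetical helper, not anything from Krugman or Ostry et al.

```python
def gdp_path(start, growth, years, level_shift=0.0):
    """GDP per capita for years 0..years, with an optional one-time proportional drop in level."""
    level = start * (1.0 - level_shift)
    return [level * (1.0 + growth) ** t for t in range(years + 1)]


# All figures below are hypothetical.
baseline = gdp_path(100.0, 0.02, 50)                     # no redistribution effect
one_time = gdp_path(100.0, 0.02, 50, level_shift=0.10)   # lower level, same growth rate
slower = gdp_path(100.0, 0.015, 50)                      # same starting level, lower growth rate

# Output relative to the baseline economy, year by year.
ratio_one_time = [a / b for a, b in zip(one_time, baseline)]
ratio_slower = [a / b for a, b in zip(slower, baseline)]

print(f"one-time shift, year 0 vs year 50: {ratio_one_time[0]:.3f} vs {ratio_one_time[-1]:.3f}")
print(f"slower growth,  year 0 vs year 50: {ratio_slower[0]:.3f} vs {ratio_slower[-1]:.3f}")
```

Under the one-time shift, the gap relative to the baseline stays constant, so a study of growth rates would find nothing. Under the dynamic version, the gap keeps widening, which is the kind of divergence that ought to be visible over a long enough historical record.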

Did the French revolution greatly depress French output and dynamism? 

Now it could be that this is one element of a complex system. That is totally plausible. But then it should also be a candidate for trade-offs, and the countries that have undertaken large amounts of redistribution (think US versus Canada or Denmark) have not obviously done worse.

In general, simple explanations for complex phenomena are always suspect, especially if it is difficult to formulate a test that might falsify the hypothesis.

Sunday, March 9, 2014

Open Data

This is a pretty good argument for why there is resistance to completely open data:
When people don’t want to release their data, they don’t care about the data itself. They care about the papers that could result from these data. I don’t care if people have numbers that I collect. What I care about is the notion that these numbers are scientifically useful, and that I wish to get scientific credit for the usefulness of these numbers. Once the data are public, there is scant credit for that work.

It takes plenty of time and effort to generate data. In my case, lots of sweat, and occasionally some venom and blood, is required to generate data. I also spend several weeks per year away from my family, which any parent should relate with. Many of the students who work with me also have made tremendous personal investments into the work as well. Generating data in my lab often comes at great personal expense. Right now, if we publicly archived data that were used in the creation of a new paper, we would not get appropriate credit in a currency of value in the academic marketplace.
I think the key to this argument is that most of the effort in some fields lies in the collection of the data but all of the credit is based on papers. So you would end up, rather quickly, with a form of tragedy of the commons where the people who create the data end up with little credit . . . meaning we would end up with less data.

Are there alternatives to this paradigm? Of course. The US census is an excellent example of an alternative model -- where the data collection and cleaning is done by a government department on behalf of all sorts of researchers. Splitting data collection and data analysis in this way is certainly a viable model.

But pretending that open data is a simple case of people being reluctant to share their information is really an unfair portrayal. In my own career I have had lots of access to other people's data and they are extremely generous so long as I offer to give proper credit. So I don't think the open data movement is all wrong, but it does suggest that there is a difficult conversation to be had to make this work well.

Wednesday, March 5, 2014

How did we miss this one?

Mike the Biologist links to a remarkable statistic:
There are numerous problems with using VAM scores for high-stakes decisions, but in this particular release of data, the most obvious and perhaps the most egregious one is this: Some 70 percent of the Florida teachers received VAM scores based on test results from students they didn’t teach and/or in subjects they don’t teach.
Even more remarkable, this was only revealed after the Florida Times-Union sued for access to the records and a court ordered their release. The source also notes that this issue is live in Tennessee, which has similar problems. Now we have a lot of moving parts in the area of education reform and there are arguments about the use of value-added measures (VAM).

But nobody has a good argument for scoring teachers on the results of students taught by other teachers and then making employment decisions based on those scores. When we talk about peer effects, it is the students in the classroom and not colleagues that we are thinking of. It is also striking how much room there is to game statistics when you only collect real data on roughly one third of teachers. Can we really presume that this data collection is a proper random sample?

These issues are not necessarily small issues. They have the potential to replace one set of issues in education with another. Nor is it 100% clear that they address the issue of social mobility, either, as less job security for teachers does not appear to directly address the drivers of intergenerational social mobility.

I have respect for people trying to solve a tough problem, but this does not seem to be a great way to go.