Thursday, March 14, 2013

The Rise of P-Value

In the middle of a recent conversation prompted by this post by Andrew Gelman, it struck me that I couldn't recall encountering the term p-value before I started studying statistics in the Nineties. These days you frequently run across the term in places like the NYT article Gelman cited or this piece in the Motley Fool; were they always there and I just missed them?

Fortunately we have Google's Ngram viewer to resolve these questions and apparently the answer is a qualified yes. While people were talking about p-values at the beginning of the decade, more people were talking about them by the end.

The question now is how much of that growth is attributable to general interest writing like the NYT.



Wednesday, March 13, 2013

Epidemiology and Truth

This post by Thomas Lumley of Stats Chat is well worth reading and thinking carefully about.  In particular, when talking about a study of processed meats and mortality he opines:

So, the claims in the results section are about observed differences in a particular data set, and presumably are true. The claim in the conclusion is that this ‘supports’ ‘an association’. If you interpret the conclusion as claiming there is definitive evidence of an effect of processed meat, you’re looking at the sort of claim that is claimed to be 90% wrong. Epidemiologists don’t interpret their literature this way, and since they are the audience they write for, their interpretation of what they mean should at least be considered seriously.


I think that support of an association has to be the most misunderstood piece of Epidemiology (and we epidemiologists are not innocent of this mistake ourselves).  The real issue is that cause is a very tricky animal.  It can be the case that complex disease states have a multitude of "causes".

Consider a very simple (and utterly artificial) example.  Let's assume (no real science went into this example) that hypertension (high systolic blood pressure) occurs when multiple exposures overwhelm a person's ability to compensate for the insult.  So if you have only one exposure from the list then you are totally fine.  If you have two or more then you see elevated blood pressure.  Let's make the list simple: excessive salt intake, sedentary behavior, a high-stress work environment, cigarette smoking, and obesity.  Now some of these factors may be correlated, which is its own special problem.

But imagine how hard this would be to disentangle, using either epidemiological methods or personal experimentation.  Imagine two people who work in a high-stress job, one of whom eats a lot of salt.  They both start a fitness program due to borderline hypertension.  One person sees the disease state vanish whereas the other sees little to no change.  How do you know which factor was the important one?

It's easy to look at differences in the exercise program; if you torture the data enough it will confess.  At a population level, you would expect completely different results depending on how many of these factors the underlying population had.  You'd expect, in the long run, to come to some sort of conclusion but it is unlikely that you'd ever stumble across this underlying model using associational techniques. 
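A minimal simulation of the toy model above makes the point concrete (all prevalences here are invented for illustration). The same "two or more exposures" rule produces very different apparent effects of salt depending on how common the other exposures are in the background population:

```python
import random

random.seed(1)

def simulate(n, prevalences):
    """Each person draws five independent binary exposures; hypertension
    occurs iff two or more are present (the toy threshold model above)."""
    people = []
    for _ in range(n):
        exposures = [random.random() < p for p in prevalences]
        people.append((exposures, sum(exposures) >= 2))
    return people

def risk_difference(people, k):
    """Crude risk difference for exposure k, ignoring all the others."""
    exposed = [h for e, h in people if e[k]]
    unexposed = [h for e, h in people if not e[k]]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

# Exposure 0 is salt (30% prevalence in both populations); only the OTHER
# exposures' prevalences differ between the two populations:
rare_others = simulate(100_000, [0.3, 0.05, 0.05, 0.05, 0.05])
common_others = simulate(100_000, [0.3, 0.40, 0.40, 0.40, 0.40])

print(risk_difference(rare_others, 0))    # modest apparent effect of salt
print(risk_difference(common_others, 0))  # much larger apparent effect
```

Nothing about salt changed between the two runs; only the company it keeps did, which is exactly why associational estimates alone would be unlikely to recover the underlying threshold model.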

The argument continues:
So, how good is the evidence that 90% of epidemiology results interpreted this way are false? It depends. The argument is that most hypotheses about effects are wrong, and that the standard for associations used in epidemiology is not a terribly strong filter, so that most hypotheses that survive the filter are still wrong. That’s reasonable as far as it goes. It does depend on taking studies in isolation. In this example there are both previous epidemiological studies and biochemical evidence to suggest that fat, salt, smoke, and nitrates from meat curing might all be harmful. In other papers the background evidence can vary from strongly in favor to strongly against, and this needs to be taken into account.
 
This points out (correctly) the trouble in just determining an association between A and B.  It ignores all of the worse possibilities -- like A being a marker for something else and not the cause at all.  Even a randomized trial will only tell you that A reduces B as an average causal effect in the source population under study.  It will not tell you why A reduced B.  We can make educated guesses, but we can also be quite wrong.

Finally, there is the whole question of estimation.  If we count a result as false whenever the reported size of the average causal effect of intervention A on outcome B is biased at all, then I submit that 90% is a very conservative estimate (especially if you make "truth" an interval around the point estimate at the precision of the reported estimate, given the oddly high number of decimal places people like to quote for fuzzy estimates).

But that last point kind of falls into the "true but trivial" category . . .


Tuesday, March 12, 2013

Landscapes in everything

SLIGHTLY UPDATED

One of the issues I have with economics exceptionalism is the word 'everything,' as in "markets in everything" or "the hidden side of everything." Not that there's anything wrong with applying economic concepts to a wide variety of questions (I do it myself), but at some point they become overused and start crowding out ideas that are better in a given context.

Think about all the times you've heard phrases like the 'marriage market,' often followed by the implicit or explicit suggestion that the tools of economics hold the key to understanding all sorts of human behavior, even in cases where the underlying assumptions of those tools probably don't apply. Now compare that to the number of times you've recently heard someone describe something as a fitness landscape when they weren't talking about evolution or physics (OK, that's not the term physicists generally use, but the concept is basically the same).

Landscapes are a powerful and widely applicable concept, arguably more so than markets (they are also a long-time fascination of mine). Ideas like gradient searches, perturbation, annealing and, most of all, local optimization are tremendously useful, both to explain complex problems and to suggest approaches for solving them. Once you start thinking in those terms you can see landscapes about as often as Tyler Cowen sees markets.
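As a rough sketch of why local optimization matters on these surfaces (the landscape function below is made up purely for illustration), greedy hill climbing on even a mildly rugged curve lands on whichever local peak happens to be nearest, not the best one:

```python
import math

def fitness(x):
    # A rugged one-dimensional "landscape": several local peaks plus a
    # gentle overall trend toward x = 2.
    return math.sin(3 * x) + 0.5 * math.sin(7 * x) - 0.1 * (x - 2) ** 2

def hill_climb(x, step=0.01, max_iters=10_000):
    """Greedy local search: move to a neighbor only if it is strictly better."""
    for _ in range(max_iters):
        best = max((x - step, x + step), key=fitness)
        if fitness(best) <= fitness(x):
            break  # stuck on a local peak
        x = best
    return x

# Different starting points converge to different peaks, not one global answer:
peaks = sorted({round(hill_climb(x0), 2) for x0 in (0.0, 1.0, 2.0, 3.0, 4.0)})
print(peaks)
```

Techniques like annealing and random restarts exist precisely because of this behavior: a pure gradient search tells you only about the valley you started in.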

You can even find researchers coming up with the kind of unexpected, everyday examples that you might expect in a Steven Levitt column.

My favorite recent example (at least recent to me) is T. Grandon Gill's observation that recipes in a cookbook are essentially the coordinates of local optima on a culinary fitness landscape, where the amounts of the ingredients are the dimensions and taste is the fitness function (technically we should add some dimensions for preparation and make some allowance for the subjectivity of taste, but I'm keeping things simple).

This is a great example of a rugged landscape that everyone can relate to. You can find any number of delicious recipes made with the same half dozen or so ingredients. As you start deviating from one recipe (moving away from a local optimum), the results tend to get worse initially, even if you're moving toward a better recipe.
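A toy version of the recipe example shows the dip (the "taste" function here is invented, with a single ingredient standing in for many dimensions): interpolating from a decent recipe toward a better one passes through territory worse than either.

```python
import math

# A one-ingredient toy "cookbook": two recipes that differ only in sugar.
# Each recipe is a local optimum of the (invented) taste function.
def taste(sugar):
    bread = math.exp(-((sugar - 1.0) ** 2) / 0.1)       # a lightly sweet loaf
    cake = 1.3 * math.exp(-((sugar - 4.0) ** 2) / 0.1)  # a sweeter, tastier recipe
    return bread + cake

# Move in a straight line from the worse recipe toward the better one:
path = [taste(1.0 + t * 3.0) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
print([round(v, 3) for v in path])  # taste collapses in between the two optima
```

A cook following taste alone (a pure gradient search) would never leave the first recipe, even though a better one exists a few cups of sugar away.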

Approaching something as a rugged landscape can provide powerful insights and very useful tools, which leads to another concern about economic exceptionalism -- economics as a field tends to make little use of these models and many economists routinely make modeling assumptions that simply make no sense if the surface being modeled really is rugged.

I asked Noah Smith* about this and as part of his reply he explained:
But for analyzing the equilibrium state of the economy - prices and quantities - economists tend to try as hard as they can to exclude multiple equilibria. Often this involves inventing arbitrary equilibrium criteria with zero theoretical justification. This is done routinely in micro (game theory) as well as in macro. An alternative procedure, commonly used in macro by DSGE practitioners, is to linearize all their equations, thus assuring "uniqueness". Some researchers are averse to this practice, and they go ahead and publish models that have multiple equilibria; however, there is a strong publication bias against models that have multiple equilibria, so many economists are afraid to do this. An exception is that some models with two equilibria (a "good" equilibrium and a "bad" or "trap" equilibrium) do get published and respected. Models with a bunch of equilibria, or where the economy is unstable and tends to shift between equilibria on its own at a high frequency, are pretty frowned upon.
This doesn't mean that economists can't work with these concepts, but it does mean that as economists increasingly dominate the social sciences, approaches that don't fit with the culture and preferred techniques of economics are likely to be underused.

And some of those techniques are damned useful.

* now with source.

Monday, March 11, 2013

Some epidemiology for a change

John Cook has an interesting point:
When you reject a data point as an outlier, you’re saying that the point is unlikely to occur again, despite the fact that you’ve already seen it. This puts you in the curious position of believing that some values you have not seen are more likely than one of the values you have in fact seen.
 
This is especially problematic in the case of rare but important outcomes and it can be very hard to decide what to do in these cases.  Imagine a randomized controlled trial for the effectiveness of a new medication for a rare disease (maybe something like memory improvement in older adults).  One of the treated participants experiences sudden cardiac death whereas nobody in the placebo group does.

On one hand, if the sudden cardiac death had occurred in the placebo group, we would be extremely reluctant to advance this as evidence that the medication in question prevents death.  On the other hand, rare but serious drug adverse events both exist and can do a great deal of damage.  The true but trivial answer is "get more data points".  Obviously, if this is a feasible option it should be pursued.

But these questions get really tricky when there is simply a dearth of data.  Under these circumstances, I do not think that any statistical approach (frequentist, Bayesian or other) is going to give consistently useful answers, as we don't know if the outlier is a mistake (a recording error, for example) or if it is the most important feature of the data.
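Some rough arithmetic shows just how little the data can say here. A sketch (the sample sizes are invented), using the binomial probability of seeing any events at all and the standard "rule of three" bound for zero events:

```python
def prob_at_least_one(n, rate):
    """Chance of seeing one or more events among n participants."""
    return 1 - (1 - rate) ** n

# Suppose 200 participants per arm and a true adverse-event rate of 1 in 500.
# A single death in one arm and none in the other is unremarkable either way:
print(prob_at_least_one(200, 1 / 500))  # roughly a one-in-three chance per arm

# "Rule of three": after n participants with zero events, an approximate
# 95% upper confidence bound on the event rate is 3/n.
n = 200
print(3 / n)  # zero deaths in the placebo arm still allows a rate up to 1.5%
```

Neither arm's data can distinguish "the drug kills one patient in two hundred" from "a background event landed in the treated group by chance," which is the dilemma in a nutshell.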

It's not a fun problem. 

More weekend work avoidance -- the pleasures of microbudgets

Watched the first and second arcs of a fairly obscure British science fiction show from 1979 called Sapphire and Steel. It was apparently intended as a low-budget answer to Doctor Who (which, as those familiar with it can attest, was not exactly the Avatar of the Seventies). What emerged was a sci-fi/fantasy/horror show that had to be shot on standing sets with small casts and very limited special effects.

The result is some really impressive constrained problem solving by writer P.J. Hammond (with considerable assistance from directors David Foster, Shaun O'Riordan and the show's solid leads, David McCallum and Joanna Lumley, the only expensive aspects of the production). Hammond did sometimes lapse into dramatic Calvinball, obviously making up new rules now and then to get himself out of narrative corners, but those bits are easy to overlook, particularly when watching the ways he found to work around the rules he was handed by the producers.

In lieu of optical effects and creature make-up, you get a spot of light on the floor, a shadow on the wall, an ordinary thing in a place it shouldn't be. In an ironic way, the show would almost certainly look cheaper now if they had spent the extra money on those late Seventies effects then. In a sense, they didn't have enough money to be cheesy (except perhaps in the opening title).

There's a bigger point to be made about the costly vs. the clever but the weekend is almost up and my work is going to be unavoidable in a few hours.

Sunday, March 10, 2013

Weekend gaming -- new entries at You Do the Math

I've got three big ongoing threads planned for my teacher support blog: one on the SAT, one on a special class of manipulatives, and one on teaching programming. So naturally I've been avoiding those topics and writing about games instead. If you also have an interest in games and work to avoid, you might drop by and check out:

The Exact Chaos Game -- fleshing out a suggestion by John D. Cook, this lets players bet on iterations of a surprisingly unpredictable function.

Kriegspiel and Dark Chess -- more Wikipedia than me but worth checking out if you'd like to see what chess might look like as a game of imperfect information.

Facade Chess -- along the same lines, here's an "original" imperfect-information variant where a subset of the pieces may be disguised as other pieces.
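I won't give away the rules of the chaos game here, but for a flavor of how a simple deterministic iteration can be "surprisingly unpredictable," here's the logistic map (a standard textbook example, not necessarily the function the game itself uses): two starting points that differ by one part in ten billion soon bear no resemblance to each other.

```python
# The logistic map: a one-line deterministic function whose iterates are
# effectively unpredictable in practice.
def orbit(x, steps):
    xs = [x]
    for _ in range(steps):
        x = 4 * x * (1 - x)
        xs.append(x)
    return xs

a = orbit(0.3, 60)
b = orbit(0.3 + 1e-10, 60)  # a starting point off by one part in ten billion
gap = max(abs(u - v) for u, v in zip(a, b))
print(gap)  # the two trajectories diverge to macroscopic size
```

That sensitivity to initial conditions is what makes betting on such iterations a genuine game rather than an exercise in calculation.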

Saturday, March 9, 2013

Do op-ed writers provide their own hyperlinks?

Or is some intern handed the copy and told to find some appropriate citations? I generally assume that the links are an intrinsic part of anything written specifically for online consumption but what about the online version of something primarily intended for print?

Take this op-ed by Joe Scarborough and Jeffrey D. Sachs writing for the Washington Post which starts with the following paragraph:
Dick Cheney and Paul Krugman have declared from opposite sides of the ideological divide that deficits don’t matter, but they simply have it wrong. Reasonable liberals and conservatives can disagree on what role the federal government should play yet still believe that government should resume paying its way.
As a commenter on Krugman's blog pointed out, if you click on Krugman's name in that paragraph, you'll end up at a post that starts as follows:
Right now, deficits don’t matter — a point borne out by all the evidence. But there’s a school of thought — the modern monetary theory people — who say that deficits never matter, as long as you have your own currency.

I wish I could agree with that view — and it’s not a fight I especially want, since the clear and present policy danger is from the deficit peacocks of the right. But for the record, it’s just not right.
In other words, to support the claim that Krugman said deficits don't matter, Scarborough and Sachs point to Krugman saying explicitly that people who say deficits don't matter are wrong. Krugman then spends pretty much the entire post arguing that deficits will matter a great deal once we're out of the liquidity trap. Here's the key section.
So we’re talking about a monetary base that rises 12 percent a month, or about 400 percent a year.

Does this mean 400 percent inflation? No, it means more — because people would find ways to avoid holding green pieces of paper, raising prices still further.

I could go on, but you get the point: once we’re no longer in a liquidity trap, running large deficits without access to bond markets is a recipe for very high inflation, perhaps even hyperinflation. And no amount of talk about actual financial flows, about who buys what from whom, can make that point disappear: if you’re going to finance deficits by creating monetary base, someone has to be persuaded to hold the additional base.
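For what it's worth, the compounding arithmetic in that passage checks out: 12 percent a month multiplies the monetary base by roughly a factor of four over a year.

```python
# 12 percent a month compounds to roughly a quadrupling of the monetary
# base over a year -- the source of the "about 400 percent" figure.
monthly_growth = 1.12
annual_factor = monthly_growth ** 12
print(annual_factor)
```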
This isn't to say that the post is in agreement with the op-ed; in terms of immediate action they take completely opposite positions. It would have been easy to spell out the distinction, but instead Scarborough and Sachs simply make a claim and then point us to something that directly contradicts it.

The strange thing here is that you could find any number of posts where Krugman focuses on the case for stimulus and largely or entirely ignores the dangers of deficits. Any of these would have supported Scarborough and Sachs' thesis. Instead, though, the authors pick possibly the strongest anti-deficit argument Krugman has made in the past five years.

I can understand Scarborough. He is, and I don't mean this as a pejorative, a TV personality. That's a rare and valuable talent and Scarborough is very good at it. It is not, however, a profession that depends upon reputation in the conventional sense. As long as a TV personality does nothing to betray his public persona, almost all press is good press.

For Sachs, though, reputation is extraordinarily important. This is an important and influential scholar, someone whose ideas carry great weight with policy makers. Here's a representative passage from Wikipedia:
Sachs is the Quetelet Professor of Sustainable Development at Columbia's School of International and Public Affairs and a Professor of Health Policy and Management at Columbia's School of Public Health. He is Special Adviser to United Nations Secretary-General Ban Ki-Moon on the Millennium Development Goals, having held the same position under former UN Secretary-General Kofi Annan. He is co-founder and Chief Strategist of Millennium Promise Alliance, a nonprofit organization dedicated to ending extreme poverty and hunger. From 2002 to 2006, he was Director of the United Nations Millennium Project's work on the Millennium Development Goals, eight internationally sanctioned objectives to reduce extreme poverty, hunger, and disease by the year 2015. Since 2010 he has also served as a Commissioner for the Broadband Commission for Digital Development, which leverages broadband technologies as a key enabler for social and economic development.
Silly, avoidable errors undercut Sachs' ability to continue this good work.

Which leads back to my original question. Did Jeffrey Sachs actually agree upon a link that contradicted the point he was trying to make or are links, like headlines and blurbs, often added after a piece is submitted?

Thursday, March 7, 2013

More on Marissa Mayer

I think that this is a very good point:
It also seems like a feminist mistake to expect women entrepreneurs to create little utopias instead of running extremely successful businesses. Mayer was attacked recently for her decision not to allow employees to work at home. She is a woman, this line of thinking goes, how could she think women should have to work away outside of their houses, away from their children? But why should Marissa Mayer have some special responsibility to nurture her employees with a cozy, consummately flexible work environment just because she is a woman? Isn’t her responsibility to run a company according to her individual vision? If we want powerful female entrepreneurs shouldn’t we allow them to pursue entrepreneurial power?
 
I am not actually 100% sure that the decision to end "work at home" really hurt women at Yahoo! (as a class; clearly individual workers of both genders could have had their work lives disrupted), given that men are more likely to work at home than women.  Mayer's previous company (Google) tries to limit the number of telecommuters, and it is hardly unreasonable that a new CEO would want to draw on successful business models that she has personal experience with.

Now could this policy change have been done more artfully? Sure.  But I am amazed by the duration of this discussion in the media and how much insight it is bringing into the whole work at home phenomenon. 

Admittedly, it is a competitive field

Thomas Lumley is an early contender for identifying the worst chart of 2013.  This special breed of awful is accomplished by creating a chart that actually takes more effort to process than text describing the differences would.  Since the point of charts is to convey information efficiently, there really is no good reason for this chart to exist. 

Of course, as a long-time SAS programmer I am biased against graphical displays of data in general (you would be too if you had to use gplot and gchart).  But I think that this example will be disliked by the R and Stata crowds too.

Wednesday, March 6, 2013

Forwarded almost without comment

This story from Reuters is outside of my area of expertise so I'm just going to make this blanket recommendation. This is a solid piece of reporting on the hard-to-cover fields of epidemiology, biostatistics and the economics of health care.

Special Report: Behind a cancer-treatment firm's rosy survival claims

Edit (Joseph): Andrew Gelman correctly points out that the authors are Sharon Begley and Robin Respaut.  This report is useful to me as another illustration of why randomized trials need a control arm.  It isn't enough to know the rate for conventional care and contrast a novel therapy with it.  You need to also account for the selection effects among the population receiving the novel therapy.  Randomization is a very nice way to accomplish this in a generally understood manner.
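Here's a toy simulation of the selection problem (all numbers invented): a therapy with no effect at all "beats" conventional care simply because healthier patients are the ones who seek it out, while randomization removes the artifact.

```python
import random

random.seed(7)

def survive(frailty, benefit=0.0):
    """Toy outcome model: survival chance falls as frailty rises."""
    return random.random() < max(0.0, min(1.0, 0.8 - frailty + benefit))

# The novel therapy here has NO real benefit, but suppose only healthier
# (low-frailty) patients manage to seek it out.
population = [random.uniform(0.0, 0.6) for _ in range(50_000)]
seekers = [f for f in population if f < 0.2]   # self-selected novel-therapy group
others = [f for f in population if f >= 0.2]   # everyone else, conventional care

novel_rate = sum(survive(f) for f in seekers) / len(seekers)
usual_rate = sum(survive(f) for f in others) / len(others)
print(novel_rate, usual_rate)  # the useless therapy appears to "beat" usual care

# Randomization breaks the link between frailty and treatment arm:
rate_a = sum(survive(f) for f in population[::2]) / len(population[::2])
rate_b = sum(survive(f) for f in population[1::2]) / len(population[1::2])
print(rate_a, rate_b)  # the two arms now look (correctly) almost identical
```

Comparing the self-selected group against everyone else produces a large spurious survival advantage; comparing two randomized arms of the same population does not.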

Tuesday, March 5, 2013

Educational dilemmas

This is entirely correct:
You can hold us accountable for how much our graduates learn.  You can hold us accountable for how many students graduate. You can even hold us accountable for both of those at the same time.  And, amazingly enough, you can hold us accountable for doing this while educating a broad spectrum of public high school grads. What you cannot do is hold us accountable for all of those things AND the cost/time required for them to graduate.  Getting lots of people through in a short time frame, and teaching them a lot along the way, requires a lot of attention and a lot of support (whether financial aid so they can focus on school rather than work, or tutoring and small classes and all that, or even extracurriculars to help them develop certain “soft skills”), and that costs money.  So pick any two: Quality, quantity, and cost (which is directly related to time).  If you say that students are learning less and less, believe me, you’re right.  Just don’t tell me that you want me to fix that AND graduate more students without some major changes to How Things Are Done.

I think the same principle applies to high school education.  Due to the modern phobia of taxes, people do not want to pay more for education.  Yet there is constant pressure for students to learn more and for education to be inclusive/accessible.  I am all for finding ways to be more efficient and evidence-based in educational spending.  But it doesn't help if the initial conditions are impossible to meet.

Monday, March 4, 2013

Things that tempt you to write posts you don't have time for

Ludwig von Mises:
The age in which the radical anticapitalistic movement acquired seemingly irresistible power brought about a new literary genre, the detective story. The same generation of Englishmen whose votes swept the Labour Party into office were enraptured by such authors as Edgar Wallace. One of the outstanding British socialist authors, G. D. H. Cole, is no less remarkable as an author of detective stories.

The Passing Tramp admirably handles the rebuttal, complete with a reference to that sterling Tory, Lord Wimsey.

Sunday, March 3, 2013

Another solution

Paul Krugman has a good point:
Still, isn’t it bizarre that governors who protest bitterly about the cost of Obamacare, and in general about wasting taxpayers’ money, are willing to throw away lots of money via corporate welfare? Actually, no; it’s only puzzling if you think they believe anything they say.
The context is the decision to allow Arkansas to expand the Affordable Care Act exchanges instead of Medicaid.  Aaron Carroll:
Many claimed that the ACA cost too much. They said it would raise the deficit. They opposed the expansion not only because it raised the federal price tag, but also because it was “fiscally unsustainable” for states in the long run. I took them at their word.

I’m now surprised that they prefer a solution that costs more.
Maybe what we really need to do is randomize?  After all, the secondary benefits of high health care spending are far from clear.  The United States isn't exceptional relative to Canada or France in terms of medical outcomes.  The benefits to medical innovation could be pursued via a stronger NIH with a broader mandate.  So the only real question is whether private insurance can result in innovations that reduce overall costs and/or improve outcomes.

Surely randomization of states could provide some really useful information and solve this question more directly?  After all, don't we think of it as the gold standard for causal inference?  And it would be easy to randomize states to several possible versions of the ACA (no expansion, exchanges, Medicaid expansion, public option to allow the uninsured to purchase Medicaid).

Is there a good reason not to do this?

P.S. Here is a good example of randomization giving us information in an area that is equally difficult for inference.

Saturday, March 2, 2013

Extremes in student feedback

After I'd been teaching at a school in Watts for a while, I learned that my predecessor had once been overpowered and left tied to his chair by an angry class. This struck me as notable because

A. Despite its location, this was not a rough school

B. My predecessor had been tied to a chair.

When I pressed other faculty members for more details, they explained (rather nonchalantly for my taste), "he was a really bad teacher."

Obviously, most jobs are easier and more pleasant if you're good at them, but this is particularly true in education. Teachers face constant, immediate and often intense feedback from students, something that is greatly intensified when you go to disadvantaged schools in the inner-city or poor rural areas like the Mississippi Delta (where I also taught).

Students get angry at bad instruction and they take advantage of bad classroom management. When you add the amplification that comes with the complex social dynamics of kids and adolescents, teaching can be a truly miserable job if you can't get the hang of doing it right.

This is a large part of the reason why so many new teachers leave the profession. Even after having invested years of study and tens of thousands of dollars, they walk away with a degree that's good for little else because, for them, the job actually is that terrible. By contrast, for those who are good at it, who can explain ideas clearly and establish a rapport with kids and keep a class focused and on task, teaching can be a most enjoyable and satisfying job.

You don't have to be a statistician to see the potential selection effect here. It should certainly be addressed when discussing the impact of bad teachers or proposing incentive pay/dismissal plans for improving education.

It should be addressed but it usually isn't.

Friday, March 1, 2013

Like judging an archery contest after they took down the target...

This is a really small part of a bigger story (and probably the subject of more posts), but having been on both sides of video evaluations I had a personal problem with this statement from Thomas Kane (which comes to us via Andrew Gelman):
While the mean score was higher on the days that the teachers chose to submit, once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1.
Just to be clear, I don't have any problem with this kind of evaluation and I really like Kane's point about using 360s for teachers, but the claim of perfect correlation has raised a red flag for almost every statistically literate person who saw it. You can see an excellent discussion of this at Gelman's site, both in the original post and in the comments. All the points made there are valid but based on my experience I have one more stick for the fire.

For the sake of argument, let's assume that the extraordinary idea that rank is preserved, that the nth teacher on his or her best day is still worse than the (n+1)th teacher on his or her worst day, is true. For anything more than a trivially small n that would suggest an amazing lack of variability in the quality of lessons from teachers across the spectrum (particularly strange since we would expect weaker and less experienced teachers to be more variable).
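To put a number on the skepticism: "correcting for measurement error" typically means disattenuating the observed correlation by the estimated reliabilities, and that procedure can hit or exceed 1 through noise alone. Here's a sketch (the setup is invented; Kane's actual procedure isn't described in the interview) in which the true correlation between chosen and unchosen days is 0.8, not 1:

```python
import math
import random

random.seed(42)

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def one_study(n_teachers=25):
    x1, x2, y1, y2 = [], [], [], []
    for _ in range(n_teachers):
        quality = random.gauss(0, 1)
        chosen = quality + random.gauss(0, 0.5)    # lesson-to-lesson variation,
        unchosen = quality + random.gauss(0, 0.5)  # so the true corr is 0.8
        x1.append(chosen + random.gauss(0, 0.6))   # two raters per video
        x2.append(chosen + random.gauss(0, 0.6))
        y1.append(unchosen + random.gauss(0, 0.6))
        y2.append(unchosen + random.gauss(0, 0.6))
    x = [(a + b) / 2 for a, b in zip(x1, x2)]
    y = [(a + b) / 2 for a, b in zip(y1, y2)]
    # Disattenuation: divide by the square root of the estimated reliabilities.
    return corr(x, y) / math.sqrt(corr(x1, x2) * corr(y1, y2))

corrected = [one_study() for _ in range(500)]
at_or_above_one = sum(r >= 1 for r in corrected) / len(corrected)
print(at_or_above_one)  # a nontrivial share of "corrected" correlations hit 1+
```

With only a couple dozen teachers, sampling noise in both the observed correlation and the reliability estimates routinely pushes the "corrected" value to 1 or beyond, which is why a reported correlation of exactly 1 should invite questions rather than confidence.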

But there's a source of noise no one's mentioned and in this case it's actually a good thing.

Except for special cases, teachers walk through the door with a great deal of information about their classes; they've graded tests and homework papers; they've seen the reaction to previous lessons, they've talked with students one-on-one. You would expect (and hope) that these teachers would use that information to adjust their approach on a day to day basis.

The trouble is that if you're evaluating teachers based on an observation (particularly a video observation), you don't have any of that information. You can't say how appropriate a given pace or level of explanation is for that class that day. You can only rely on general guidelines.

Which is not to say that good evaluators can't form a valuable assessment based on a video of a lesson. I'm a big believer in these tools both for staff development and (within reason) evaluation, but it's an inexact and often subjective process. You can get a good picture and diagnose big problems but you will never get the resolution that Kane claimed.

There are other problems with this interview, but the correlation of one should have been an easy catch for the reporter. You should never let an interview subject go unchallenged when claiming perfect results.