
Monday, May 31, 2010

Robert Samuelson would not make a good statistician

Robert Samuelson is taking considerable heat for this column in the Washington Post, in which he complains about the way we measure poverty. Dean Baker and Mark Thoma posted detailed and highly critical responses that listed several problems with Samuelson's argument. Both of them, however, skipped over at least one serious statistical flaw in the column.

Here's the quote from Samuelson:
Second, the poor's material well-being has improved. The official poverty measure obscures this by counting only pre-tax cash income and ignoring other sources of support. These include the earned-income tax credit (a rebate to low-income workers), food stamps, health insurance (Medicaid), and housing and energy subsidies. Spending by poor households from all sources may be double their reported income, reports a study by Nicholas Eberstadt of the American Enterprise Institute. Although many poor live hand-to-mouth, they've participated in rising living standards. In 2005, 91 percent had microwaves, 79 percent air conditioning and 48 percent cellphones.
The fallacy here is closely related to the phenomenon of the wrong-way coefficient. You fit a model and see a statistically significant variable with the wrong sign. For a fairly silly example, suppose you build a model predicting how long it takes travellers to get from New York City to DC, and you find that the indicator for being searched by a uniformed officer has a negative coefficient, which would suggest that being searched somehow shortens your travel time. The explanation for this counterintuitive result is that there's a relationship between this variable and one or more of the other variables in your model. In this case there's a strong correlation between being searched and flying rather than driving: flyers go through airport security, and flying is generally the quicker trip.
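A small simulation can make the travel-time example concrete. Every number in it is invented for illustration: flyers are searched 90% of the time and drivers 5%, flying takes about 3 hours door to door and driving about 4.5, and a search adds half an hour to either trip. Regressing travel time on the search indicator alone gives a negative slope, because the indicator is mostly standing in for flying; once the mode of travel is also in the model, the search coefficient comes back positive. The helper names (`simulate_trips`, `ols_one`, `ols_two`) are hypothetical, and the least-squares fits are written out by hand to avoid dependencies.

```python
import random
from statistics import fmean


def simulate_trips(n, seed=0):
    """Simulate hypothetical NYC-to-DC trips as (flying, searched, hours) tuples."""
    rng = random.Random(seed)
    trips = []
    for _ in range(n):
        flying = rng.random() < 0.5
        # Assumed: flyers are almost always searched, drivers rarely are.
        searched = rng.random() < (0.9 if flying else 0.05)
        # Assumed door-to-door times: about 3 h flying, 4.5 h driving,
        # and a search adds half an hour regardless of mode.
        hours = (3.0 if flying else 4.5) + (0.5 if searched else 0.0)
        hours += rng.gauss(0, 0.3)
        trips.append((int(flying), int(searched), hours))
    return trips


def ols_one(x, y):
    """Slope from regressing y on a single predictor x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 (with intercept)."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


trips = simulate_trips(20_000)
flying = [t[0] for t in trips]
searched = [t[1] for t in trips]
hours = [t[2] for t in trips]

print("searched only:", ols_one(searched, hours))
print("flying + searched:", ols_two(flying, searched, hours))
```

The point is not the particular numbers but the mechanism: the search indicator's sign depends on whether the variable it is correlated with is accounted for.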

For people living in residences with functioning kitchens, good ventilation and a land line, getting a microwave, an air conditioner and a prepaid cellphone clearly represents an increase in well-being. If, however, there is an inverse relationship among the poor between having a stove and having a microwave, between good ventilation and air conditioning, or between a land line and a cellphone, then the high incidence rates could just as easily indicate a lower standard of living.
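The arithmetic behind that possibility is a simple mixture. Suppose, purely hypothetically, that households with a working stove own a microwave 70% of the time, while households without one own a microwave 98% of the time because it is their only way to cook. Then a population where fewer people have stoves reports a higher overall microwave rate. The function name and all rates are invented for illustration.

```python
def microwave_rate(share_with_stove, p_given_stove, p_given_no_stove):
    """Overall microwave ownership as a mixture of households with and without stoves."""
    return (share_with_stove * p_given_stove
            + (1 - share_with_stove) * p_given_no_stove)


# Hypothetical rates: households without a stove rely on a microwave,
# so they own one more often than households with a working kitchen.
P_GIVEN_STOVE = 0.70
P_GIVEN_NO_STOVE = 0.98

better_housed = microwave_rate(0.90, P_GIVEN_STOVE, P_GIVEN_NO_STOVE)
worse_housed = microwave_rate(0.50, P_GIVEN_STOVE, P_GIVEN_NO_STOVE)
print(f"90% with stoves: {better_housed:.1%} own microwaves")
print(f"50% with stoves: {worse_housed:.1%} own microwaves")
```

Under these made-up rates the group with far fewer stoves reports the higher microwave rate (84% vs. about 73%), so the incidence figure alone says nothing about which group is better off.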

For an example of how not having a stove could make having a microwave more likely, check out this story from NPR:
So many immigrants, homeless people and others of limited means living in single-room occupancies (SROs) have no kitchens, no legal or official place to cook. To get a hot meal, or eat traditional foods from the countries they've left behind, they have to sneak a kind of kitchen into their places. Crock pots, hot plates, microwaves and toaster ovens hidden under the bed. And now, the latest and safest appliance, the appliance that comes in so many colors it looks like a modern piece of furniture: the George Foreman Grill. It is, quite literally, a hidden kitchen.
For me, a George Foreman grill would be a luxury purchase, but not having one doesn't mean I'm worse off than the next guy I see pushing a shopping cart with all of his belongings down the street.

Tuesday, April 27, 2010

Predicting the spread

Have you ever been working on a problem and had that nagging feeling that you're missing an obvious solution? Well, I'm having one of those moments now. I'm working on a project that, though it has nothing to do with sports or betting, is analogous to the following:

You want to build a model predicting the spread for games in a new football league. Because the line-up of teams is still in flux, you decide to use only stats from individual teams as inputs (for example, an indicator variable for when the Ambushers play the Ravagers would not be allowed). In other words, you're using data from individuals to predict a metric that is only defined for pairs.

Assume there are around fifty teams and each team has played each of the others exactly once.

This feels like Stat 101, but I can't recall seeing another problem like it. Anyone out there have any suggestions?
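One framing that might help, offered as an assumption rather than anything the post settles on: a spread is antisymmetric (swap the two teams and it changes sign), so a natural first model scores each team from its own stats and predicts the spread as the difference of the two scores. With a linear score, that is ordinary least squares on the difference of the two teams' stat vectors, with no intercept. The sketch below uses invented team stats and a made-up true coefficient vector just to show the mechanics; `make_league`, `simulate_spreads`, `fit_difference_model` and the other names are hypothetical.

```python
import itertools
import random


def make_league(n_teams=50, n_stats=2, seed=1):
    """Hypothetical standardized team-level stats, one list per team."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(n_stats)] for _ in range(n_teams)]


def simulate_spreads(stats, beta, noise_sd=3.0, seed=2):
    """Every pair plays once; spread = beta . (stats_i - stats_j) + noise (assumed)."""
    rng = random.Random(seed)
    games = []
    for i, j in itertools.combinations(range(len(stats)), 2):
        diff = [a - b for a, b in zip(stats[i], stats[j])]
        spread = sum(b * d for b, d in zip(beta, diff)) + rng.gauss(0, noise_sd)
        games.append((i, j, spread))
    return games


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_difference_model(stats, games):
    """Least squares for spread ~ beta . (stats[i] - stats[j]) with no intercept."""
    k = len(stats[0])
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for i, j, spread in games:
        d = [a - b for a, b in zip(stats[i], stats[j])]
        for p in range(k):
            xty[p] += d[p] * spread
            for q in range(k):
                xtx[p][q] += d[p] * d[q]
    return solve(xtx, xty)


def predict_spread(beta, stats_i, stats_j):
    """Predicted spread for team i vs. team j; swapping the teams flips the sign."""
    return sum(b * (a - c) for b, a, c in zip(beta, stats_i, stats_j))


stats = make_league()
games = simulate_spreads(stats, beta=[3.0, -2.0])
beta_hat = fit_difference_model(stats, games)
print(len(games), "games; estimated beta:", beta_hat)
print("team 0 vs team 1:", predict_spread(beta_hat, stats[0], stats[1]))
```

Leaving out the intercept is what builds in the antisymmetry. If one team in each pairing were the home team, an intercept could absorb home-field advantage, but the post doesn't say whether that applies.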