West Coast Stat Views (on Observational Epidemiology and more)

Wednesday, June 9, 2010

Multiple Languages

Andrew Gelman has a nice post on the advantages of knowing more than one statistical programming language.

It's a good point and, at the risk of beating a dead horse, one that I have increasingly taken to heart. I am actually thinking about exposing my students to R this fall. It's not an ideal choice because I am a mediocre programmer (at best) and I know SAS way better than R. But there is a real push to have our students at least understand Bayesian statistics and I am simply not a fan of the Bayesian approaches in SAS (at least the last time that I looked).

The other reason for teaching R is that it is open source. A corporate license for SAS appears to cost $7000/year. While that is cheap compared to the cost of the analyst, students can end up in environments where access to SAS isn't easy to obtain, and it is nice to have a back-up option.

On the other hand, we often forget the very nice log files and complete outputs that SAS produces. There are environments where a paper trail is essential and SAS is an ideal tool for those cases.

So we'll see how I think about it after this fall but wish me luck!

Tuesday, June 8, 2010

Papers and Industry

In an interesting post, Exponential Book discusses the potential impact of papers on jobs. Now (s)he appears to be in physics, and things vary by field. But I do know that when I interviewed (and worked) as a statistician in financial services we did not consider papers as part of the hiring package. We were much more focused on job skills such as facility with key software packages (at that time it was SAS -- explaining my over-use of the software a decade later) and communication skills.

Now sure, a paper could be used to show that one was a good scientific writer. But it was a very minor consideration (at best).

Far better to show that you liked to assemble data sets. Being good at pulling your own data and developing data sets seemed to be one of the strongest predictors of success at this particular company (not least because you did not have to compete for programmer resources with other parts of the company).

Anti-stimuli

From Off the Charts:

States and localities cut 22,000 jobs in the past month, wiping out half the month’s gain in private-sector jobs (Matthew Yglesias highlights this issue as well). In total, state and local governments have cut 231,000 jobs, including 100,000 local education jobs, since the summer of 2008.

Years ago, I recall hearing an economist say that one of the reasons that we couldn't have another Great Depression was because a higher percentage of workers were in the public sector and, of course, those jobs wouldn't go away in an economic downturn.

I really wish I could remember that economist's name.

Monday, June 7, 2010

Seyward Darby does not understand economics

There are so many things wrong with Seyward Darby's New Republic piece and I have so much to do this week (finishing off big posts, lining up some more work, getting the bugs out of a text-mining tool), that I'm going to limit myself to just one for now.

The story here is not which teachers get laid off; the story is the utter insanity of mass firings by the government during the worst economic meltdown of the Postwar Era. As previously shown, this negates all of our stimulus efforts and comes disturbingly close to replicating Herbert Hoover's response to the Great Depression.

Even if Darby's arguments were sound (and they're not), the article would still be little more than a distraction. We find ourselves in a burning building; Darby wants to stop and talk about radon levels.

Saturday, June 5, 2010

Innovationary spiral

(I suspect economists have a nice, concise term for all of this. Perhaps one of our better informed readers could supply it)

The scary thing about deflation is the way it takes stalled economies and slows them down even further by encouraging people to wait to make purchases under the assumption that prices will go down further.

Though we don't often discuss it in these terms, technological innovation can create something like a deflationary spiral. Technological advances tend to make things cheaper and people often put off purchases assuming that prices will go lower, particularly with products like personal electronics.

This deflationary effect can cause serious problems. Most technological advances come burdened with steep development costs and depend on economies of scale to manage a competitive price. The situation is even worse for technologies that are dependent on a large network of other users (e.g. telephones) or a large number of outside vendors (e.g. DVD players).

The saviour of many new technologies is that most benevolent of creatures, the early adopter. (Q: Who buys the first telephone?; A: Someone who wants to say he had the first telephone.) Early adopters buy the new technologies while they are still overpriced and often useless. The rest of us reap the benefits.

There is a more dangerous form of the innovationary spiral that shows up when the technology has an unpleasant cost or consequence. Under these circumstances, the spiral can be the mother of all excuses for procrastination. Serious problems can be allowed to fester for years even though practical solutions are available because a cheaper, less painful solution may emerge in the future.

Consider obesity. We have seen and continue to see significant advances in the field -- it's fair to say that for the vast majority of people this is a treatable condition -- but all of the treatments (including bariatric surgery) use some combination of exercise and portion control. Hundreds of thousands if not millions of people put off doing anything about a life-threatening condition in part out of the belief that something around the corner will allow them to lose weight without limiting their consumption or increasing their activity level.

Possibly worse yet is the way the promise of new technology is often held up as a reason not to take action on climate change despite the fact that:

1. We already have more than enough mature, cost-effective technology to cut carbon emissions and their effects beneath any of the proposed goals. Dozens of solutions are available, ranging from plug-in hybrids, ground-source heating and painting black roofs white to building nuclear plants. Even if you take a handful of the most controversial items off of the table, we still have more than enough left to solve the problem.

2. (and here's the real kicker, folks) The implementation costs/consequences of many of the just-around-the-corner technologies are actually greater than those that come with what we already have sitting on the shelf. Consider hydrogen fuel cells. Contrary to popular opinion, hydrogen is not particularly dangerous to work with. It is, however, a bitch to handle. Forget turning over the fleet. The time and expense required to set up just the infrastructure to produce, transport, store and transfer the hydrogen would be enough to make us energy independent using nothing but the technology on hand.

Don't get me wrong. I'm a huge fan of research, but when it's being used as an excuse not to take action, it's not such a good investment.

Pick a number, any number -- stimulus edition

It may be the most fundamental question in statistics: what number (or set of numbers) do we use to measure some property? It is usually the first thing we have to ask ourselves and we often struggle with the question.

One of the simplest examples of this is "do we use net or gross?" It's hard to imagine a more obvious question and yet I have seen cases in the business world that used gross when net was called for and the results were disastrous.

Today's related case comes from this worthwhile post by Stephen Gordon who argues that the number we generally use to discuss stimulus isn't just wrong; it doesn't even get the sign right.

Stimulus? What stimulus?
Robert Reich on the risks of a 'double-dip' recession in the US:
The only reason the economy isn’t in a double-dip recession already is because of three temporary boosts: the federal stimulus (of which 75 percent has been spent), near-zero interest rates (which can’t continue much longer without igniting speculative bubbles), and replacements (consumers have had to replace worn-out cars and appliances, and businesses had to replace worn-down inventories).
Emphasis added.
There has been much talk of the size of the US federal stimulus, and much debate about whether or not it has been an effective counter-cyclical policy instrument.
But it's important to remember that the proper measure for fiscal stimulus is not spending by the federal government; it is spending by all levels of government. And when you look at the contributions to US GDP growth (Table 1.1.2 at the BEA site), total government spending has been a drag on growth over the past two quarters. The increases at the federal level have not been enough to compensate for the spending cuts at the local and state levels.

Friday, June 4, 2010

How is war like comedy?

In both, if you screw up you generally find out really quickly. Education, on the other hand, is one of those fields where you can make a string of terrible decisions that can go on for years, even decades, without anyone noticing (and longer still before people care). For that reason alone, we can probably dismiss David Warsh's suggestion that education might become Obama's Vietnam.

That's the only thing in the post you can dismiss. The rest of it is sharp, on-target and pretty much essential reading if you're following the education debate.

Thursday, June 3, 2010

"It's all about re-branding in this economy"

Really not that damned funny

Whenever earmark season comes along, opportunistic politicians and their hand-puppet journalists have a grand old time making jokes about the silly things these trivial amounts of money go to. The ones that get the biggest laughs are agricultural earmarks. Here are some comedy stylings of McCain and Dowd. Over half of the earmarks they have fun with involve agriculture and land management.

Today's New York Times has a reminder of what the cost of blights and pests can be:

Lynet Nalugo dug a cassava tuber out of her field and sliced it open.

Inside its tan skin, the white flesh was riddled with necrotic brown lumps, as obviously diseased as any tuberculosis lung or cancerous breast.

“Even the pigs refuse this,” she said.

The plant was what she called a “2961,” meaning it was Variant No. 2961, the only local strain bred to resist cassava mosaic virus, a disease that caused a major African famine in the 1920s.

But this was not mosaic disease, which only stunts the plants. Her field had been attacked by a new and more damaging virus named brown streak, for the marks it leaves on stems.

That newcomer, brown streak, is now ravaging cassava crops in a great swath around Lake Victoria, threatening millions of East Africans who grow the tuber as their staple food.

Although it has been seen on coastal farms for 70 years, a mutant version emerged in Africa’s interior in 2004, “and there has been explosive, pandemic-style spread since then,” said Claude M. Fauquet, director of cassava research at the Donald Danforth Plant Science Center in St. Louis. “The speed is just unprecedented, and the farmers are really desperate.”

Two years ago, the Bill and Melinda Gates Foundation convened cassava experts and realized that brown streak “was alarming quite a few people,” said Lawrence Kent, an agriculture program officer at the foundation. It has given $27 million in grants to aid agencies and plant scientists fighting the disease.

The threat could become global. After rice and wheat, cassava is the world’s third-largest source of calories. Under many names, including manioc, tapioca and yuca, it is eaten by 800 million people in Africa, South America and Asia.

Maybe it's just me but I really don't see what's so funny about agricultural research. Perhaps Maureen can explain it to me.

Wednesday, June 2, 2010

"The upside of mortgage default"

I normally distrust this kind of wildly counter-intuitive statement, but then again it normally isn't presented this convincingly and it normally doesn't come from someone as trustworthy as Felix Salmon.

"Poll: Only Campbell Can Beat Boxer"

From David Frum's site:

The poll by the Los Angeles Times and University of Southern California clearly shows that U.S. Sen. Barbara Boxer is unpopular with voters, but in a theoretical matchup, only one of the three GOP primary candidates can beat her: former Silicon Valley congressman Tom Campbell.
But the poll also indicates that Campbell might not have the chance to face Boxer: Former Hewlett-Packard CEO Carly Fiorina has a huge lead in the three-way primary contest to decide which Republican will run against Boxer in the fall.

As mentioned before, the GOP primary process is broken.

Resisting the urge to make a bad pun here

Olivia Judson discusses cuckoos and other brood parasites in today's thought-provoking column.

Perils of Cross-sectional Studies

I am reading Jean Chatzky's book "The Difference"; in it she attempts to look at what traits are associated with good economic outcomes. Some of her examples are very good and a lot of it is thought-provoking. But there were a few cases where she seems to run into trouble.

For example, risk taking behavior is u-shaped (highest for the wealthy and the permanently indebted). I can't tell if this relationship in her data is statistically significant (as the variance is not presented) but that doesn't get at the main point, anyway.

The main point is that you would expect people who take risks to break into the highly successful and the impoverished. Looking at the final outcome and saying "what is the expected value of taking risks" is more informative than noting risk takers have more money. If somebody offered to double your wealth if you could roll a 1 or a 2 on a 6 sided die, would this be a good idea? Yet, if we offered this choice to a room of people and then ranked them by net worth, it's likely that the wealthiest people would have rolled the die. So would the least wealthy, as well.

What if you doubled your wealth on a 1 to 4? The expected value of the roll is positive, but losing everything you own might be worse than doubling what you currently have. It's a very complicated inference and it probably requires a posterior distribution to properly express what the choices look like.
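To make the die example concrete, here is a rough simulation sketch. It assumes things the example above doesn't spell out: that a losing roll wipes out your wealth entirely, that half the room takes the bet, and that starting wealth is spread evenly between 50,000 and 150,000. All of those numbers (and the function names) are made up for illustration.

```python
import random
from fractions import Fraction
from statistics import mean


def expected_multiplier(winning_faces, sides=6):
    """Expected wealth multiplier, assuming a win doubles wealth and a loss leaves zero."""
    p_win = Fraction(winning_faces, sides)
    return 2 * p_win  # the losing branch contributes 0


def simulate_room(n_people=1000, winning_faces=2, sides=6, seed=0):
    """Return (wealth, took_bet) pairs sorted from poorest to richest."""
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        wealth = rng.uniform(50_000, 150_000)  # hypothetical starting wealth
        took_bet = rng.random() < 0.5          # assumed: half the room gambles
        if took_bet:
            won = rng.randint(1, sides) <= winning_faces
            wealth = 2 * wealth if won else 0.0
        people.append((wealth, took_bet))
    people.sort(key=lambda person: person[0])
    return people


room = simulate_room()
bottom, top = room[:50], room[-50:]
share_bottom = mean(took for _, took in bottom)  # share of gamblers among the poorest 50
share_top = mean(took for _, took in top)        # share of gamblers among the richest 50
avg_gamblers = mean(w for w, took in room if took)
avg_others = mean(w for w, took in room if not took)

print("expected multiplier, win on 1-2:", expected_multiplier(2))
print("expected multiplier, win on 1-4:", expected_multiplier(4))
print("gamblers among richest 50:", share_top)
print("gamblers among poorest 50:", share_bottom)
print("average wealth, gamblers vs. others:", round(avg_gamblers), round(avg_others))
```

Under these assumptions the bet on a 1 or 2 has an expected multiplier below one, yet a cross-sectional ranking by net worth still puts gamblers at both the top and the bottom of the room.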

Now, I suspect that prospective data would support intelligent risk taking and the book has a lot of good data in it (so don't take this as a slam of the book as a whole; actually gathering and interpreting data adds a lot to the conversation even if the interpretation isn't always trivial). But it does highlight the complexity of drawing any inferences from cross-sectional retrospective studies. It's not just an issue with epidemiology data; it can occur anywhere else.

[note: some typos were corrected after the initial posting]

Tuesday, June 1, 2010

Journal Selection Strategy

FemaleScienceProfessor, who always blogs about cool things, has a question about journal choice. Namely, what do you do with an article that could be published in a major journal but might not be?

Is this worse if you are an early career scientist who needs to get their work out there to establish productivity?

I have actually had this happen where a paper got mostly positive reviews, a major revision and then an ultimate rejection. The process took a very long time and the final dismissal was a single sentence. It was a nasty enough experience that it actually makes me reluctant to return to that journal again.

Would I do it again? Maybe . . . After that, I dramatically undershot the next choice of journal for a potentially controversial paper. This was also a major mistake. The hardest cases are always going to be the borderline ones. Long review times and ambiguous options to resubmit are always a bad outcome, no matter how I look at it.

But I wish I had a better feel for what the risk/benefit trade-off really was . . .

Monday, May 31, 2010

Robert Samuelson would not make a good statistician

Robert Samuelson is taking considerable heat for this column in the Washington Post complaining about the way we measure poverty. Dean Baker and Mark Thoma posted detailed and highly critical responses that listed several problems with Samuelson's argument. Both of them, however, skipped over at least one serious statistical flaw in the column.

Here's the quote from Samuelson:

Second, the poor's material well-being has improved. The official poverty measure obscures this by counting only pre-tax cash income and ignoring other sources of support. These include the earned-income tax credit (a rebate to low-income workers), food stamps, health insurance (Medicaid), and housing and energy subsidies. Spending by poor households from all sources may be double their reported income, reports a study by Nicholas Eberstadt of the American Enterprise Institute. Although many poor live hand-to-mouth, they've participated in rising living standards. In 2005, 91 percent had microwaves, 79 percent air conditioning and 48 percent cellphones.

The fallacy here is closely related to the phenomenon of the wrong-way coefficient. You fit a model and you see a statistically significant variable with the wrong sign. For a fairly silly example, you build a model predicting how long it takes travellers to get from New York City to DC and you find that the indicator for being searched by a uniformed officer has a negative coefficient, which would suggest that being searched somehow shortens your travel time. The explanation for this counterintuitive result is that there's a relationship between this variable and one or more of the other variables in your model. In this case there's a strong correlation between being searched and flying vs. driving.
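The travel-time example can be sketched with simulated data. Everything numeric here is invented for illustration: the 3-hour flight and 4.5-hour drive, the 15-minute delay from a search, and the search rates for fliers and drivers. The sketch also leans on one shortcut: in a least-squares fit with a single 0/1 indicator, the coefficient equals the difference in group means, so it compares means rather than fitting a full regression.

```python
import random
from statistics import mean


def simulate_trips(n=20_000, seed=1):
    """Return (flying, searched, hours) tuples for hypothetical NYC-to-DC trips."""
    rng = random.Random(seed)
    trips = []
    for _ in range(n):
        flying = rng.random() < 0.5
        # Assumed: fliers are far more likely to be searched than drivers.
        searched = rng.random() < (0.30 if flying else 0.01)
        base = 3.0 if flying else 4.5        # assumed door-to-door hours
        delay = 0.25 if searched else 0.0    # a search truly ADDS time
        trips.append((flying, searched, base + delay + rng.gauss(0, 0.3)))
    return trips


def search_effect(trips):
    """Mean hours for searched trips minus mean hours for unsearched trips."""
    searched = [h for _, s, h in trips if s]
    unsearched = [h for _, s, h in trips if not s]
    return mean(searched) - mean(unsearched)


trips = simulate_trips()
naive = search_effect(trips)                                   # ignores travel mode
within_fly = search_effect([t for t in trips if t[0]])         # fliers only
within_drive = search_effect([t for t in trips if not t[0]])   # drivers only

print("searched vs. not, everyone:", round(naive, 2))
print("searched vs. not, fliers:  ", round(within_fly, 2))
print("searched vs. not, drivers: ", round(within_drive, 2))
```

Pooled together, searched travellers look faster because they are mostly fliers; within each mode of travel the search shows up as the delay it was built to be.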

For people living in residences with functioning kitchens, good ventilation and a land line, getting a microwave, an air conditioner and a prepaid cellphone clearly represents an increase in well being. If, however, there is an inverse relationship among the poor between having a stove/having a microwave, or ventilation/AC or land line/cell, then the high incidence rates could easily indicate a lower standard of living.

For an example of how not having a stove could make having a microwave more likely, check out this story from NPR:

So many immigrants, homeless people and others of limited means living in single-room occupancies (SROs) have no kitchens, no legal or official place to cook. To get a hot meal, or eat traditional foods from the countries they've left behind, they have to sneak a kind of kitchen into their places. Crock pots, hot plates, microwaves and toaster ovens hidden under the bed. And now, the latest and safest appliance, the appliance that comes in so many colors it looks like a modern piece of furniture: the George Foreman Grill. It is, quite literally, a hidden kitchen.

For me, a George Foreman grill would be a luxury purchase, but not having one doesn't mean I'm worse off than the next guy I see pushing a shopping cart with all of his belongings down the street.