Friday, October 30, 2020

Are Nate and Elliot and Andrew klever enough to kope with a K?

Many years ago in Paris, Arkansas, there was a small shop owned by a local family named Kafka. I never met them and have no idea what relation they had to the writer.

The shop was of a very common type in the Ozarks: folksy, more often than not with made-in-Taiwan hillbilly décor. I don’t recall ever going inside, but I do remember the sign, which read something like this:

Krafts
Antiques
Fun Stuff
Kwilts
Art Supplies

This memory is not all that relevant, but then neither is this post. It would have been more relevant had I written it when I first intended to, a month or two ago. Back then, economic indicators and forecasts probably played a bigger role in the models of 538 and the Economist. I’d imagine that now it’s all about weighing polls and estimating turnout. But even if I missed the timeliness window for this post, there are some less ephemeral issues I still want to hit.

On some level, all predictive modeling relies on the assumption that the important relationships and trends we’ve observed in past data will continue to hold in the future. We don’t talk about it all that much, but this is one of those things that makes every competent statistician at least a little worried. This is especially true when we go outside the range of our data, when the variables we put into our model start taking values we’ve never seen before.
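To make the out-of-range worry concrete, here is a toy sketch with made-up numbers (nothing to do with the actual 538 or Economist models). It assumes the true relationship is curved (y = x squared) but that we have only ever observed x between 0 and 3. A straight line fit to those observations does a passable job inside that range and falls apart far outside it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def true_relationship(x):
    # Hypothetical "real world" relationship: curved, not linear.
    return x ** 2


# We only ever get to observe x between 0 and 3.
observed_x = [0, 1, 2, 3]
observed_y = [true_relationship(x) for x in observed_x]
a, b = fit_line(observed_x, observed_y)

# Compare the fitted line with reality inside and far outside the observed range.
for x in [1.5, 10]:
    print(f"x={x}: model says {a + b * x:.2f}, reality is {true_relationship(x):.2f}")
```

The fitted line is y = 3x - 1. At x = 1.5 it is off by 1.25; at x = 10 it predicts 29 when the true value is 100. Nothing in the data we actually had would have warned us.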

Pretty much every serious election model factors in the economy and where it’s going. The actual relationship may be complicated but, at the risk of oversimplifying, an economy that’s good or trending up favors the party in power, and vice versa.

But what happens when the economy is good for half the people and terrible for the rest? Many economists have described our current situation as a K-shaped recovery, with white-collar knowledge workers doing fairly well while those in other sectors, such as the service industry, suffer horribly.
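A toy illustration of why a K shape is awkward for a model, again with invented numbers rather than real economic data: if the model’s economic input is a single aggregate figure, two groups moving sharply in opposite directions can average out to look exactly like an economy where nothing happened.

```python
from statistics import mean

# Hypothetical year-over-year income changes (percent) for two equal-sized groups.
white_collar_change = [8.0, 9.0, 10.0, 11.0, 12.0]
service_sector_change = [-12.0, -11.0, -10.0, -9.0, -8.0]
everyone = white_collar_change + service_sector_change

# A genuinely flat economy of the same size, for comparison.
flat_economy = [0.0] * len(everyone)

print("K-shaped aggregate:", mean(everyone))
print("flat aggregate:    ", mean(flat_economy))
print("white-collar:      ", mean(white_collar_change))
print("service sector:    ", mean(service_sector_change))
```

Both aggregates come out to zero, so a model that only sees the aggregate cannot tell these two very different electorates apart.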

As far as I know, we haven’t had a presidential election during a K-shaped recovery, at least not since we started scientific polling. This is outside the range of our data (as is the pandemic, as is having a president openly undermining the election, as is…). This is where the art of modeling kicks in. The statisticians at 538 are smart and experienced, and I have faith in their judgement.

But when you read a credulous story about a model confidently predicting some wildly counter-intuitive development, it is also good to remember that modeling is a mixture of science and art, and some people aren’t very good at the latter.

2 comments:

  1. Mark:

    I think that in your post you are overrating the sophistication of both the Economist's and Fivethirtyeight's forecasts. Yes, the forecasts use economic predictors, but only in a crude way. Also, the forecasts only work with aggregate vote by state, nothing on individual voters' characteristics. The models do have state-level correlated error terms whose correlations are governed in part by demographics, but that's nothing like a demographic model of voting.

    Also, our models of turnout are crude: really no modeling to speak of at all, pretty much just relying on polls to get this right on average (with uncertainty captured in that extra error term for nonsampling error).

    And of course we have no model for vote suppression or votes not being counted, which I guess is another form of vote suppression.

    1. Andrew,

      First off, I think you may have underestimated just how much this post was an excuse to finally work the Kafka anecdote into a post, but on the larger question...

      I always assumed that your main focus in this project was on the problem of poll aggregation in particular and on the larger question of bringing together disparate sources of information in general. (A tremendously important topic in the age of big data.) The role of economic indicators is a small side issue in that context, but it is still a question of interest. This is not the only case where we have to deal with a once-simple relationship becoming complex.

      It makes sense that early in an election cycle, the expected state of the economy will provide useful information as to how the polls are likely to behave in the upcoming months. Once again, I am assuming here, but historically it would seem that one or two numbers told you all you needed to know for this problem. There was no reason to go down an economic or demographic rabbit hole.

      In 2020, however, we are far out of the range of historical data in so many ways. This probably doesn’t have a great deal of impact on the analytic methods applied to the polls, but it certainly could play hell with traditional informative priors.

      As for the relationship between likely voter models and voter suppression, that’s way too big a topic to open at the bottom of a comment thread.
