Thursday, October 29, 2020

Rethinking modeling assumptions in 2020

 

 

The idea that you should incorporate a correlation matrix into an electoral model is one of those things that just makes sense across the board. It is supported by the data, it is intuitively obvious, and it is easy to justify from first principles.
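
To make the first-principles argument concrete, here is a minimal sketch, not how 538 or the Economist actually build their models. It assumes each state's polling error is the sum of a shared national error and an independent state-level error. The three poll leads, the standard deviations, and the function name `simulate_sweep_probability` are all made up for illustration.

```python
import random


def simulate_sweep_probability(shared_sd, state_sd, margins, n_sims=20000, seed=0):
    """Estimate by Monte Carlo how often the trailing candidate wins every state.

    Each state's error = one shared national draw + one independent state draw.
    `margins` are the leader's poll leads in percentage points (hypothetical).
    """
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(n_sims):
        national = rng.gauss(0.0, shared_sd)  # same draw applied to every state
        if all(m + national + rng.gauss(0.0, state_sd) < 0 for m in margins):
            sweeps += 1
    return sweeps / n_sims


# Hypothetical poll leads in three states, in percentage points.
margins = [2.0, 3.0, 2.5]

# Both scenarios give each state a total error variance of 16 (sd = 4 points).
# Independent errors: no shared component at all.
independent = simulate_sweep_probability(0.0, 4.0, margins)
# Correlated errors: variance 12 shared + 4 state-specific, so the
# between-state error correlation is 12 / 16 = 0.75.
correlated = simulate_sweep_probability(12 ** 0.5, 2.0, margins)

print(f"independent errors: {independent:.3f}")
print(f"correlated errors:  {correlated:.3f}")
```

With the per-state uncertainty held fixed, the shared component makes a clean sweep by the trailing candidate far more likely than it would be under independence, which is the practical reason the correlation matrix matters.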

But here’s the part that bothers me just a little bit. Historically, these models predicted the outcome of a collection of events that happened in different states but mostly at a single point in time: election day.

That’s not how elections work in 2020. Different states now have wildly different cadences and rules. Most of the votes might be cast in one state before another even starts the process. Is the likelihood of a candidate outperforming the polls in the first two weeks of October in one state still strongly correlated with the likelihood of the same thing happening in another state on election day? Do certain aspects of the model fare better than others under these new conditions? Would 538’s model handle these changes differently than the Economist model would?

I have absolutely no idea whether or not these are important issues. I am woefully ignorant on the subject, but it seems like an interesting topic for discussion so if anyone better informed than I (which is to say pretty much anyone reading this blog) would care to join in, I would love to hear some opinions.

4 comments:

  1. Mark:

    Lots of the correlation refers to correlations between states in polling errors, not to correlations between states in vote swings.

    1. Andrew,

      Are we talking about polling errors in the sense that they misrepresent the true distributions (due to unrepresentative call lists, selection effects, dishonest responses) or is it more on the likely voter/turn-out level?

      I'd imagine early voting would have minimal impact on the first. Not so sure about the second.

  2. Have a few thoughts here:
    1. I notice that pollsters have reacted to this by asking people whether they have already voted. At least I have seen media reporting Trump/Biden support conditioned on whether they voted. So that should help with modeling.
    2. I don't think the basic model needs to incorporate this time factor. You'd be making the assumption that early voting does not affect the ultimate voting shares.
    3. In reality, I think the media's horse-race reporting and speculation about who's voting for whom early is going to affect the voting propensity of subgroups. I don't know whether any of the models incorporate this factor now, because these election forecasting models, when amplified by the media, may themselves affect voter turnout. It seems tough to measure.

    Andrew: I'm reading through all the posts on the 538 models, will post soon. Can you clarify your comment? Your "reengineering" post mostly works with correlations between Trump vote shares, and it supports your argument that if Trump does well in California, he probably also does well in most if not all states.

  3. Yeah, I think a lot of the correlation is correlation in polling errors, but of course the polling errors could be different for early vs. day-of voting due to different rates of votes being invalidated.
