West Coast Stat Views (on Observational Epidemiology and more): Rethinking modeling assumptions in 2020

Thursday, October 29, 2020

Rethinking modeling assumptions in 2020

#earlyvote afternoon update 10/25

At least 59 million people have voted in the 2020 general election 🥳 https://t.co/s8K2xFDeSA pic.twitter.com/iLxtwhbwrT
— Michael McDonald (@ElectProject) October 25, 2020

The idea that you should incorporate a correlation matrix into an electoral model is one of those things that just makes sense across the board. It is supported by the data, it is intuitively obvious, and it is easy to justify from first principles.
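To make the intuition concrete, here is a toy sketch of why the correlation matters. It is not how 538 or the Economist actually build their models; it simply assumes each state's polling error is a shared national component plus an independent local one, so that every pair of states has correlation `rho`. The function name, the margins, and the error size are all hypothetical values chosen for illustration.

```python
import random

def simulate_joint_win(margins, sd, rho, n_sims=20000, seed=0):
    """Estimate P(candidate wins every state) under equicorrelated polling errors.

    margins: hypothetical polling margins (in points) for the candidate
    sd:      assumed standard deviation of each state's polling error
    rho:     assumed pairwise correlation between state errors (0 to 1)
    """
    rng = random.Random(seed)
    wins_all = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)  # national error common to all states
        all_won = True
        for m in margins:
            local = rng.gauss(0.0, 1.0)  # state-specific error
            # Mixing weights chosen so each error has variance sd**2
            # and any two states have correlation rho.
            error = sd * (rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * local)
            if m + error <= 0:
                all_won = False
        if all_won:
            wins_all += 1
    return wins_all / n_sims

# Hypothetical scenario: the candidate trails by 2 points in three states.
margins = [-2.0, -2.0, -2.0]
p_independent = simulate_joint_win(margins, sd=4.0, rho=0.0)
p_correlated = simulate_joint_win(margins, sd=4.0, rho=0.75)
print(p_independent, p_correlated)
```

Under these made-up numbers, the chance of sweeping all three states is several times higher with correlated errors than with independent ones, which is the usual argument for putting the correlation into the model.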

But here’s the part that bothers me just a little bit. Historically, these models predicted the outcome of a collection of events that happened in different states but mostly at a single point in time: election day.

That’s not how elections work in 2020. Different states now have wildly different cadences and rules. Most of the votes might be cast in one state before another even starts the process. Is the likelihood of a candidate outperforming the polls in the first two weeks of October in one state still strongly correlated with the likelihood of doing so in another state on election day? Do certain aspects of the model fare better than others under these new conditions? Would 538’s model handle these changes differently than the Economist model would?

I have absolutely no idea whether or not these are important issues. I am woefully ignorant on the subject, but it seems like an interesting topic for discussion so if anyone better informed than I (which is to say pretty much anyone reading this blog) would care to join in, I would love to hear some opinions.

4 comments:

Andrew Gelman, October 29, 2020 at 10:18 AM
Mark:

Lots of the correlation refers to correlations between states in polling errors, not to correlations between states in vote swings.
Junkcharts, October 29, 2020 at 12:02 PM
Have a few thoughts here:
1. I notice that pollsters have reacted to this by asking people whether they have already voted. At least I have seen media reporting Trump/Biden support conditioned on whether they voted. So that should help with modeling.
2. I don't think the basic model needs to incorporate this time factor. You'd be making the assumption that early voting does not affect the ultimate voting shares.
3. In reality, I think the media's horse-race reporting and speculation about who's voting for whom early is going to affect the voting propensity of subgroups. I don't know whether any of the models incorporate this factor now, because these election forecasting models, when amplified by the media, may also affect voter turnout. It seems tough to measure.

Andrew: I'm reading through all the posts on the 538 models, will post soon. Can you clarify your comment? Your "reengineering" post mostly works with correlations between Trump vote shares, and it supports your argument that if Trump does well in California, he probably also does well in most if not all states.
Dean, October 29, 2020 at 12:07 PM
Yeah, I think a lot of the correlation is correlation in polling errors, but of course the polling errors could be different for early vs. day-of voting due to differing rates of votes being invalidated.