Monday, November 28, 2016

The myth of orthogonality

One of the factors that contributed to the punched-in-the-gut feeling that so many people had immediately after the election was the seeming orthogonality of the data.

I'm using orthogonal here in the broad rather than the technical sense (though I suspect both might apply), meaning that a source brings new information into the model. It wasn't just that the poll aggregators (with the partial exception of the outlier 538) were all telling us that the outcome was almost certain; we were also hearing exactly the same thing from pretty much everyone else, from sources that supposedly had access to different information and used a variety of approaches. A partial list would include prediction markets, expert analyses, pseudo-exit polls (Slate's ill-fated, badly-thought-out VoteCastr), and (from what we can infer) the consensus opinions within the campaigns themselves. All of these converged on exactly the same, completely wrong conclusion.

It is likely to take a great deal of hard work and deep digging to uncover exactly what went wrong here, but we can make some educated guesses:

The other data sources were never all that orthogonal (and possibly never all that good). For instance, even under ideal circumstances the predictive power of the markets was always overstated and overhyped, and presidential elections are nowhere near ideal circumstances.

To make matters worse, whatever orthogonality these other sources once brought to the table had faded to nothing by the time we got to this election. Between their early successes and the ludicrous amount of attention they received, the poll aggregators' predictions increasingly dominated conventional wisdom and became the only input (direct or indirect) that mattered for all the other “independent” sources of information.

I suspect that we reached the point where (if you'll forgive a clumsy phrase) prediction markets and the rest were anti-orthogonal. By providing the illusion of independent confirmation of the flawed polling data and likely voter models, they actually made it more difficult to bring new information into the system. It is entirely possible that better informed (or at least less misinformed) voters might have acted very differently, which suggests that the consequences of this particular failure may have been high indeed.
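To see why the illusion of independence matters, here is a minimal back-of-the-envelope sketch (not from the original post; the numbers are purely illustrative). It uses the standard formula for the variance of an average of correlated estimates: with k sources of equal standard deviation sigma and pairwise correlation rho, the variance of their mean is (sigma^2 / k)(1 + (k - 1)rho). If the "independent" sources are really all echoing the same polls, the reported precision can be wildly optimistic.

```python
import math

def se_of_average(sigma, k, rho):
    """Standard error of the mean of k forecasts, each with standard
    deviation sigma and pairwise correlation rho.

    rho = 0 recovers the familiar sigma / sqrt(k) of truly
    orthogonal sources; rho = 1 means the sources add nothing
    to each other and the average is no better than one source.
    """
    return math.sqrt(sigma**2 / k * (1 + (k - 1) * rho))

# Illustrative numbers only: 8 apparently independent sources,
# each good to about +/- 3 points.
sigma, k = 3.0, 8

naive = se_of_average(sigma, k, rho=0.0)   # assuming orthogonality
actual = se_of_average(sigma, k, rho=0.8)  # sources mostly echo the polls

print(f"assumed SE: {naive:.2f}")   # ~1.06 -- looks like near-certainty
print(f"actual  SE: {actual:.2f}")  # ~2.72 -- barely better than one source
```

The gap between those two numbers is one way to read the comment below: ignoring the correlations among the sources produces illusory precision.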

1 comment:

  1. Yup. As I wrote (http://andrewgelman.com/2016/11/08/election-forecasting-updating-error-ignored-correlations-data-thus-producing-illusory-precision-inferences/), in election forecasting we ignored correlations in some of our data, thus producing illusory precision in our inferences.
