It’s been nearly three weeks since the election, and it seems we finally have results (even if many do not accept them). It’s time to assess how my model did at predicting the election. To start, I’ll remind you of the model that I built. There were four parts:
Estimating turnout via a Poisson regression
Estimating vote share for each candidate using a two-sided binomial regression
Estimating a national polling error based on historical polling averages
Converting the information from steps 1-3 into probability distributions, and then making draws to simulate election outcomes. Turnout was distributed as a Poisson random variable, vote share as multivariate normal (with covariance between states based on a scaled similarity matrix), and national polling error as two stable distributions. A code sketch of this simulation step follows the list.
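To make step 4 concrete, here is a minimal sketch of how one simulation draw could be wired up. All of the inputs (`turnout_mean`, `mu`, `similarity`, `ev`) are hypothetical placeholders standing in for the fitted quantities from steps 1-3, the stable-distribution parameters are illustrative, and a single stable draw stands in for the two distributions the model actually used:

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
n_states, n_sims = 51, 10_000

# Hypothetical placeholder inputs standing in for the fitted model:
turnout_mean = np.full(n_states, 2_500_000.0)  # step 1: Poisson turnout means
mu = np.full(n_states, 0.51)                   # step 2: mean Biden two-party share
similarity = np.eye(n_states)                  # state-by-state similarity matrix
cov = 0.0009 * similarity                      # scaled similarity as the covariance
ev = np.full(n_states, 538 / n_states)         # electoral votes per state (placeholder)

biden_ev = np.empty(n_sims)
for i in range(n_sims):
    turnout = rng.poisson(turnout_mean)            # Poisson turnout draw
    share = rng.multivariate_normal(mu, cov)       # correlated vote-share draw
    # National polling error as a stable draw shifting every state at once
    # (the model used two stable distributions; one is shown for brevity).
    share += levy_stable.rvs(1.8, 0.0, scale=0.01, random_state=rng)
    votes = turnout * share                        # raw Biden votes; this is where
                                                   # the turnout draw enters
    biden_ev[i] = ev[share > 0.5].sum()            # winner-take-all electoral votes
```

Because the polling-error draw shifts every state together, a single heavy-tailed draw can flip large blocs of states at once, which matters for interpreting the results below.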
We can look at the map of the average electoral college result over 10,000 simulations, and compare it to the actual results.
I incorrectly predicted five states: Iowa, North and South Carolina, Ohio, and Florida. I’m honestly not particularly surprised about South Carolina; conventional wisdom held that Biden was unlikely to win the state. My incorrect predictions in Iowa and Ohio are tied together: Biden winning both would have been an indicator of a very strong performance overall, which did not happen. They are also tied to the miss in South Carolina, as South Carolina could only be in play if Biden won in a landslide.
That leaves Florida and North Carolina, which many forecasts missed, including those from FiveThirtyEight and The Economist, two of the more sophisticated models out there. In my model, the predicted Biden wins in those states were a symptom of the overall forecast: a Biden blowout.
By about 10 pm on election night, it was clear that was not going to happen. It was immediately clear that my model was wrong; the only questions were how wrong, and whether the actual results fell within the predicted range of outcomes. One interesting thing to look at is the distribution of classification accuracies: as it turns out, the model was never exactly right.
In fact, the maximum classification accuracy was 49 out of 51 states (including DC), which strikes me as quite poor for a probabilistic model. One would hope that the model would get exactly the right result at least some of the time, and that was not the case here. One thing to note is the relatively large probability mass below 45 states predicted correctly; it suggests that something is causing large groups of states to be wrong all at once.
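Computing that accuracy distribution is straightforward once the simulated winners are tabulated. A sketch, using random placeholder arrays where the real inputs would be the per-simulation state winners from the model and the actual outcomes:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-ins: in practice these come from the simulation draws.
sim_winners = rng.random((10_000, 51)) > 0.4   # (n_sims, 51) Biden-win booleans
actual_winners = rng.random(51) > 0.5          # (51,) booleans for the real results

# States called correctly in each simulation, then the distribution over sims.
n_correct = (sim_winners == actual_winners).sum(axis=1)
accuracy_dist = np.bincount(n_correct, minlength=52) / len(n_correct)
print(n_correct.max())  # for the real model this topped out at 49 of 51
```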
In the end, Biden won the election with 306 electoral votes to Trump’s 232. We can look at the distribution of electoral results predicted by the model and see where the actual results fall. The dashed lines are the actual electoral college vote counts for each candidate.
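A sketch of that comparison plot, assuming placeholder draws in place of the model’s simulated Biden electoral-vote totals (`biden_ev` from the simulation sketch above) and a clean two-candidate split of the 538 votes:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Placeholder draws standing in for the model's simulated Biden electoral votes.
biden_ev = np.clip(rng.normal(380, 45, 10_000), 0, 538)

plt.hist(biden_ev, bins=60, alpha=0.6, label="Biden (simulated)")
plt.hist(538 - biden_ev, bins=60, alpha=0.6,
         label="Trump (simulated)")              # two-candidate split assumed
plt.axvline(306, linestyle="--", color="tab:blue")    # Biden's actual 306
plt.axvline(232, linestyle="--", color="tab:orange")  # Trump's actual 232
plt.xlabel("Electoral votes")
plt.ylabel("Number of simulations")
plt.legend()
plt.show()
```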