It appears that we have some resolution to the election: the race has been called for Joe Biden, who is currently on track to win 306 electoral votes. If current counts hold, I will have predicted Ohio, Iowa, the Carolinas, and Florida incorrectly. Over the next couple of weeks, I’ll be diving into what went wrong with this model and why.

Introduction

At the time of writing, we are two days away from the 2020 presidential election between Donald Trump and Joe Biden. Many believe that this may be the most important election in the history of the United States. The question on everyone’s mind is the same: who is going to win? I’ll discuss the model I have built, and then show my prediction. There are four parts to the model:

  1. Estimating turnout in each state.

  2. Estimating vote share for each candidate in each state.

  3. Estimating national polling error.

  4. Simulating the election based on the estimated parameters.

I’ll go through each section in turn, examine the results, and then decide what I think of this model.

The Basic Structure

The end goal of this model is to estimate the number of voters that vote for each candidate in each state. Given that I want to do this with a probabilistic model, the natural choice is to use draws from binomial random variables:

\[\textrm{Votes}_{ic}\sim\textrm{Bin}(n_{i}, p_{ic})\]

The subscript \(i\) denotes each state, and the subscript \(c\) denotes each candidate. Each simulation of the election is a draw from a set of 102 random variables: two binomial distributions (one for Trump, one for Biden) for each of the 50 states plus D.C., with 51 different turnout values and 102 different probability values. Simulating these draws is exactly step four in the process that I explained in the introduction. In my model, both \(n_i\) and \(p_{ic}\) are also random variables, making this a sort of hierarchical model. Let’s begin by looking at how turnout is estimated.
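As a minimal sketch of the binomial structure above, the following R snippet draws one simulated election for three states. The turnout and vote share numbers are made up for illustration and are not the fitted values from this post.

```r
set.seed(2020)

# Hypothetical inputs for three states (illustrative only).
turnout <- c(PA = 6500000, WI = 3100000, AZ = 3000000)   # n_i
share <- rbind(                                          # p_ic
  biden = c(PA = 0.51, WI = 0.51, AZ = 0.50),
  trump = c(PA = 0.48, WI = 0.48, AZ = 0.49)
)

# One simulated election: one binomial draw per state and candidate.
# The two candidates are drawn separately, so their totals are not
# constrained to sum to the state's turnout.
votes <- sapply(names(turnout), function(s) {
  rbinom(n = nrow(share), size = turnout[[s]], prob = share[, s])
})
rownames(votes) <- rownames(share)
votes
```

With fixed \(n_i\) and \(p_{ic}\), these draws vary very little from run to run, which is why the rest of the model treats both as random.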

Estimating Turnout

I estimate turnout using a pooled model, across all states and across every election since 1992. After spending some time looking at the data, I settled on using a Poisson regression to estimate turnout. Poisson models are a form of generalized linear model, based on the Poisson distribution. Because the data is a set of discrete counts, this seemed like a reasonable choice to me [1]. I used a large number of covariates to estimate the turnout model. These include demographic data, the polling margin in the state [2], lagged variables for the previous election’s turnout and voting margin, and a state fixed effect indicator. For a full description of the data, see Appendix: Data at the end of this post. We can look at the full output from the model:

## 
## Call:
## glm(formula = total ~ last_vote_margin + poll_margin + Black + 
##     Hispanic + Asian + White + Male + age20 + age3045 + age4565 + 
##     last_turnout + state, family = "poisson", data = turnout_scaled %>% 
##     filter(year < 2020))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -443.77  -123.94     3.72   122.53   460.54  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       1.295e+01  1.581e-03 8191.23   <2e-16 ***
## last_vote_margin -5.570e-02  8.545e-05 -651.81   <2e-16 ***
## poll_margin       4.040e-01  6.211e-04  650.54   <2e-16 ***
## Black             3.908e-01  7.021e-04  556.58   <2e-16 ***
## Hispanic         -2.587e-01  7.425e-04 -348.47   <2e-16 ***
## Asian             5.088e-01  1.305e-03  389.86   <2e-16 ***
## White            -2.752e-01  1.024e-03 -268.78   <2e-16 ***
## Male             -1.139e-01  3.463e-04 -328.92   <2e-16 ***
## age20            -3.277e-02  2.585e-04 -126.80   <2e-16 ***
## age3045           4.824e-02  1.653e-04  291.94   <2e-16 ***
## age4565          -3.644e-02  1.680e-04 -216.85   <2e-16 ***
## last_turnout      2.967e-01  4.702e-04  631.11   <2e-16 ***
## stateAL           7.985e-01  2.253e-03  354.38   <2e-16 ***
## stateAR           1.004e+00  1.911e-03  525.58   <2e-16 ***
## stateAZ           2.314e+00  2.247e-03 1029.84   <2e-16 ***
## stateCA           2.054e+00  2.675e-03  767.72   <2e-16 ***
## stateCO           2.195e+00  1.735e-03 1265.15   <2e-16 ***
## stateCT           1.495e+00  1.852e-03  807.11   <2e-16 ***
## stateDC          -1.689e+00  4.003e-03 -421.83   <2e-16 ***
## stateDE          -3.156e-01  2.136e-03 -147.75   <2e-16 ***
## stateFL           2.378e+00  1.945e-03 1222.99   <2e-16 ***
## stateGA           8.744e-01  2.108e-03  414.69   <2e-16 ***
## stateHI          -3.613e+00  6.673e-03 -541.47   <2e-16 ***
## stateIA           1.947e+00  1.894e-03 1028.43   <2e-16 ***
## stateID           1.335e+00  1.867e-03  715.27   <2e-16 ***
## stateIL           1.977e+00  1.664e-03 1187.82   <2e-16 ***
## stateIN           1.939e+00  1.805e-03 1074.48   <2e-16 ***
## stateKS           1.533e+00  1.754e-03  873.86   <2e-16 ***
## stateKY           1.706e+00  1.877e-03  909.04   <2e-16 ***
## stateLA           5.204e-01  2.422e-03  214.82   <2e-16 ***
## stateMA           2.069e+00  1.944e-03 1064.08   <2e-16 ***
## stateMD           5.749e-01  2.114e-03  271.95   <2e-16 ***
## stateME           1.352e+00  1.998e-03  676.44   <2e-16 ***
## stateMI           1.985e+00  1.700e-03 1167.64   <2e-16 ***
## stateMN           2.155e+00  1.694e-03 1272.04   <2e-16 ***
## stateMO           1.810e+00  1.822e-03  993.42   <2e-16 ***
## stateMS          -1.236e-01  2.828e-03  -43.69   <2e-16 ***
## stateMT           1.024e+00  1.657e-03  618.03   <2e-16 ***
## stateNC           1.462e+00  1.858e-03  787.27   <2e-16 ***
## stateND           7.494e-01  1.721e-03  435.49   <2e-16 ***
## stateNE           1.299e+00  1.846e-03  703.50   <2e-16 ***
## stateNH           1.155e+00  1.893e-03  610.01   <2e-16 ***
## stateNJ           1.593e+00  1.722e-03  925.10   <2e-16 ***
## stateNM           1.745e+00  3.842e-03  454.11   <2e-16 ***
## stateNV           1.045e+00  1.715e-03  609.68   <2e-16 ***
## stateNY           1.869e+00  1.890e-03  989.31   <2e-16 ***
## stateOH           2.121e+00  1.883e-03 1126.47   <2e-16 ***
## stateOK           1.483e+00  1.685e-03  879.97   <2e-16 ***
## stateOR           1.980e+00  1.699e-03 1165.46   <2e-16 ***
## statePA           2.225e+00  1.894e-03 1174.67   <2e-16 ***
## stateRI           7.081e-01  2.230e-03  317.52   <2e-16 ***
## stateSC           5.790e-01  2.322e-03  249.35   <2e-16 ***
## stateSD           7.291e-01  1.749e-03  416.99   <2e-16 ***
## stateTN           1.456e+00  1.867e-03  779.54   <2e-16 ***
## stateTX           2.470e+00  2.292e-03 1077.71   <2e-16 ***
## stateUT           1.713e+00  2.182e-03  785.10   <2e-16 ***
## stateVA           1.297e+00  1.641e-03  789.89   <2e-16 ***
## stateVT           6.315e-01  2.068e-03  305.40   <2e-16 ***
## stateWA           1.998e+00  1.573e-03 1270.18   <2e-16 ***
## stateWI           2.224e+00  1.673e-03 1329.34   <2e-16 ***
## stateWV           1.268e+00  1.970e-03  643.69   <2e-16 ***
## stateWY           5.211e-01  1.848e-03  281.93   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 628278481  on 324  degrees of freedom
## Residual deviance:   8371435  on 263  degrees of freedom
##   (32 observations deleted due to missingness)
## AIC: 8376793
## 
## Number of Fisher Scoring iterations: 4

The key numbers for in-sample fit are the null and residual deviances, at the very bottom of the model output. The residual deviance (8371435) is far smaller than the null deviance (628278481), which suggests a significant improvement over the null [3] model. Because of the scaling discussed in the Appendix: Data section, the interpretation of the coefficients is not particularly enlightening. One thing to remember is that because this is a Poisson regression, the signs of the coefficients indicate the direction in which the log of expected turnout would move. The coefficient on the poll margin is positive, which actually goes against some research: it suggests that a greater poll margin leads to higher turnout. It could be that higher poll margins are the result of more people deciding to vote for a particular party, meaning that turnout is influencing the margin rather than the other way around. If so, the regression may not properly capture cause and effect.
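The two deviances and the log link can be made a little more concrete with some quick arithmetic. The sketch below uses only numbers copied from the model summary above. The "share of deviance explained" is a rough pseudo-R-squared style summary, not something the model reports directly.

```r
# Deviances copied from the turnout model summary above.
null_dev  <- 628278481
resid_dev <- 8371435

# Share of the null deviance removed by the covariates.
deviance_explained <- 1 - resid_dev / null_dev

# A Poisson GLM uses a log link, so exp(coefficient) is the multiplicative
# change in expected turnout for a one-unit change in the (scaled) covariate.
poll_margin_coef <- 4.040e-01
poll_margin_mult <- exp(poll_margin_coef)

round(c(deviance_explained = deviance_explained,
        poll_margin_mult   = poll_margin_mult), 3)
```

Under these numbers the covariates remove roughly 98.7% of the null deviance, and a one-unit increase in the scaled poll margin multiplies expected turnout by about 1.5. Because the covariates are scaled, "one unit" here is not one percentage point.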

We can also look at out-of-sample validation. In this case, I conducted leave-one-out validation by year, and then calculated the sum of the squares of the residuals across states as a measure of fit.

Sum of Squares of Residuals    Year Predicted
8.260468e+12                   1992
1.246439e+13                   1996
1.052389e+13                   2000
1.405148e+12                   2004
6.479571e+12                   2008
3.681644e+12                   2012
9.025660e+12                   2016

While these numbers do seem quite high, it is important to remember that these values are squared. A sum of squares on the order of \(10^{12}\) has a square root on the order of \(10^{6}\), which represents being off by millions of votes across all states. In an election of roughly 120 million votes, that is quite good.

The final thing to do with this model is predict the turnout for each state in 2020, which we will use later when we simulate the election.
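To put the sums of squares in the table above on a more interpretable scale, one option is to convert each into a per-state root mean squared error. The sketch below assumes roughly 50 states are scored in each held-out year. That count is an assumption, since the exact number of states with complete data per year is not given here.

```r
# Sums of squared residuals copied from the validation table above.
ssr <- c(`1992` = 8.260468e12, `1996` = 1.246439e13, `2000` = 1.052389e13,
         `2004` = 1.405148e12, `2008` = 6.479571e12, `2012` = 3.681644e12,
         `2016` = 9.025660e12)

# Assumption: about 50 states are scored in each held-out year.
n_states <- 50

# Root mean squared error per state, in votes.
rmse_per_state <- sqrt(ssr / n_states)
round(rmse_per_state)
```

Under that assumption, the typical per-state miss falls in the hundreds of thousands of votes, with 2004 the best year and 1996 the worst.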

Estimating Vote Share

Estimating vote share happens in roughly the same way as estimating turnout, just with a different generalized linear model. Similar to many of the models I have used throughout this blog, I am using a two-sided model based on party incumbency status [4]. I still use a pooled model across all states and years since the 1992 election.

For both the incumbent and the challenger models, I use a binomial regression, estimating the fraction of the total votes that each candidate will win. The regression uses demographic data, polling data, previous election results, economic data, along with party and state fixed effects. We can take a look at both the incumbent and challenger models. First, the incumbent model:

## 
## Call:
## glm(formula = cbind(inc_votes, total - inc_votes) ~ rdi_q2 + 
##     gdp + state + avg_poll + Black + Hispanic + Asian + White + 
##     Male + age20 + age3045 + age4565 + last_vote_margin + party, 
##     family = binomial, data = full_votes %>% filter(year < 2020, 
##         incumbent_party == TRUE))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -207.335   -31.862     0.452    33.982   205.117  
## 
## Coefficients:
##                    Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)      -1.965e+00  2.759e-03 -712.394  < 2e-16 ***
## rdi_q2            5.055e-03  9.465e-05   53.406  < 2e-16 ***
## gdp               4.751e-02  1.575e-04  301.562  < 2e-16 ***
## stateAL          -1.854e-01  3.226e-03  -57.464  < 2e-16 ***
## stateAR          -1.578e-01  3.101e-03  -50.884  < 2e-16 ***
## stateAZ           2.535e-02  3.391e-03    7.475 7.72e-14 ***
## stateCA          -5.681e-02  3.233e-03  -17.570  < 2e-16 ***
## stateCO           3.198e-02  2.838e-03   11.268  < 2e-16 ***
## stateCT          -6.234e-02  3.146e-03  -19.818  < 2e-16 ***
## stateDC           1.621e-01  7.239e-03   22.392  < 2e-16 ***
## stateDE          -1.170e-01  3.355e-03  -34.887  < 2e-16 ***
## stateFL          -5.842e-02  3.530e-03  -16.550  < 2e-16 ***
## stateGA          -1.784e-01  2.975e-03  -59.949  < 2e-16 ***
## stateHI          -7.962e-01  8.487e-03  -93.810  < 2e-16 ***
## stateIA          -9.766e-02  3.325e-03  -29.373  < 2e-16 ***
## stateID          -1.158e-01  3.279e-03  -35.306  < 2e-16 ***
## stateIL          -9.170e-02  2.881e-03  -31.834  < 2e-16 ***
## stateIN          -1.293e-01  3.045e-03  -42.479  < 2e-16 ***
## stateKS          -1.616e-01  3.036e-03  -53.223  < 2e-16 ***
## stateKY          -1.520e-01  3.165e-03  -48.029  < 2e-16 ***
## stateLA          -1.334e-01  3.263e-03  -40.894  < 2e-16 ***
## stateMA          -1.985e-01  3.251e-03  -61.044  < 2e-16 ***
## stateMD          -2.509e-01  3.153e-03  -79.573  < 2e-16 ***
## stateME          -9.035e-02  3.542e-03  -25.506  < 2e-16 ***
## stateMI          -9.126e-02  2.843e-03  -32.096  < 2e-16 ***
## stateMN          -1.241e-01  2.881e-03  -43.067  < 2e-16 ***
## stateMO          -1.632e-01  3.091e-03  -52.803  < 2e-16 ***
## stateMS          -2.079e-01  3.584e-03  -58.005  < 2e-16 ***
## stateMT          -8.205e-02  3.071e-03  -26.713  < 2e-16 ***
## stateNC          -1.596e-01  2.942e-03  -54.246  < 2e-16 ***
## stateND          -7.464e-02  3.188e-03  -23.412  < 2e-16 ***
## stateNE          -1.362e-01  3.172e-03  -42.923  < 2e-16 ***
## stateNH          -4.071e-02  3.317e-03  -12.273  < 2e-16 ***
## stateNJ          -1.374e-02  2.985e-03   -4.603 4.17e-06 ***
## stateNM           7.810e-02  4.897e-03   15.950  < 2e-16 ***
## stateNV          -2.362e-02  2.712e-03   -8.708  < 2e-16 ***
## stateNY          -2.682e-01  3.175e-03  -84.468  < 2e-16 ***
## stateOH          -1.647e-01  3.084e-03  -53.415  < 2e-16 ***
## stateOK          -1.782e-01  2.877e-03  -61.948  < 2e-16 ***
## stateOR          -5.122e-02  2.962e-03  -17.294  < 2e-16 ***
## statePA          -1.374e-01  3.188e-03  -43.106  < 2e-16 ***
## stateRI          -7.388e-02  3.692e-03  -20.010  < 2e-16 ***
## stateSC          -2.318e-01  3.198e-03  -72.458  < 2e-16 ***
## stateSD          -6.394e-02  3.163e-03  -20.216  < 2e-16 ***
## stateTN          -1.966e-01  3.009e-03  -65.346  < 2e-16 ***
## stateTX           2.490e-02  3.713e-03    6.708 1.98e-11 ***
## stateUT          -2.157e-01  3.482e-03  -61.944  < 2e-16 ***
## stateVA          -1.461e-01  2.669e-03  -54.754  < 2e-16 ***
## stateVT          -1.571e-01  3.714e-03  -42.292  < 2e-16 ***
## stateWA          -1.208e-01  2.580e-03  -46.825  < 2e-16 ***
## stateWI          -9.412e-02  2.942e-03  -31.991  < 2e-16 ***
## stateWV          -7.054e-02  3.482e-03  -20.256  < 2e-16 ***
## stateWY          -1.308e-02  3.536e-03   -3.699 0.000216 ***
## avg_poll          4.347e+00  1.272e-03 3417.178  < 2e-16 ***
## Black             1.853e-02  7.166e-04   25.851  < 2e-16 ***
## Hispanic         -4.198e-02  1.023e-03  -41.017  < 2e-16 ***
## Asian             1.269e-01  1.490e-03   85.156  < 2e-16 ***
## White             3.057e-03  1.577e-03    1.938 0.052569 .  
## Male             -4.947e-02  4.588e-04 -107.833  < 2e-16 ***
## age20             2.957e-02  3.864e-04   76.519  < 2e-16 ***
## age3045          -7.568e-03  3.272e-04  -23.132  < 2e-16 ***
## age4565           7.222e-03  3.357e-04   21.512  < 2e-16 ***
## last_vote_margin  1.411e-02  1.932e-04   73.043  < 2e-16 ***
## partyrepublican  -3.777e-03  1.579e-04  -23.916  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 22644500  on 324  degrees of freedom
## Residual deviance:  1032104  on 261  degrees of freedom
## AIC: 1037001
## 
## Number of Fisher Scoring iterations: 3

We can also look at the challenger model:

## 
## Call:
## glm(formula = cbind(chl_votes, total - chl_votes) ~ rdi_q2 + 
##     gdp + state + avg_poll + Black + Hispanic + Asian + White + 
##     Male + age20 + age3045 + age4565 + last_vote_margin + party, 
##     family = binomial, data = full_votes %>% filter(year < 2020, 
##         incumbent_party == FALSE))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -271.370   -40.194     2.738    37.709   201.321  
## 
## Coefficients:
##                    Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)      -2.342e+00  2.719e-03 -861.459  < 2e-16 ***
## rdi_q2           -6.305e-03  9.533e-05  -66.135  < 2e-16 ***
## gdp              -3.723e-02  1.524e-04 -244.309  < 2e-16 ***
## stateAL           1.023e+00  3.197e-03  320.068  < 2e-16 ***
## stateAR           5.654e-01  3.065e-03  184.456  < 2e-16 ***
## stateAZ          -4.461e-01  3.385e-03 -131.800  < 2e-16 ***
## stateCA          -7.895e-01  3.263e-03 -241.991  < 2e-16 ***
## stateCO          -2.689e-01  2.792e-03  -96.294  < 2e-16 ***
## stateCT           1.487e-01  3.103e-03   47.922  < 2e-16 ***
## stateDC          -3.357e-02  9.859e-03   -3.405 0.000662 ***
## stateDE           6.807e-01  3.327e-03  204.594  < 2e-16 ***
## stateFL           4.524e-02  3.517e-03   12.862  < 2e-16 ***
## stateGA           1.014e+00  2.952e-03  343.635  < 2e-16 ***
## stateHI          -6.106e-01  8.411e-03  -72.599  < 2e-16 ***
## stateIA           1.829e-01  3.251e-03   56.270  < 2e-16 ***
## stateID           2.176e-02  3.260e-03    6.673 2.51e-11 ***
## stateIL           2.862e-01  2.848e-03  100.475  < 2e-16 ***
## stateIN           4.032e-01  2.990e-03  134.848  < 2e-16 ***
## stateKS           1.304e-01  2.993e-03   43.587  < 2e-16 ***
## stateKY           4.395e-01  3.101e-03  141.720  < 2e-16 ***
## stateLA           1.274e+00  3.244e-03  392.765  < 2e-16 ***
## stateMA           1.515e-01  3.205e-03   47.263  < 2e-16 ***
## stateMD           1.005e+00  3.118e-03  322.349  < 2e-16 ***
## stateME           2.063e-01  3.479e-03   59.299  < 2e-16 ***
## stateMI           5.964e-01  2.795e-03  213.382  < 2e-16 ***
## stateMN           2.109e-01  2.806e-03   75.148  < 2e-16 ***
## stateMO           5.119e-01  3.032e-03  168.850  < 2e-16 ***
## stateMS           1.415e+00  3.565e-03  396.982  < 2e-16 ***
## stateMT           1.560e-01  3.013e-03   51.764  < 2e-16 ***
## stateNC           7.867e-01  2.907e-03  270.648  < 2e-16 ***
## stateND           1.614e-01  3.135e-03   51.484  < 2e-16 ***
## stateNE           1.316e-01  3.134e-03   41.981  < 2e-16 ***
## stateNH           1.598e-01  3.243e-03   49.276  < 2e-16 ***
## stateNJ           1.461e-01  2.956e-03   49.442  < 2e-16 ***
## stateNM          -9.707e-01  4.922e-03 -197.239  < 2e-16 ***
## stateNV          -3.158e-01  2.714e-03 -116.387  < 2e-16 ***
## stateNY           1.922e-02  3.158e-03    6.086 1.16e-09 ***
## stateOH           5.232e-01  3.021e-03  173.187  < 2e-16 ***
## stateOK           2.760e-01  2.839e-03   97.223  < 2e-16 ***
## stateOR          -5.718e-02  2.913e-03  -19.632  < 2e-16 ***
## statePA           4.676e-01  3.131e-03  149.364  < 2e-16 ***
## stateRI           1.191e-01  3.665e-03   32.487  < 2e-16 ***
## stateSC           1.150e+00  3.172e-03  362.467  < 2e-16 ***
## stateSD           1.741e-01  3.096e-03   56.240  < 2e-16 ***
## stateTN           7.769e-01  2.964e-03  262.155  < 2e-16 ***
## stateTX          -4.870e-01  3.724e-03 -130.767  < 2e-16 ***
## stateUT          -8.295e-02  3.460e-03  -23.975  < 2e-16 ***
## stateVA           6.481e-01  2.630e-03  246.447  < 2e-16 ***
## stateVT           2.219e-01  3.668e-03   60.495  < 2e-16 ***
## stateWA          -6.884e-02  2.540e-03  -27.106  < 2e-16 ***
## stateWI           2.497e-01  2.870e-03   86.995  < 2e-16 ***
## stateWV           3.522e-01  3.433e-03  102.593  < 2e-16 ***
## stateWY          -6.133e-02  3.491e-03  -17.570  < 2e-16 ***
## avg_poll          4.509e+00  1.362e-03 3311.292  < 2e-16 ***
## Black            -2.727e-01  7.082e-04 -385.020  < 2e-16 ***
## Hispanic          3.020e-01  1.053e-03  286.786  < 2e-16 ***
## Asian             1.143e-01  1.524e-03   74.979  < 2e-16 ***
## White             4.211e-02  1.591e-03   26.466  < 2e-16 ***
## Male              5.686e-02  4.530e-04  125.506  < 2e-16 ***
## age20            -3.036e-02  3.800e-04  -79.903  < 2e-16 ***
## age3045          -2.662e-02  3.244e-04  -82.053  < 2e-16 ***
## age4565          -3.531e-02  3.365e-04 -104.933  < 2e-16 ***
## last_vote_margin  2.320e-02  2.007e-04  115.567  < 2e-16 ***
## partyrepublican   6.894e-02  1.599e-04  431.082  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 22636314  on 324  degrees of freedom
## Residual deviance:  1257654  on 261  degrees of freedom
## AIC: 1262550
## 
## Number of Fisher Scoring iterations: 3

Again, for both models, the residual deviances are more than an order of magnitude smaller than the null deviances. One thing to note is that the incumbent model seems to have a better in-sample fit than the challenger model, with a residual deviance of 1032104 against 1257654. Unsurprisingly, the coefficient on the polling average is large and positive relative to the other coefficients. Interestingly, the economic data coefficients are positive in the incumbent model but negative in the challenger model, which makes some intuitive sense: an incumbent party with a good economy is likely to have better success with voters [5].

For out of sample fit, we can look at the fraction of states that the model predicts correctly using leave one out validation based on year.

Year State Prediction Accuracy
1992 0.9565217
1996 0.9400000
2000 0.9183673
2004 1.0000000
2008 0.9600000
2012 0.9189189
2016 0.9200000

For the most part, the model is quite accurate, missing only a few states in some elections. However, as discussed last week, a very similar model misses predictions in swing states quite frequently, and this model would have incorrectly called the 2016 election for Hillary Clinton. These incorrect predictions stem in part from a reliance on polling data without accounting for its variance. Therefore, we need to introduce some uncertainty into the model.
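The leave-one-out accuracy reported in the table above can be computed along the following lines. This is only a sketch on synthetic data: the six states, the single `avg_poll` covariate, and the rule that a state is "won" when the share exceeds 0.5 are all simplifying assumptions, not the full specification used in this post.

```r
set.seed(1)

# Synthetic stand-in data: 6 hypothetical states x 7 election years.
years  <- seq(1992, 2016, by = 4)
states <- paste0("S", 1:6)
d <- expand.grid(state = states, year = years, stringsAsFactors = FALSE)
d$avg_poll <- runif(nrow(d), 0.40, 0.60)   # polling average
d$total    <- 100000L                      # turnout
d$votes    <- rbinom(nrow(d), d$total, plogis(4 * (d$avg_poll - 0.5)))

# Hold out one election year at a time, refit, and score the held-out year.
accuracy <- sapply(years, function(y) {
  fit  <- glm(cbind(votes, total - votes) ~ avg_poll + state,
              family = binomial, data = d[d$year != y, ])
  test <- d[d$year == y, ]
  pred <- predict(fit, newdata = test, type = "response")
  # Fraction of states where predicted and actual winners agree.
  mean((pred > 0.5) == (test$votes / test$total > 0.5))
})
names(accuracy) <- years
accuracy
```

Holding out a whole year at a time, rather than single rows, keeps information from the election being predicted out of the training data.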

National Polling Error

Much has been made of the possibility of national polling error. In 2016, a roughly 3 point national polling error meant that many were blindsided by Trump’s narrow victory in a number of Midwestern states. To simulate this kind of error, I calculated the mean and variance of the polling errors for both Democrats and Republicans in elections since 1992. Interestingly, both parties tend to outperform their polls, by 2.40 and 2.11 points for Democrats and Republicans respectively.

Simulating Elections

Returning to the basic framework of the model, we have:

\[\textrm{Votes}_{ic}\sim\textrm{Bin}(n_{i}, p_{ic})\]

Based on our predictions, we can write down \(n_i\) and \(p_{ic}\) with distributions as well:

\[n_i \sim \textrm{Pois}(\lambda_i)\]

\[p_{ic} \sim \textrm{Beta}(\alpha_{ic}, \beta_{ic})\]

In this case, \(\lambda_i\) is the predicted mean turnout in state \(i\). As I have done previously, we can reparameterize \(\alpha_{ic}\) and \(\beta_{ic}\) and instead input the mean and variance. One thing to note is that the \(\alpha\) and \(\beta\) values are quite large, meaning that we can approximate the beta distributions with high accuracy using a normal distribution. This is key, as it gives an easy way to account for correlation between states: the multivariate normal distribution.
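For reference, here is a short sketch of the mean and variance reparameterization and of how close the normal approximation becomes when the shape parameters are large. It uses the standard method-of-moments formulas for the Beta distribution. The 52% mean and one-point standard deviation are hypothetical values, not outputs of this model.

```r
# Convert a mean and variance into Beta shape parameters (method of moments).
# Requires 0 < mu < 1 and variance < mu * (1 - mu).
beta_shapes <- function(mu, variance) {
  stopifnot(mu > 0, mu < 1, variance > 0, variance < mu * (1 - mu))
  k <- mu * (1 - mu) / variance - 1
  c(alpha = mu * k, beta = (1 - mu) * k)
}

# Hypothetical values: a 52% predicted share with a 1-point standard deviation.
shapes <- beta_shapes(mu = 0.52, variance = 0.01^2)
shapes

# With shapes this large, Beta and normal quantiles nearly coincide.
probs <- c(0.025, 0.5, 0.975)
quantiles <- rbind(
  beta   = qbeta(probs, shapes[["alpha"]], shapes[["beta"]]),
  normal = qnorm(probs, mean = 0.52, sd = 0.01)
)
quantiles
```

The smaller the variance relative to \(\mu(1-\mu)\), the larger the shape parameters and the better the normal approximation.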

Instead of drawing 102 values one at a time, one for each candidate in each state, we can draw two sets of 51. The mean of each value will be the predicted vote share from the vote share models previously discussed. We also need a covariance matrix, which describes how vote shares in different states are related to one another.

We can calculate a covariance matrix in two steps. First, we build a similarity matrix that tells us how similar each pair of states is, using all of the non-categorical data from the vote share model. The key detail of this similarity matrix is that its values are between -1 and 1: values near 1 indicate high similarity (or correlation), and negative values indicate that states are very dissimilar. Second, we scale this makeshift correlation matrix by the standard deviations of the polls over the past several weeks to get a covariance matrix. The standard deviations in the polling averages that I use are quite small, suggesting a very stable race, and therefore a relatively small amount of uncertainty [6].
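The two steps above, followed by a multivariate normal draw, can be sketched in base R as follows. Everything numeric here is made up, and Pearson correlation across standardized features is only one possible similarity measure; this post does not commit to a specific one.

```r
set.seed(42)

# Hypothetical standardized covariates: 8 features for 4 states.
states   <- c("MI", "WI", "PA", "TX")
features <- matrix(rnorm(8 * 4), nrow = 8, dimnames = list(NULL, states))

# Step 1: state-by-state similarity in [-1, 1] (assumed: Pearson correlation).
similarity <- cor(features)

# Step 2: scale by polling standard deviations, Sigma = D %*% R %*% D.
poll_sd <- c(MI = 0.010, WI = 0.012, PA = 0.009, TX = 0.015)  # made-up values
sigma   <- diag(poll_sd) %*% similarity %*% diag(poll_sd)
dimnames(sigma) <- list(states, states)

# Draw correlated vote shares: mean + L %*% z, with L = t(chol(Sigma))
# and z a matrix of independent standard normal draws.
mean_share <- c(MI = 0.55, WI = 0.55, PA = 0.55, TX = 0.50)   # made-up values
n_draws <- 10000
z     <- matrix(rnorm(n_draws * length(states)), nrow = length(states))
draws <- t(mean_share + t(chol(sigma)) %*% z)
colnames(draws) <- states

# The empirical correlation of the draws should be close to the similarity.
round(cov2cor(cov(draws)) - similarity, 2)
```

With all 51 states and only a handful of covariates, a similarity matrix built this way can be rank deficient, in which case `chol()` fails and an eigenvalue-based draw or a small ridge on the diagonal would be needed.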

The final step is the national polling error, which I draw from two independent stable distributions, one for Biden and one for Trump. I intentionally picked stable distributions because they are “fat-tailed,” meaning that events far from the center happen more frequently than under a normal distribution of comparable scale. This choice is meant to introduce a potentially large amount of polling error, and therefore variance, into the model. Something to note is that these terms are independent for each candidate: both Biden and Trump could have polling errors in their favor in this model [7].
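To illustrate what "fat-tailed" means in practice, the sketch below draws from a symmetric stable distribution using the Chambers-Mallows-Stuck method and compares its tail frequency with a normal distribution. The function name `r_sym_stable`, the stability index of 1.7, and the one-point scale are all illustrative assumptions, not the parameters used in this model.

```r
set.seed(7)

# Symmetric alpha-stable draws via the Chambers-Mallows-Stuck method.
# alpha = 2 gives a normal distribution; alpha = 1 gives a Cauchy.
r_sym_stable <- function(n, alpha, scale = 1, location = 0) {
  stopifnot(alpha > 0, alpha <= 2)
  u <- runif(n, -pi / 2, pi / 2)
  w <- rexp(n)
  x <- sin(alpha * u) / cos(u)^(1 / alpha) *
    (cos((1 - alpha) * u) / w)^((1 - alpha) / alpha)
  location + scale * x
}

n <- 100000
stable_err <- r_sym_stable(n, alpha = 1.7, scale = 0.01)

# Normal comparison: the alpha = 2 member with the same scale has
# standard deviation scale * sqrt(2).
normal_err <- rnorm(n, mean = 0, sd = 0.01 * sqrt(2))

# How often is the error larger than 5 points in either direction?
tail_freq <- c(stable = mean(abs(stable_err) > 0.05),
               normal = mean(abs(normal_err) > 0.05))
tail_freq
```

Large errors show up far more often under the stable draw than under the normal one, even though the bulk of the two distributions looks similar.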

One thing to note is that, in a sense, including variance at both the state and national level double counts variance, since variance at the national level is a direct function of variance at the state level. However, in the uncertain times brought on by COVID-19 and one candidate actively trying to discredit the election, I decided that increasing the variance was reasonable. I’ll denote the national polling error for candidate \(c\) by \(s_{c}\). We can then rewrite the basic structure of the model as:

\[\textrm{Votes}_{ic}\sim\textrm{Bin}(n_{i}, p_{ic}+s_c)\]

This leaves us with a complete model, from which we can draw simulated values in order:

  1. Draw the vote shares from the multivariate normal distributions for each candidate.

  2. Draw the turnout values for each state from the Poisson distributions.

  3. Draw the national polling error for each candidate and add to the vote shares.

  4. Draw from the binomial in each state to get the votes in each state.
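The four steps above can be sketched end to end in a few lines of base R. This is a toy version under loud assumptions: four hypothetical states, made-up means and turnout, a constant 0.5 correlation between states, and a Student-t draw with 3 degrees of freedom standing in for the stable distribution as the fat-tailed national error. Probabilities are clamped to [0, 1] after adding \(s_c\), which is also an assumption.

```r
set.seed(2020)
n_sims <- 2000

# Hypothetical inputs for four states (not the fitted values from this post).
states   <- c("MI", "PA", "TX", "WY")
ev       <- c(MI = 16, PA = 20, TX = 38, WY = 3)
lambda   <- c(MI = 5.0e6, PA = 6.5e6, TX = 1.0e7, WY = 2.6e5)  # mean turnout
mu_biden <- c(MI = 0.53, PA = 0.52, TX = 0.48, WY = 0.30)
mu_trump <- c(MI = 0.45, PA = 0.46, TX = 0.50, WY = 0.68)

# Assumed covariance: 0.5 correlation and a 1-point sd in every state.
corr <- matrix(0.5, 4, 4)
diag(corr) <- 1
sigma  <- corr * 0.01^2
chol_l <- t(chol(sigma))

draw_shares <- function(mu) as.vector(mu + chol_l %*% rnorm(length(mu)))
clamp <- function(p) pmin(pmax(p, 0), 1)

biden_ev <- replicate(n_sims, {
  p_b <- draw_shares(mu_biden)                 # step 1: vote shares
  p_t <- draw_shares(mu_trump)
  n   <- rpois(length(states), lambda)         # step 2: turnout
  s_b <- 0.01 * rt(1, df = 3)                  # step 3: national error
  s_t <- 0.01 * rt(1, df = 3)                  #   (t stand-in, independent)
  v_b <- rbinom(length(states), n, clamp(p_b + s_b))   # step 4: votes
  v_t <- rbinom(length(states), n, clamp(p_t + s_t))
  sum(ev[v_b > v_t])                           # exact ties go to Trump here
})

# Average electoral votes and share of simulations with a Biden majority
# of the 77 electoral votes in this toy map.
c(mean_ev = mean(biden_ev), win_rate = mean(biden_ev > sum(ev) / 2))
```

Swapping in the real state means, the full covariance matrix, and stable draws for the national error would turn this skeleton into the simulation described here.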

We can finally look at the results from the model over 10,000 draws.

Results

We can first look at the average electoral map based on the average vote shares in each state.

We can also look at the actual vote shares:

State Trump Popular Vote Biden Popular Vote
AK 0.5251388 0.4748612
AL 0.5783282 0.4216718
AR 0.5752348 0.4247652
AZ 0.4725368 0.5274632
CA 0.3484660 0.6515340
CO 0.4244236 0.5755764
CT 0.3607708 0.6392292
DC 0.2794655 0.7205345
DE 0.3660884 0.6339116
FL 0.4580901 0.5419099
GA 0.4839587 0.5160413
HI 0.3438294 0.6561706
IA 0.4850483 0.5149517
ID 0.5707706 0.4292294
IL 0.3981593 0.6018407
IN 0.5359081 0.4640919
KS 0.5188326 0.4811674
KY 0.5726321 0.4273679
LA 0.5549075 0.4450925
MA 0.3055022 0.6944978
MD 0.3163091 0.6836909
ME 0.4166781 0.5833219
MI 0.4485951 0.5514049
MN 0.4442557 0.5557443
MO 0.5170086 0.4829914
MS 0.5560577 0.4439423
MT 0.5187822 0.4812178
NC 0.4654500 0.5345500
ND 0.5820149 0.4179851
NE 0.5096579 0.4903421
NH 0.4354918 0.5645082
NJ 0.3959544 0.6040456
NM 0.4186697 0.5813303
NV 0.4576305 0.5423695
NY 0.3306714 0.6693286
OH 0.4926390 0.5073610
OK 0.5886615 0.4113385
OR 0.3915786 0.6084214
PA 0.4472588 0.5527412
RI 0.3367492 0.6632508
SC 0.4981731 0.5018269
SD 0.5561404 0.4438596
TN 0.5354477 0.4645523
TX 0.5022584 0.4977416
UT 0.5336834 0.4663166
VA 0.4167673 0.5832327
VT 0.3134606 0.6865394
WA 0.3687515 0.6312485
WI 0.4461054 0.5538946
WV 0.6036859 0.3963141
WY 0.6663434 0.3336566

This model shows a blowout for Biden in a number of key states: he is up by more than ten points in Michigan and Pennsylvania, and by roughly eight points in Florida. He also narrowly wins South Carolina, and nearly pulls out a victory in Texas. Overall, Biden wins 389.5774 electoral votes on average, with Trump picking up the remaining 148.4226. Looking at the distribution of the electoral college outcomes fits exactly with this theme.

We can see that there is very, very little overlap in the electoral college results. In fact, this model predicts a Biden victory in 9,989 out of 10,000 simulations, while Trump wins the remaining 11. As extreme as this result is, it’s not totally out of line with The Economist’s prediction. We can also look at the plot of the popular vote shares, and we can again see very little overlap.

This plot shows Biden with a commanding lead in the national popular vote, winning above 50% of the vote in essentially every simulation. There is an electoral college and popular vote split in 0 of the 10,000 simulations, meaning that Trump only wins when he happens to win the popular vote. Texas and South Carolina coming down to razor-thin margins fits with the heavily Democratic-leaning national environment. However, in some senses, looking at the average election outcomes is deceptive. A number of races in battleground states are actually predicted to be quite close.