In the last blog entry, we briefly discussed that a fundamentals model cannot account for everything when predicting an election, especially at the state level. In fact, using fundamentals seems like a rather indirect way of predicting an election given that polling exists; in a sense, polling attempts to predict an election by asking a representative sample of people what they think about the election.

Many of the most sophisticated election models, like those from FiveThirtyEight and The Economist, use polls as their backbone. In the lab section for the week, we looked at models that use polls at a national level, but as we’ve discussed before, the United States president is decided by the states and the Electoral College.

For the duration of this blog post, I will be working with three separate but related models. The first is a fundamentals-based model that looks very similar to models previously discussed on this blog. The second is a polls-only model that relies solely on state polling to predict vote shares. The third is what I refer to as the “polls-plus” model, which incorporates both the fundamentals and the polling data. In all three cases, the model is two-sided, meaning it independently predicts outcomes for the incumbent and the challenger^{1}. Each model also uses raw vote share rather than two-party vote share, as that is how the majority of polls are conducted. Output for omitted regressions can be found in the appendix.

To start, we need a basic fundamentals model for comparison. We’ll use a variant of the model from last week, this time using only second quarter real disposable income growth at the state level and second quarter national GDP data, while controlling for state and general era^{2}. The data come from “Quarterly Personal Income By State” from the Bureau of Economic Analysis.

Similar to last week, the R-squared values from these regressions are quite low. The incumbent fundamentals model has an R-squared of 0.175, while the challenger model has an R-squared of 0.163, both relatively close to the low values from previous fundamentals models on this blog.

The mean squared error is 81.855 for the incumbent and 71.281 for the challenger, which is consistent with the low R-squared values. For out-of-sample fit, the average absolute error on the vote margin^{3} across both models is 13.079.
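To make the two fit metrics concrete, here is a minimal Python sketch. It assumes the vote margin is the incumbent share minus the challenger share (the exact definition lives in the footnote), and every number in it is a hypothetical placeholder, not data from the models above.

```python
def mean_squared_error(actual, predicted):
    """Average of squared residuals, in squared percentage points."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_abs_margin_error(actual_inc, actual_chl, pred_inc, pred_chl):
    """Average absolute gap between actual and predicted vote margins.

    Assumption: margin = incumbent share - challenger share.
    """
    errors = [
        abs((ai - ac) - (pi - pc))
        for ai, ac, pi, pc in zip(actual_inc, actual_chl, pred_inc, pred_chl)
    ]
    return sum(errors) / len(errors)


# Hypothetical vote shares (percent) for three state-year observations.
actual_inc = [52.0, 45.0, 48.0]
actual_chl = [46.0, 53.0, 50.0]
pred_inc = [50.0, 47.0, 49.0]
pred_chl = [47.0, 50.0, 48.0]

mse_inc = mean_squared_error(actual_inc, pred_inc)
margin_error = mean_abs_margin_error(actual_inc, actual_chl, pred_inc, pred_chl)
print(f"incumbent MSE: {mse_inc:.3f}")
print(f"mean absolute margin error: {margin_error:.3f}")
```

Because the two sides are predicted independently, the margin error combines the mistakes of both regressions, which is why it is reported once rather than per candidate.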

For the polls-only model, I work with state-level polling from 1972 onward. In many cases, polling is relatively sparse; especially in earlier years, not every state has polls. In addition, because many polls are conducted, I use an average of the polling averages as a single input. This average is calculated by taking the historical polling averages produced between six months before the election and the current week and taking their mean^{4}. The incumbent or challenger vote share is then regressed on that value. Because this model is new, we will take a look at the full regression output.
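The averaging step can be sketched roughly as follows in Python. The weekly values, the function name `average_of_averages`, and the choice to treat six months as 26 weeks are all assumptions made for illustration; the actual data preparation may differ.

```python
from statistics import mean

# Hypothetical polling averages for one candidate in one state,
# keyed by the number of weeks remaining until the election.
weekly_averages = {30: 44.0, 26: 45.5, 20: 46.0, 12: 47.5, 6: 48.0, 2: 49.0}


def average_of_averages(averages_by_week, current_week, window_weeks=26):
    """Mean of the polling averages dated between `window_weeks` before
    the election and `current_week` (both measured in weeks remaining)."""
    in_window = [
        value
        for weeks_left, value in averages_by_week.items()
        if current_week <= weeks_left <= window_weeks
    ]
    if not in_window:
        raise ValueError("no polling averages fall inside the window")
    return mean(in_window)


# With five weeks to go, only the averages from weeks 26, 20, 12 and 6 count.
avg_poll = average_of_averages(weekly_averages, current_week=5)
print(avg_poll)
```

Collapsing the window to a single number keeps the regression simple, at the cost of weighting a six-month-old average the same as last week's.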

We can first look at the linear regression for the incumbent.

```
##
## Call:
## lm(formula = inc_pv ~ avg_poll, data = reg_df %>% filter(year <
##     2020, incumbent_party == TRUE))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.3626  -3.1061  -0.5668   2.3519  20.4655
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.09468    1.04303   5.843 9.59e-09 ***
## avg_poll     0.96636    0.02358  40.981  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.664 on 469 degrees of freedom
## Multiple R-squared:  0.7817, Adjusted R-squared:  0.7812
## F-statistic:  1679 on 1 and 469 DF,  p-value: < 2.2e-16
```

A few key pieces of information jump out. First, the R-squared value of 0.782 is much larger than that of the fundamentals model, suggesting a much better fit. Second, the coefficient on `avg_poll` is 0.966, which suggests a nearly one-to-one relationship between the polling average and the vote share: if the average polling value for an incumbent increased by 1 percentage point, their expected vote share would increase by 0.966 percentage points. Next, we can look at the challenger regression.
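As a quick worked check of that interpretation, the Python snippet below plugs two polling averages into the fitted incumbent line. The intercept and slope are copied from the regression output above; the polling values 48 and 49 are hypothetical.

```python
# Coefficients copied from the incumbent regression output.
INTERCEPT = 6.09468
SLOPE = 0.96636


def predict_incumbent_share(avg_poll):
    """Fitted incumbent vote share (percent) for a given polling average."""
    return INTERCEPT + SLOPE * avg_poll


base = predict_incumbent_share(48.0)    # hypothetical polling average
bumped = predict_incumbent_share(49.0)  # the same average, one point higher
print(f"{base:.3f} -> {bumped:.3f} (change: {bumped - base:.3f})")
```

The positive intercept of roughly 6 points is also worth noticing: since raw polling numbers leave room for undecided voters, the fitted vote share sits above the polling average across the usual range.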

```
##
## Call:
## lm(formula = chl_pv ~ avg_poll, data = reg_df %>% filter(year <
##     2020, incumbent_party == FALSE))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.094  -2.935  -0.229   2.605  15.293
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  7.02625    1.10526   6.357 4.88e-10 ***
## avg_poll     0.95593    0.02641  36.189  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.75 on 469 degrees of freedom
## Multiple R-squared:  0.7363, Adjusted R-squared:  0.7358
## F-statistic:  1310 on 1 and 469 DF,  p-value: < 2.2e-16
```

The results are relatively similar: the model has an R-squared of 0.736, and the coefficient on the average poll value is 0.956. For all the concern about whether polls are accurate, this model suggests they are in fact quite predictive, even more than 5 weeks from the election^{5}.
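Because the model is two-sided, a state's predicted margin comes from running both fitted lines separately and subtracting. The Python sketch below uses the coefficients from the two regression outputs above; the polling averages of 42 and 48 are hypothetical, and defining the margin as incumbent minus challenger is an assumption.

```python
# (intercept, slope) pairs copied from the two regression outputs.
INCUMBENT = (6.09468, 0.96636)
CHALLENGER = (7.02625, 0.95593)


def predict_share(coefficients, avg_poll):
    """Fitted raw vote share (percent) from a polling average."""
    intercept, slope = coefficients
    return intercept + slope * avg_poll


# Hypothetical state where the incumbent polls at 42 and the challenger at 48.
inc_share = predict_share(INCUMBENT, 42.0)
chl_share = predict_share(CHALLENGER, 48.0)
margin = inc_share - chl_share  # assumed definition: incumbent minus challenger

print(f"incumbent {inc_share:.2f}, challenger {chl_share:.2f}, margin {margin:.2f}")
```

Nothing forces the two predicted shares to sum to 100 or less, which is a side effect of predicting each candidate independently on raw vote share.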

In sample, the mean squared error is 21.659 for the incumbent and 22.464 for the challenger, both clearly much better than the fundamentals model.

To examine out-of-sample fit, we look at the average absolute error on the vote margin, which is 5.844. This value is substantially lower than that of the fundamentals model, continuing the pattern that polls are much more predictive than fundamentals.
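The post does not spell out how the out-of-sample check is run, so the following Python sketch shows one common option, leave-one-election-out validation, purely as an assumption. For brevity it scores one side's vote share rather than the margin, and the rows are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leave_one_year_out_mae(rows):
    """Refit without each election year, then score that year's predictions."""
    errors = []
    for held_out in sorted({year for year, _, _ in rows}):
        train = [(x, y) for year, x, y in rows if year != held_out]
        test = [(x, y) for year, x, y in rows if year == held_out]
        intercept, slope = fit_line(*zip(*train))
        errors += [abs(y - (intercept + slope * x)) for x, y in test]
    return sum(errors) / len(errors)


# Hypothetical (year, avg_poll, vote_share) rows, two states per election.
rows = [
    (2008, 44.0, 48.5), (2008, 51.0, 55.0),
    (2012, 46.0, 51.0), (2012, 39.0, 44.5),
    (2016, 42.0, 46.0), (2016, 48.0, 53.5),
]
mae = leave_one_year_out_mae(rows)
print(f"leave-one-year-out MAE: {mae:.3f}")
```

Holding out a whole election at a time, rather than single states, avoids letting other states from the same year leak information about that year's polling error.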

The polls-plus model takes the fundamentals model and adds the historical polling average as another predictor. Because this model has the most robust set of inputs, one could imagine that it is also the most accurate model. Due to this model’s length and relative similarity to previous models, full details can be found in the technical appendix.

Unsurprisingly, the R-squared is higher for both sides, at 0.815 for the incumbent and 0.796 for the challenger^{6}. Interestingly, for the incumbent, the coefficients on both GDP and real disposable income are negative, while the coefficient on the polling average is almost exactly 1.00. This suggests that for incumbents, the relationship between the polling average and actual results in each state is almost exactly one-to-one. For the challenger, the coefficient on GDP is negative, and the coefficient on the polling average is 0.983, suggesting that a similar relationship holds.

In terms of in-sample fit, the regression for the incumbent has a mean squared error of 18.375, while the challenger regression has a mean squared error of 17.404. Consistent with having the highest R-squared, the polls-plus model appears to fit the data best in sample.

To examine out-of-sample fit, we look at the average absolute error on the vote margin, which is 5.407. This value is the lowest of the three models, again following the pattern that the polls-plus model fits the data best.

After seeing the varying degrees of accuracy of these three models, it is time to move to prediction. We already have second quarter GDP and real disposable income, so to predict the 2020 election we just need polling averages. I used polling averages from FiveThirtyEight as the polling input^{7}. We can look at the results from each model in turn.

In stark contrast to last week, the fundamentals model predicts a landslide for Trump. This change is directly due to the release of second quarter real disposable income growth by state. Due to the CARES Act, stimulus payments greatly increased many people’s incomes, which tips the scales towards Trump. In this model, Trump is expected to win 512 electoral votes, while Biden wins 26.

This electoral map looks much more reasonable and is in line with predictions from experts. Biden wins the Electoral College comfortably, with 352 electoral votes to Trump’s 186. For Biden, this map looks like a relatively feasible path to victory: he wins both Michigan and Pennsylvania, two key historical tipping-point states. Biden also wins North Carolina, a state that FiveThirtyEight projects him to win by 0.06 percentage points and The Economist projects him to win by 1.0 percentage points.
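Turning state-level vote share predictions into an electoral vote count is a simple winner-take-all tally, sketched below in Python. The electoral vote counts are the 2020 allocations for four states; the predicted shares are hypothetical placeholders rather than output from any of the three models, and the sketch ignores the district-level allocation used by Maine and Nebraska.

```python
# Electoral votes in 2020 for a handful of states.
ELECTORAL_VOTES = {
    "Michigan": 16,
    "Pennsylvania": 20,
    "North Carolina": 15,
    "Wyoming": 3,
}

# Hypothetical predicted raw vote shares (percent): (incumbent, challenger).
predicted = {
    "Michigan": (45.1, 50.3),
    "Pennsylvania": (46.0, 49.8),
    "North Carolina": (47.5, 47.9),
    "Wyoming": (66.0, 28.0),
}


def tally(predicted_shares, electoral_votes):
    """Award each state's electoral votes to the side with the larger share.

    In this sketch an exact tie goes to the challenger.
    """
    totals = {"incumbent": 0, "challenger": 0}
    for state, (inc_share, chl_share) in predicted_shares.items():
        winner = "incumbent" if inc_share > chl_share else "challenger"
        totals[winner] += electoral_votes[state]
    return totals


totals = tally(predicted, ELECTORAL_VOTES)
print(totals)
```

A state decided by a fraction of a point, like North Carolina here, moves its full slate of electoral votes, which is why small polling shifts can change the map so sharply.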