Introduction

This week, I’ll return to the concept of probabilistic models, taking a slightly different approach. We’ll look at a binomial version of the logistic regression, and then take a look at some possible extensions. In particular, we’ll take a close look at the beta-binomial model. For now, we’ll only think about the theory of this model, because actually running it with the R packages I tried was going to melt my laptop.

Logistic Models, Again

Last week, I introduced logistic regressions with this equation:

\[Pr(VoteShare = 200000|VoterTurnout = 5000000) = f(\beta_0+\beta_1x_1 + \beta_2 x_2+...)\]

However, a close look at the regression that I ran reveals that it was not actually this model. Instead, I ran a logistic regression on a binary response variable: whether the incumbent won. This week, I’ll actually use a binomial response. Instead of just predicting whether the incumbent wins, I predict the probability of any given voter actually turning up at the polls and voting for a given candidate. If you think about how elections work, this fits quite well with the underlying process of how an election actually happens.
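To make the binomial setup concrete, here is a sketch under the usual logistic assumption that \(f\) is the inverse logit. The symbols \(\eta\), \(p\), and \(n\) are my shorthand here, not notation from last week: \(\eta\) is the linear predictor, \(p\) is the per-voter probability, and \(n\) is the number of eligible voters.

\[\eta = \beta_0+\beta_1x_1 + \beta_2 x_2+..., \qquad p = f(\eta) = \frac{1}{1+e^{-\eta}}\]

If each of the \(n\) eligible voters independently votes for the candidate with probability \(p\), the vote count is binomial, with

\[E[Votes] = np, \qquad Var(Votes) = np(1-p)\]

With \(n\) in the millions, the standard deviation \(\sqrt{np(1-p)}\) is tiny relative to \(np\), so simulated vote shares from a plain binomial will be tightly concentrated around \(p\).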

Just like last week, I’ll use the same \(x\)’s in the regression: average vote share, an indicator for state, an indicator for general period of time, second-quarter GDP growth at a national level, and second-quarter real disposable income growth at a state level. One key difference from last week is that we can return to a two-sided model, separately estimating results for both incumbent parties and challengers. Because the form of the regression is so similar to last week, I’ll leave the output to the appendix.

Things get more interesting once we have the regression output and make predictions for this year. Because we have probabilistic output, we can simulate elections. The voting in each state is simulated by a draw of a random variable:

\[Votes_{si} \sim Binomial(VEP_s, p_{si})\]

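The subscript \(s\) indexes states, while the subscript \(i\) indexes incumbency status (incumbent party or challenger). Here \(p_{si}\) is the model’s predicted probability that an eligible voter in state \(s\) votes for that party, and \(VEP_s\) is the state’s voting-eligible population. We can then simulate a large number of elections and look at the results. For this post, I simulated \(10^4\) elections in total.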
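The simulation step can be sketched roughly as follows. This is not the code behind the post's results, just a toy illustration in Python using only the standard library. The state names, probabilities, and eligible-voter counts are made up, and I'm assuming the incumbent and challenger counts are drawn independently, each as its own binomial.

```python
import random


def binomial_draw(rng, n, p):
    """Draw from a binomial with n trials and success probability p
    by summing n independent Bernoulli(p) trials (fine for toy-sized n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_elections(probs, vep, n_sims, seed=0):
    """Simulate n_sims elections.

    probs: {state: (p_incumbent, p_challenger)}, hypothetical predicted
           per-voter probabilities from the regression.
    vep:   {state: number of eligible voters}.
    Returns a list of dicts mapping state -> incumbent margin in votes.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        margins = {}
        for state, (p_inc, p_chal) in probs.items():
            # Assumption: the two parties' counts are independent draws.
            inc_votes = binomial_draw(rng, vep[state], p_inc)
            chal_votes = binomial_draw(rng, vep[state], p_chal)
            margins[state] = inc_votes - chal_votes
        results.append(margins)
    return results


# Made-up inputs, scaled far down from real state populations.
probs = {"A": (0.30, 0.32), "B": (0.35, 0.28)}
vep = {"A": 2000, "B": 1000}

sims = simulate_elections(probs, vep, n_sims=200)
win_rate_A = sum(m["A"] > 0 for m in sims) / len(sims)
mean_margin_B = sum(m["B"] for m in sims) / len(sims)
print(f"Incumbent wins state A in {win_rate_A:.0%} of simulations")
print(f"Mean incumbent margin in state B: {mean_margin_B:.1f} votes")
```

With real voting-eligible populations in the millions, summing Bernoulli trials would be far too slow; a vectorised sampler such as NumPy's `Generator.binomial` would be the more practical choice.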

First, let’s look at the vote share in one particular state. Because of all of the publicity surrounding tipping point states, let’s look at Pennsylvania.

For all the talk of Pennsylvania possibly deciding the election, it does not look particularly close. We can look at a similar histogram, this time looking at vote margins for Trump, for every state (plus DC).