Introduction

This week, I’ll return to the concept of probabilistic models, taking a slightly different approach. We’ll look at a binomial version of the logistic regression and then at some possible extensions, with a close look at the beta-binomial model in particular. For now, we’ll only think about the theory of this model, because actually running it with the R packages I tried was going to melt my laptop.

Logistic Models, Again

Last week, I introduced logistic regressions with this equation:

\[Pr(VoteShare = 200000|VoterTurnout = 5000000) = f(\beta_0+\beta_1x_1 + \beta_2 x_2+...)\]

However, a close look at the regression I actually ran shows that it was not quite this model. Instead, I ran a logistic regression on a binary response variable: whether the incumbent won. This week, I’ll actually use a binomial response. Instead of just predicting whether the incumbent wins, I predict the probability that any given eligible voter turns up to the polls and votes for a given candidate. If you think about how elections work, this fits the underlying process quite well: each eligible voter either casts a vote for the candidate or does not.

Just like last week, I’ll use the same \(x\)’s in the regression: average vote share, an indicator for state, an indicator for the general period of time, second-quarter GDP growth at the national level, and second-quarter real disposable income growth at the state level. One key difference from last week is that we can return to a two-sided model, separately estimating results for incumbent parties and challengers. Because the form of the regression is so similar to last week’s, I’ll leave the output to the appendix.
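In this setup, the fitted model maps the covariates to \(p_{si}\), the probability that an eligible voter in state \(s\) turns out and votes for the candidate with incumbency status \(i\), through the logistic (inverse-logit) function \(f\). A minimal Python sketch of that one step follows. It is not the code behind this post, and the coefficients, covariate values, and VEP are made up for illustration rather than taken from the fitted model in the appendix.

```python
import math

def inv_logit(eta: float) -> float:
    """Logistic function: map a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariates, for illustration only.
beta = [-1.50, 0.012, 0.05]   # intercept, average vote share, Q2 GDP growth
x = [1.0, 45.0, 1.5]          # 1.0 multiplies the intercept, then covariate values
eta = sum(b * xi for b, xi in zip(beta, x))   # beta_0 + beta_1 x_1 + beta_2 x_2

p = inv_logit(eta)            # per-voter probability p_si
vep = 10_000_000              # hypothetical voting eligible population
print(round(p, 4), round(p * vep))
```

With these invented numbers \(p \approx 0.29\), so the expected count is roughly 2.9 million votes out of a VEP of 10 million.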

Things get more interesting once we have the regression output and make predictions for this year. Because the output is probabilistic, we can simulate elections. The voting in each state is simulated by a draw of a random variable:

\[Votes_{si} \sim Binomial(p_{si}, VEP_s)\]

The subscript \(s\) indexes states, the subscript \(i\) indexes incumbency status (incumbent party or challenger), and \(VEP_s\) is the voting eligible population of state \(s\). We can then simulate a large number of elections and look at the results. For this post, I simulated \(10^4\) elections in total.
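The simulation step can be sketched in Python with NumPy's `Generator.binomial`, which broadcasts over arrays of trial counts and probabilities. This is an illustration, not the post's R code: the three states, their VEPs, and their per-voter probabilities are invented, whereas the real inputs would be the fitted \(p_{si}\) and each state's VEP.

```python
import numpy as np

rng = np.random.default_rng(2020)
n_sims = 10_000

# Hypothetical inputs for three states: VEP and per-voter probabilities of
# turning out and voting for each candidate (incumbent, challenger).
vep = np.array([10_000_000, 4_500_000, 7_300_000])
p_inc = np.array([0.31, 0.33, 0.30])
p_chal = np.array([0.32, 0.32, 0.31])

# One binomial draw per state, per candidate, per simulated election.
votes_inc = rng.binomial(vep, p_inc, size=(n_sims, vep.size))
votes_chal = rng.binomial(vep, p_chal, size=(n_sims, vep.size))

# Incumbent margin as a share of votes cast for the two candidates.
margin = (votes_inc - votes_chal) / (votes_inc + votes_chal)
win_rate = (margin > 0).mean(axis=0)
print(win_rate)
```

With these made-up inputs, each state's incumbent win rate comes out at 0 or 1: the simulated margins barely move from one election to the next, which is the near-binary behavior this post runs into.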

First, let’s look at the vote share in one particular state. Because of all of the publicity surrounding tipping point states, let’s look at Pennsylvania.

For all the talk of Pennsylvania possibly deciding the election, it does not look particularly close. We can look at a similar histogram, this time looking at vote margins for Trump, for every state (plus DC).

While this plot is unfortunately quite difficult to read, a key problem becomes apparent: there is very little variance in the draws from the binomials. This lack of variance leads to nearly binary results, which we can see with another histogram.
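A quick back-of-the-envelope calculation shows why the draws barely move. If \(X \sim Binomial(n, p)\), the share \(X/n\) has standard deviation \(\sqrt{p(1-p)/n}\), which is tiny when \(n\) is a state's VEP. The sketch plugs in hypothetical values (\(p = 0.3\), \(n = 10\) million), not the post's estimates.

```python
import math

def share_sd(p: float, n: int) -> float:
    """Standard deviation of X / n when X ~ Binomial(n, p)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical: p = 0.3 and a VEP of 10 million.
sd = share_sd(0.3, 10_000_000)
print(f"{sd:.6f}")
```

That works out to about 0.000145, roughly 0.0145 percentage points, so unless a state's expected margin is within a hair of zero, almost every simulated election lands on the same side.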

So, in practice, this model is no different from previous iterations. Going forward, we need a way to introduce more variance into the model.

Possible Solutions

There are two ideas I can think of to make the model a little more realistic, and hopefully to add some more variance. First, we could introduce correlation between states when we draw from the binomial distributions. There is important intuition here: if the model shows particularly high turnout for a candidate in Wisconsin, there was likely high turnout for the same candidate in Pennsylvania. After some searching, I concluded that building correlated binomial distributions is a rather difficult statistics question, one I may attempt in the future.¹ A second possibility is to stop treating the probability in the binomial distribution as a single fixed value and give it a distribution of its own. Enter the beta-binomial model.
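For comparison, one common workaround (assumed here, not something this post tried) sidesteps correlated binomials entirely: perturb every state's \(p\) on the logit scale with a shared national shock plus a smaller state-specific shock, then draw independent binomials. The shared term is what makes Wisconsin and Pennsylvania move together. A minimal sketch follows; the shock standard deviations, probabilities, and VEPs are all invented.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sims = 10_000

# Hypothetical inputs for two states (think Wisconsin and Pennsylvania).
base_p = np.array([0.31, 0.30])
vep = np.array([4_400_000, 10_000_000])
logit = np.log(base_p / (1 - base_p))

# Assumed shock sizes: one national shock shared by every state in a given
# simulation, plus an independent state-specific shock.
national = rng.normal(0.0, 0.05, size=(n_sims, 1))
state = rng.normal(0.0, 0.02, size=(n_sims, base_p.size))
p = 1.0 / (1.0 + np.exp(-(logit + national + state)))

# Independent binomial draws given the (now correlated) probabilities.
votes = rng.binomial(vep, p)
share = votes / vep
corr = np.corrcoef(share[:, 0], share[:, 1])[0, 1]
print(round(corr, 2))
```

The correlation between the two states' simulated shares is driven by the ratio of the national shock's variance to the total shock variance; setting the national standard deviation to zero brings it back to roughly zero.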

Beta Binomial

The basic idea of the beta-binomial model is as follows: our outcome is still distributed² as \(out \sim Binomial(p, n)\), but \(p\) has a distribution as well. Specifically, \(p \sim Beta(\alpha, \beta)\), a beta distribution. The beta and binomial distributions are conjugate, meaning that the math in this model is not too bad.³ As it turns out, after skipping some math, we can write out a compound distribution of our outcome:

\[f(k|n, \alpha, \beta) = \binom{n}{k}\frac{B(k+\alpha, n-k+\beta)}{B(\alpha, \beta)}\]

Where \(B(\alpha, \beta)=\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\) is the beta function. Overall, this does not look too different from a binomial distribution if you think of the \(\Gamma\) function as the generalization of the factorial (\(\Gamma(m+1) = m!\) for nonnegative integers \(m\)). While this expression looks rather ugly, it tells us how to use the model to simulate an election: if we can estimate \(\alpha\) and \(\beta\) from our data, then we can use draws from this distribution to simulate an election as before.⁴ There are some clever ways to do this that involve reparameterizing the beta distribution, which I may explore soon.
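The compound distribution above can be evaluated directly. Because \(\Gamma\) overflows long before \(n\) reaches a state's VEP, this Python sketch works on the log scale with `math.lgamma` (so `lgamma(m + 1)` is \(\log m!\)). The mean/concentration parameterization at the end, \(\mu = \alpha/(\alpha+\beta)\) and \(\phi = \alpha+\beta\), is one common choice and is assumed here; it is not necessarily the reparameterization this post has in mind.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log B(a, b), computed from log-gamma values to avoid overflow."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    # log of the binomial coefficient n choose k
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def alpha_beta_from_mean(mu: float, phi: float) -> tuple[float, float]:
    """Assumed reparameterization: mu = alpha / (alpha + beta), phi = alpha + beta."""
    return mu * phi, (1 - mu) * phi

# Hypothetical parameters: mean probability 0.3, concentration 200.
alpha, beta = alpha_beta_from_mean(0.3, 200.0)
total = sum(betabinom_pmf(k, 50, alpha, beta) for k in range(51))
print(round(total, 10))
```

As a sanity check, the probabilities over \(k = 0, \dots, n\) sum to 1, and with \(\alpha = \beta = 1\) every outcome has probability \(1/(n+1)\).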

The last question is whether this model makes sense for an election. Let’s start with the basic question: does it help with the variance problem? As it turns out, part of why the beta-binomial model is used is to deal with overdispersion (data that vary more than a plain binomial allows), which seems to be exactly the problem we’re dealing with.
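The overdispersion point can be made concrete. For \(K \sim BetaBinomial(n, \alpha, \beta)\) with \(\mu = \alpha/(\alpha+\beta)\), \(Var(K) = n\mu(1-\mu)\frac{\alpha+\beta+n}{\alpha+\beta+1}\), so the inflation over the binomial variance grows with \(n\). The Python sketch compares the standard deviation of a state's share \(K/n\) under both models and checks the formula against a two-stage simulation: draw \(p\) from the beta, then draw the count from a binomial with that \(p\). In this simulation one \(p\) is drawn per simulated election and shared by all \(n\) voters in the state; that shared draw is what produces the extra variance. The values of \(\mu\), \(\phi = \alpha+\beta\), and \(n\) are hypothetical.

```python
import math
import numpy as np

def binom_sd_share(mu: float, n: int) -> float:
    """SD of K / n for K ~ Binomial(n, mu)."""
    return math.sqrt(mu * (1 - mu) / n)

def betabinom_sd_share(mu: float, phi: float, n: int) -> float:
    """SD of K / n for K ~ BetaBinomial(n, mu * phi, (1 - mu) * phi)."""
    # Var(K) = n mu (1 - mu) (phi + n) / (phi + 1), with phi = alpha + beta
    return math.sqrt(mu * (1 - mu) * (phi + n) / ((phi + 1) * n))

mu, phi, n = 0.3, 2000.0, 10_000_000   # hypothetical values
alpha, beta = mu * phi, (1 - mu) * phi

rng = np.random.default_rng(1)
p = rng.beta(alpha, beta, size=20_000)   # one p per simulated election
k = rng.binomial(n, p)                   # all n voters share that p
sim_sd = (k / n).std()

print(binom_sd_share(mu, n), betabinom_sd_share(mu, phi, n), sim_sd)
```

With these made-up values the beta-binomial share has a standard deviation of about 0.010 (one percentage point), versus about 0.000145 under the plain binomial, and the simulated value lands close to the formula.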

Next, does it make intuitive sense? In my mind, yes. In each state, not every person has the same probability of voting for a particular candidate; in fact, people may have very different probabilities. Putting a beta distribution behind the binomial gives the model more freedom in a reasonable way.

One helpful way to think about the beta distribution as a distribution over probabilities is to start with a simpler example. Imagine you have a big bag of coins, most of which are not fair. Some will come up heads half the time, some only one tenth of the time, and others three quarters of the time. Now sort the coins by the probability that they come up heads, stacking coins with the same probability on top of each other. The height of each stack tells you the chance that a coin drawn from the bag has that particular bias.
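The coin-stacking picture can be sketched in a few lines of Python: draw each coin's chance of heads from a beta distribution with `random.betavariate`, round to the nearest tenth to form the stacks, and print each stack's height as a crude text histogram. The shape parameters \(\alpha = \beta = 2\) are a hypothetical choice for the bag, not anything estimated here.

```python
import random
from collections import Counter

random.seed(0)
alpha, beta = 2.0, 2.0   # hypothetical shape of the bag of coins

# Each coin's chance of heads, then "stack" coins by rounding to the nearest tenth.
biases = [random.betavariate(alpha, beta) for _ in range(10_000)]
stacks = Counter(round(b, 1) for b in biases)

# Print each stack's height as a share of all coins, one '#' per percent.
for bias in sorted(stacks):
    share = stacks[bias] / len(biases)
    print(f"{bias:.1f} {'#' * round(100 * share)}")
```

Changing \(\alpha\) and \(\beta\) reshapes the stacks: larger values pile the coins up near \(\alpha/(\alpha+\beta)\), while values below 1 push them toward always-heads and always-tails.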

In the case of an election, we can repeat the same exercise with people instead of coins, and the probability of voting for a candidate instead of the chance of heads. In that sense, the beta-binomial model is a natural fit, and I’ll explore it in the coming weeks.


  1. This StackOverflow post gave the best suggestion I could find, and even they admit that this is a difficult problem. I was thinking about doing the estimation with a multinomial distribution, to avoid having to deal with the correlation problem in the first place, which may be a viable solution. At the same time, estimating parameters for a multinomial distribution is not a straightforward task either, and multinomial values are generated via simulation, which is what I would have to do if I used 51 binomials anyway.

  2. With probability \(p\) of having a “success” in each of the \(n\) trials.

  3. Not too bad means that parts of the problem have closed form solutions.

  4. We still have \(n\), which is the voting eligible population.