Nate Silver is a prediction guru (or perhaps a witch). He compiles data from polling results, which he weights by sample size and measures of historical reliability, to predict the winner of the US presidential election. He calculates the probability that each candidate will win each of the 50 states (and Washington D.C.), thereby determining the probability that each candidate will win a majority of votes in the electoral college, and hence become President of the United States of America.*
Nate Silver received some heat when he predicted Barack Obama was likely (although not certain) to win the 2012 election. A week or so before the election, he calculated that the probability of Obama winning was ~75%. Some commentators stated the race was tighter than that, perhaps confusing Silver’s predicted chance of winning with the predicted share of the popular vote.
Nevertheless, Silver’s predictions look like turning out as “expected”. The correspondence between the projected outcome based on counted ballots and his prediction is striking, as shown in the maps below.
However, it should be borne in mind that Nate Silver’s predictions are probabilistic. For example, some states, such as Florida (predicted probability of Obama winning was 0.503) and North Carolina (predicted probability of Romney winning was 0.744), were not predicted to fall to the favoured candidate with certainty.
If the probabilities were calculated accurately, we would expect that the favoured candidate (i.e., the candidate with the highest probability of winning that state) might lose a state or two.
To illustrate this, think of dice. Let’s say I am rolling a six-sided die. The probability of rolling a 1 is 1/6, and the probability of rolling a number that is not 1 (the numbers 2-6) is 5/6. Think of it as a two-horse race between the number 1 and the other numbers. If I were to predict the most likely outcome in a single roll of the die, the favourite would be the numbers 2-6. The number 1 would be the underdog. However, over a large number of rolls of the die, I would start to think I had calculated the probabilities incorrectly if I never rolled a 1 – the underdog should win sometimes. For example, the probability of getting no 1s in 24 rolls is only:
(5/6)^24 ≈ 0.013
In this case, I would start getting suspicious that the calculated probability for rolling a 1 was wrong if the underdog continued to lose after that many rolls. In fact, we would expect the underdog to win four times, on average, in 24 rolls (24/6 = 4).
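The dice arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are invented for the example, and it assumes a fair die and independent rolls.

```python
from fractions import Fraction

p_one = Fraction(1, 6)  # probability of rolling a 1 on a fair six-sided die
rolls = 24

# Probability that the underdog (a 1) never comes up in 24 independent rolls
p_no_ones = (1 - p_one) ** rolls

# Expected number of 1s in 24 rolls
expected_ones = rolls * p_one

print(f"P(no 1s in {rolls} rolls) = {float(p_no_ones):.4f}")
print(f"Expected number of 1s = {expected_ones}")
```

The first figure comes out at a little over 1%, and the second is exactly 4.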
So, do Nate Silver’s predictions stack up? Are they somehow “too good to be true”? Is he a witch? Well, we can treat the outcome of the favourite losing each state as a random event (equivalent to rolling a particular number on the die), with the probability of that event given by Nate Silver’s predictions. Then we can determine the probability that the favourite loses 0, 1, 2, etc of the states. That represents the distribution of the number of states in which the favourite loses. It can be calculated directly, but it is also quite easy to simulate**. For example, here is OpenBUGS code to do it:
model {
  for (i in 1:51) {  # for each of the 50 states and Washington D.C.
    # determine if the favourite loses;
    # p[i] is the probability that the favourite will lose that state
    favloses[i] ~ dbern(p[i])
  }
  N <- sum(favloses[])  # add up the number of losses
}
In the code, N is the number of states in which the favourite loses. The values of p[i] are the predicted probabilities that the favourite loses, which I took from Nate Silver’s website. Rounded to two decimal places, and ordered alphabetically by the name of the state, these are:
list(p=c(0, 0, 0.02, 0, 0, 0.2, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0.16, 0, 0, 0, 0, 0, 0, 0.01, 0, 0, 0, 0.02, 0, 0.07, 0.15, 0, 0.01, 0, 0.26, 0, 0.09, 0, 0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0.21, 0, 0, 0.03, 0))
All of the zeroes are states in which the favourite is predicted to win with certainty (and the underdog is predicted to lose); sometimes the favourite is Obama, sometimes it is Romney, depending on which state it is.
From that code and data, we can simulate the probability distribution for the number of states in which the favourite is expected to lose. Here it is:
So, the probability that the favourite will win in all states is only ~13%. The most likely result is the favourite losing one state, and the loss of two states is also likely. In fact, Nate Silver’s predictions encompass the (unlikely) chance that the underdog would triumph in as many as five states.
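For readers without OpenBUGS, a rough Python equivalent of the simulation might look like the sketch below. It is an illustrative translation, not the code used to produce the results in this post. Like the OpenBUGS model, it assumes the outcomes in different states are independent. The function name, seed and number of draws are arbitrary choices made for the example.

```python
import random
from collections import Counter

# Predicted probabilities that the favourite loses each state
# (rounded values quoted in the post, alphabetical by state, including D.C.)
p = [0, 0, 0.02, 0, 0, 0.2, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0.16, 0, 0, 0, 0,
     0, 0, 0.01, 0, 0, 0, 0.02, 0, 0.07, 0.15, 0, 0.01, 0, 0.26, 0, 0.09,
     0, 0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0.21, 0, 0, 0.03, 0]


def simulate_losses(p, n_draws=20000, seed=1):
    """Estimate the distribution of the number of states the favourite loses."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_draws):
        # One simulated election: a Bernoulli draw per state, then sum the losses
        n_losses = sum(rng.random() < p_i for p_i in p)
        counts[n_losses] += 1
    # Convert raw counts to estimated probabilities
    return {k: counts[k] / n_draws for k in sorted(counts)}


dist = simulate_losses(p)
for n_losses, prob in dist.items():
    print(f"{n_losses} losses: {prob:.3f}")
```

Because this is a Monte Carlo estimate, the printed values vary slightly with the seed and the number of draws, but the estimate for zero losses should sit close to the ~13% quoted above.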
Now I’m sure if Florida ends up falling Obama’s way, then the mystique surrounding Nate Silver will grow even larger. And if Florida falls to Romney, then some people will say “That Nate Silver is not so perfect after all!”. His crown might have appeared to slip even further if his predictions had failed in two or three states.
The irony, of course, is that the reverse should be true. Nate Silver’s predictions will be most strongly supported if the favourite loses one or two states. And it would be no surprise if the “favourite” in Florida (Obama) lost that state, where Nate Silver predicted the result was essentially a 50-50 contest. In fact, “correctly” predicting the winner in 100% of contests is only the fourth most likely outcome under Silver’s model. “Incorrect” predictions in three states were predicted to be more likely.
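The ranking of outcomes can also be checked by calculating the distribution directly rather than simulating it. The Python sketch below builds the exact distribution one state at a time. It uses the rounded probabilities listed earlier and again assumes independence between states; the function name is made up for the example.

```python
# Predicted probabilities that the favourite loses each state
# (rounded values quoted in the post, alphabetical by state, including D.C.)
p = [0, 0, 0.02, 0, 0, 0.2, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0.16, 0, 0, 0, 0,
     0, 0, 0.01, 0, 0, 0, 0.02, 0, 0.07, 0.15, 0, 0.01, 0, 0.26, 0, 0.09,
     0, 0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0.21, 0, 0, 0.03, 0]


def loss_distribution(p):
    """Exact distribution of the number of losses for independent events."""
    dist = [1.0]  # before any state is considered, P(0 losses) = 1
    for p_i in p:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p_i)  # favourite wins this state
            new[k + 1] += prob * p_i    # favourite loses this state
        dist = new
    return dist


dist = loss_distribution(p)

# Numbers of losses ordered from most to least likely
ranked = sorted(range(len(dist)), key=lambda k: dist[k], reverse=True)

for k in range(6):
    print(f"P({k} losses) = {dist[k]:.3f}")
print("Most likely numbers of losses, in order:", ranked[:4])
```

With these rounded inputs, one loss is the most likely outcome, followed by two, three and then zero, which matches the ordering described above.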
The potential fallibility of Silver’s predictions if he ended up predicting all states “correctly” was pointed out in a tweet by @Dr24hours: ‘Nate Silver calling 100% of the races correct means his p values are off. He said so himself in 2008. “Perfection” isn’t perfect.’ @Dr24hours’ tweet prompted me to do this analysis and write the post; I wondered if that were indeed true.
However, my analysis above suggests that while predicting all the states “correctly” might (weakly) suggest Nate Silver underestimates the probability that the favourite will win each state, we can’t really invalidate his predictions regardless of who ends up winning Florida. The result will be well within the bounds of probability calculated by Nate Silver on the morning of the election.
So, no, he is not a witch, but he does seem to be quite good at calculating probabilities rationally. That would seem to qualify him for the title “guru”, and someone to watch if you want to give yourself a chance of beating the bookies in the next US presidential election.
Edit: I saw a tweet from @skepticscience via @EdYong209 about the website “natesilverwrong.com”, which was taken down after 1 day because it seems “natesilverisright.com”. An archived version of the site is here: http://www.webcitation.org/6Bz2BnE8V – interestingly, the removal of the site was an outcome predicted in one of the comments. The most insightful quote on the site is from Nate Silver himself: “I’m also sure I’ll get too much credit if the prediction is right and too much blame if it is wrong.”
* In US presidential elections, the winner of each state receives a specified number of delegates in the electoral college. The candidate with the most delegates is elected president. OK, Maine and Nebraska are split up into two or three groups of delegates; I’m treating those states in their entirety here for the sake of simplicity.
** Edit 2: @Wikisteff pointed out that I have assumed independence of the outcomes, when in fact correlations are likely. Positive correlations in the victory of the favourite would increase the probability of picking all states correctly. However, we would expect positive correlations in the victory of one candidate; because the favourite differed from state to state, correlations in the victory of the favourite might be smaller. And estimating the degree of correlation would be difficult, so I won’t attempt it (and I wonder if Nate Silver accounts for these correlations).