My wife is a biostatistician at an international company headquartered in England. As a result, many of her colleagues are heavily interested in both statistical analysis and major European football tournaments. She was invited to participate in a prediction league for this summer's edition of the Euros, and since she knows this kind of thing is right up my alley, she invited me to join as well.

Now, being professional statisticians, they do not put together prediction leagues in the simple, fill-out-your-bracket style that we American simpletons tend to do. Their league was based on Bayesian modeling.

For each of the 36 group stage matches, participants were required to provide the probabilities (adding up to 1.0, of course) that the home team would win, that the away team would win, and that there would be a draw. A baseline prediction would be a 1-in-3 chance (about 0.33) for each outcome; your model needs to do better than that to be useful and to have a chance at winning the league.

I could look up the precise formula used to determine each participant's game-by-game score, but honestly it matters about as much as knowing exactly how to calculate a QB rating or a baseball player's WAR. It's enough to know that a score of "0" means that your model is working as well as random chance. (Also, a surefire way to eliminate yourself would be to predict that an outcome had literally 0 chance of happening, because when that outcome actually does come to pass, your score becomes negative infinity, with no chance of recovery. This happened to two people within the first three matches.)
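For the curious, here is a minimal sketch of one scoring rule that behaves the way I just described. To be clear, this is my assumption and not the league's actual formula: it is a logarithmic score measured against the one-in-three baseline, and the function name `relative_log_score` and the dictionary keys are made up for illustration. It gives 0 for the baseline prediction and negative infinity if you put 0 on the outcome that happened.

```python
import math

# Assumption: a log score relative to a uniform 1/3 baseline.
# This is NOT the league's confirmed formula, only one rule that
# matches the behavior described in the post.
BASELINE = 1 / 3


def relative_log_score(probs, outcome):
    """Score one match.

    probs   -- dict with keys "home", "draw", "away" summing to 1.0
    outcome -- the key of the result that actually happened
    """
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1.0")

    p = probs[outcome]
    if p == 0:
        # math.log(0) raises an error, so return the limit explicitly.
        return float("-inf")
    return math.log(p) - math.log(BASELINE)


def tournament_score(predictions, results):
    """Sum the per-match scores; a single -inf sinks the whole total."""
    return sum(relative_log_score(p, r) for p, r in zip(predictions, results))
```

Under this rule a confident, correct call (say 0.6 on the eventual result) earns a positive score, a hedge of one third on everything earns exactly 0, and a single 0 on the wrong outcome can never be made up afterwards.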

Anyway, the scoring system was complex and the game-by-game changes in position, with 60+ participants, were difficult to track. So, I created this chart to show how participants' models were performing, both in a raw sense and in a rank-order bump chart. (Click here for the interactive version; see the tabs at the top of the viz to switch between the two charts).

My model used the current Elo ratings of all participating teams, and then I took a wild guess at how often teams would draw (Elo doesn't predict the chance of a draw, only the chance of winning based on the contestants' ratings). I assumed that teams would be risk-averse, so I predicted a high chance of draws, especially between teams that were closely matched.
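Here is a rough sketch of what a model along these lines could look like. The expected-score function is the standard Elo formula; everything about the draw is hypothetical. The `max_draw` and `scale` parameters, and the idea of letting the draw probability shrink as the rating gap grows, are illustrative stand-ins and not the numbers I actually used. The sketch also simply splits whatever is left after the draw between the two teams in proportion to the Elo expectation, which is a simplifying assumption.

```python
import math


def elo_expected(r_home, r_away):
    """Standard Elo expected score for the home side (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home) / 400.0))


def match_probs(r_home, r_away, max_draw=0.35, scale=400.0):
    """Turn two Elo ratings into home/draw/away probabilities.

    Hypothetical draw rule: the draw probability equals max_draw when the
    teams are rated equally and decays as the rating gap widens.
    """
    gap = abs(r_home - r_away)
    p_draw = max_draw * math.exp(-((gap / scale) ** 2))

    # Assumption: share the non-draw probability using the Elo expectation.
    e_home = elo_expected(r_home, r_away)
    p_home = (1.0 - p_draw) * e_home
    p_away = (1.0 - p_draw) * (1.0 - e_home)
    return {"home": p_home, "draw": p_draw, "away": p_away}


if __name__ == "__main__":
    # Made-up ratings, purely for illustration.
    print(match_probs(1900, 1750))
```

Lowering `max_draw` is the obvious knob to turn if the model predicts too many draws.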

This did not turn out to be the case.

I'm embarrassed to say that I finished in the bottom half, and my model ended up being worse than randomly guessing. For the World Cup, I will revise my model to predict fewer draws and to take more factors into account than Elo ratings alone. Still, it was quite enjoyable, if humbling.