
# Betting on football - results

- 3. December 2023
- #datascience

Throughout 2023 I’ve been using math to make bets on football matches. I never thought I would make much money, but I wanted to see how my theoretical efforts would hold up in practice.

Read on to learn the details of how I lost \(93 \,\%\) of my money betting on \(200\) football matches.

## The statistical model

The statistical model answers the question:

What is the probability of each outcome of a football match?

Here’s the story behind the statistical model:

- In 2020 I created a Bayesian model for football in the probabilistic programming language Stan. This was a fun project and I learned more about practical Bayesian statistics.
- I also ran the model on the Norwegian 2021–22 Football Cup and the 2022 Elite Series.
- I upgraded the model slightly to use a Bivariate Poisson distribution.
- The data set only contains previous match results. The model uses no information about players, matches outside of the Elite Series, or anything else.

The model samples \(64\,000\) parameter values from the posterior distribution in just under \(2\) minutes on my computer. To compute probabilities I simulate \(10\,000\) matches, which also takes around \(2\) minutes.

The model outputs a probability for each outcome. As shown in the figure below, this can be used to determine the probability of a home win, a draw or an away win.
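To make the simulation step concrete, here is a minimal sketch of how match outcomes could be simulated from a bivariate Poisson distribution. It uses a common construction in which both teams' goal counts share a Poisson component. The function names and the three rates are hypothetical, chosen only for illustration. The actual model is written in Stan and draws its rates from the posterior rather than using fixed values.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def outcome_probabilities(home_rate, away_rate, shared_rate,
                          n_matches=10_000, seed=0):
    """Estimate P(home win), P(draw), P(away win) by simulating matches.

    Assumed construction: home goals = U + W and away goals = V + W,
    where U, V and W are independent Poisson variables.
    """
    rng = random.Random(seed)
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_matches):
        shared = sample_poisson(shared_rate, rng)
        home_goals = sample_poisson(home_rate, rng) + shared
        away_goals = sample_poisson(away_rate, rng) + shared
        if home_goals > away_goals:
            counts["home"] += 1
        elif home_goals == away_goals:
            counts["draw"] += 1
        else:
            counts["away"] += 1
    return {outcome: n / n_matches for outcome, n in counts.items()}


# Hypothetical rates, not fitted values from the real model.
probs = outcome_probabilities(home_rate=1.4, away_rate=1.0, shared_rate=0.2)
print(probs)
```

In a fully Bayesian setup, one would repeat this for many posterior draws of the rates, which is what produces a cloud of probability vectors rather than a single point.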

Overall the model is quite good, in the sense that the output probabilities correlate fairly well with the odds. The figure below illustrates this. The probabilities obey \(p_i \geq 0\) and \(\sum_i p_i = 1\), so they can be plotted on a simplex.

The blue star is the probability vector deduced from the odds. The orange star is the probability vector obtained from the statistical model. Given a discrepancy between the two and a belief that the model is more correct than the odds, an opportunity to make money presents itself—and the optimization model will exploit this. The green dots in the background represent model uncertainty about parameters.

## The optimization model

The optimization model answers the question:

How much money should we allocate to each bet to maximize long-term wealth growth?

Here’s the story behind the optimization model:

- Based on probabilities from the statistical model and game odds, I created a betting model that maximizes long-term wealth. The model is based on a generalization of the Kelly criterion from the 1950s, which is a formula for sizing a bet.
- The output of the optimization model is a fractional allocation of total money to each bet.
- The problem was solved using the Splitting Conic Solver and implemented in the convex optimization modeling suite CVXPY, and it typically takes around one minute to solve to optimality.
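For intuition, the classic single-bet Kelly criterion can be sketched in a few lines. This is only the textbook special case for one bet with two outcomes, not the generalized multi-bet model solved with CVXPY. The function names and the example probability and odds are assumptions made for illustration.

```python
import math


def kelly_fraction(probability, decimal_odds):
    """Classic Kelly stake for a single bet, as a fraction of wealth.

    Returns 0 when the bet has no positive expected value.
    """
    edge = probability * decimal_odds - 1.0
    return max(0.0, edge / (decimal_odds - 1.0))


def expected_log_growth(fraction, probability, decimal_odds):
    """Expected log growth of wealth when staking `fraction` on the bet."""
    win = math.log(1.0 + fraction * (decimal_odds - 1.0))
    lose = math.log(1.0 - fraction)
    return probability * win + (1.0 - probability) * lose


# Hypothetical bet: we believe the win probability is 50 %, the odds are 2.2.
stake = kelly_fraction(0.5, 2.2)
print(f"Kelly stake: {stake:.3f} of wealth")
print(f"Expected log growth: {expected_log_growth(stake, 0.5, 2.2):.5f}")
```

Staking more or less than the Kelly fraction lowers the expected log growth, which is why the criterion is described as maximizing long-term wealth. Note that the result is only as good as the probability fed into it.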

The figure below shows the growth rate distribution after placing bets in a round, conditioned on (1) the probabilities from the statistical model being correct and (2) using the optimal allocation of money to each individual bet. It’s quite likely that we lose money in a single round or on a single bet, but over time we hope to make money. It’s impossible to earn money without risk, and the optimization model balances risk and reward for optimal long-term growth.

In total I wrote around \(5000\) lines of Python code to make this project work.

## The process

- Before each of the \(30\) rounds in the Elite Series (\(8\) matches per round), I had to manually input odds prior to the game and input the bets once the models were run.
- The statistical model was initially trained on results from the 2022 Norwegian Elite Series. Before each round, I re-trained the model on 2022 results and 2023 results up until the current round.
- There are \(30 \times 8 = 240\) matches in total. I evaluated bets on \(200\) out of these, and placed bets on \(142\) matches. On \(96\) matches the model hedged its bets by placing money on more than one outcome. I missed some betting opportunities due to forgetfulness and vacations.
- I started with \(5\,000\) NOK (around \(500\) USD). In total around \(22\,000\) NOK was bet over the course of a year.
- On a typical round I bet \(60\,\%\) of my total wealth, spread across \(8\) matches in the round. The minimum was \(19\,\%\) and the maximum was \(88\,\%\).

## The results

I lost \(93 \,\%\) of my money, going from \(5\,000\) NOK to \(348\) NOK. The figure below shows how the money decreased over time. Notice that the vertical axis is on the log scale.

In the figure above, we plot \(100\) Monte Carlo simulations in the background. In each simulation, half the total wealth is spread across all matches in the round, and in each match the outcome is chosen uniformly at random. The Monte Carlo trajectories show that placing random bets might’ve been a better strategy. However, it’s hard to know for sure—with some luck the last bets could have played out, and the outcome would have looked different.
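A random-betting baseline like the one described above could be sketched roughly as follows. The outcome probabilities, the way the odds are derived from them, and the function name are all assumptions made for illustration. The simulations in the figure used the real odds and match results, which are not reproduced here.

```python
import random


def simulate_random_bettor(n_rounds=30, matches_per_round=8,
                           start_wealth=5000.0, stake_share=0.5,
                           margin=0.07, seed=0):
    """Simulate one wealth trajectory for a bettor who picks outcomes at random.

    Each round, `stake_share` of current wealth is split evenly across the
    matches, and in each match one of the three outcomes is picked uniformly.
    """
    rng = random.Random(seed)
    # Hypothetical home/draw/away probabilities, identical for every match.
    true_probs = [0.45, 0.25, 0.30]
    # Assumed odds: fair odds shortened so implied probabilities sum to 1 + margin.
    odds = [1.0 / (p * (1.0 + margin)) for p in true_probs]

    wealth = start_wealth
    trajectory = [wealth]
    for _round in range(n_rounds):
        stake_per_match = stake_share * wealth / matches_per_round
        payout = 0.0
        for _match in range(matches_per_round):
            pick = rng.randrange(3)
            result = rng.choices(range(3), weights=true_probs)[0]
            if pick == result:
                payout += stake_per_match * odds[pick]
        # The unstaked share is kept, winnings are added on top.
        wealth = wealth * (1.0 - stake_share) + payout
        trajectory.append(wealth)
    return trajectory


trajectory = simulate_random_bettor()
print(f"Final wealth: {trajectory[-1]:.0f} NOK")
```

Running this with many different seeds gives a bundle of trajectories that can be plotted behind the actual wealth curve on a log scale.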

**Why did the system perform so poorly?**
I have some thoughts:

- The odds were literally stacked against me from the beginning. The odds broker has a built-in cut of around \(7\,\%\). If we convert the odds to probabilities and add them up, they don’t add to \(100\,\%\), but \(107\,\%\). To make money, I have to beat the masses *and* I have to beat the broker.
- I cannot be sure that the code is free of bugs. Systems like these are hard to test, and I coded it up in a few weekends with no peer review. The model placed \(8 \, \%\) of the total money bet over the season on home wins, \(42 \, \%\) on draws and \(50 \, \%\) on away wins. This is uncomfortable: the discrepancy between the odds and the statistical model in predicting home wins might indicate a bug or a modeling error.
- The statistical model is structurally simple, and only uses result information from past games. No expert opinions, no information about players, no information about more detailed game statistics (passes, opportunities, ball-time, etc).
- The optimization model is based on a limiting theorem, maximizing theoretical long-term growth over an infinite sequence of matches. This is a great mathematical idea and it likely works well in practice, but in reality we are not betting on an infinite number of matches.
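The broker's cut mentioned in the list above is easy to check for any single match. The sketch below converts decimal odds to implied probabilities and measures how far their sum exceeds \(100\,\%\). The odds are made up for illustration, and proportional rescaling is just one simple way to remove the margin.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to normalized probabilities and the broker's margin.

    Returns (fair_probabilities, overround), where the overround is how much
    the raw implied probabilities exceed 1.
    """
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)
    fair = [r / total for r in raw]  # proportional rescaling (an assumption)
    return fair, total - 1.0


# Hypothetical decimal odds for home win, draw and away win.
fair, overround = implied_probabilities([2.10, 3.40, 3.20])
print(f"Overround: {overround:.1%}")
print("Normalized probabilities:", [round(p, 3) for p in fair])
```

A bet only has positive expected value if the model's probability times the decimal odds exceeds one, so the margin raises the bar the model has to clear on every single outcome.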

In the end it was a fun project and I learned a lot. Betting is unlikely to pay the bills, but statistics and optimization are worth knowing a thing or two about regardless.