How to deposit money when banks fail

Assume that we have a set of \(n\) banks, where each bank \(i\) has a failure probability \(f_i\). If a bank fails, all the money deposited into that bank is lost. We wish to distribute our money over the banks in such a way that we are well-protected against bank failures. How can we do this?

We’ll assume that we’re allocating a single unit of money (alternatively, that we’re allocating proportions of our total net worth). The allocation can be described by a vector \(\boldsymbol{w}\) of weights. Each \(w_i\) is in the range \([0, 1]\) and we impose the constraint \(\sum_i w_i = 1\). We also assume that bank failures are independent.

Maximizing expected remaining money

Let’s maximize the expected remaining money we’ll end up with over all possible outcomes. Each of the \(n\) banks will either fail or not, so there are \(2^n\) outcomes in total.

First we will examine a case with \(n=3\) banks more closely. As an example, consider the outcome where bank \(1\) fails, but banks \(2\) and \(3\) do not fail. The probability of this outcome is \(f_1(1- f_2) (1-f_3)\) and if this happens, we’re left with \(w_2 + w_3\) units of money.

Let \(R(\boldsymbol{w} \mid \boldsymbol{f})\) denote the remaining money, a random variable. The expected remaining units of money we’re left with, over all possible outcomes, is given by:

\begin{align*} \mathbb{E} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right] = && f_1 &&f_2 &&f_3 &&(0) &&+ \\ && f_1 &&f_2 &&(1-f_3) &&(w_3) &&+ \\ && f_1 &&(1-f_2) &&f_3 &&(w_2) &&+ \\ && f_1 &&(1-f_2) &&(1-f_3) &&(w_2 + w_3) &&+ \\ && (1-f_1) &&f_2 &&f_3 &&(w_1) &&+ \\ && (1-f_1) &&f_2 &&(1-f_3) &&(w_1 + w_3) &&+ \\ && (1-f_1) &&(1-f_2) &&f_3 &&(w_1 + w_2) &&+\\ && (1-f_1) &&(1-f_2) &&(1-f_3) &&(w_1 + w_2 + w_3)  && \end{align*}

This function has \(2^n\) terms, but if we collect factors for each \(w_i\), we discover that the function can be expressed using only \(n\) terms with a simple linear structure:

\begin{align*} \mathbb{E} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right] = \sum_i (1-f_i) w_i \end{align*}

To see why this is so, consider for instance the factors associated with \(w_1\). Collecting terms, we end up with the expression

\begin{align*} (1-f_1) \left[ f_2 f_3 + (1-f_2) f_3 + f_2 (1-f_3) + (1-f_2) (1-f_3) \right]. \end{align*}

The factor in the square brackets in this expression is a sum over the distribution of outcomes for every bank except bank \(1\), and must equal \(1\) because it’s a probability distribution.

In summary, to maximize \(\mathbb{E} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right]\) we simply put all the money in the bank with the lowest failure rate \(f_i\). When we collected terms and reduced the function from \(2^n\) to \(n\) terms, the solution became easy to discover.
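Both claims are easy to check numerically: the \(2^n\)-term enumeration agrees with the \(n\)-term linear formula, and concentrating everything in the lowest-\(f_i\) bank maximizes it. A minimal sketch in Python (the function names and the example numbers are mine, chosen for illustration):

```python
from itertools import product

def expected_by_enumeration(w, f):
    # Sum the remaining money over all 2^n outcomes, weighted by probability.
    # A 1 in position i means bank i survives.
    total = 0.0
    for c in product((0, 1), repeat=len(f)):
        p = 1.0
        for ci, fi in zip(c, f):
            p *= (1 - fi) if ci else fi
        total += p * sum(ci * wi for ci, wi in zip(c, w))
    return total

def expected_linear(w, f):
    # The same expectation after collecting terms: only n terms.
    return sum((1 - fi) * wi for wi, fi in zip(w, f))

f = [0.1, 0.11, 0.12]
print(expected_by_enumeration([0.5, 0.3, 0.2], f))  # ≈ 0.893
print(expected_linear([0.5, 0.3, 0.2], f))          # ≈ 0.893
# Putting everything in the bank with the lowest f_i is optimal:
print(expected_linear([1.0, 0.0, 0.0], f))          # ≈ 0.9
```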

The Markowitz model

The Markowitz idea is to maximize the expected value while minimizing the variance. We’ve already seen that the expected remaining money is given by \(\sum_i (1-f_i) w_i\). In the calculation above we looked at \(2^n\) outcomes and collected terms, but there’s a much faster way to obtain the same result.

Let \(R_i\) denote the money remaining in bank \(i\). Since the bank either fails or survives, \(R_i\) is a scaled Bernoulli (yes-or-no) variable: it equals \(w_i\) with probability \(1-f_i\) and \(0\) with probability \(f_i\). It’s easy to see that

\begin{align*} \mathbb{E} \left[ R_i \right] &= f_i \cdot 0 + (1-f_i) \cdot w_i = (1-f_i) w_i \\ \operatorname{Var} \left[ R_i \right] &= f_i \cdot (0 - \mathbb{E} \left[ R_i \right])^2 + (1-f_i) \cdot (w_i - \mathbb{E} \left[ R_i \right])^2 = f_i (1-f_i) w_i^2 \end{align*}

The total remaining money is given by \(R(\boldsymbol{w} \mid \boldsymbol{f}) = \sum_i R_i\), and since the \(R_i\) are independent we have

\begin{align*} \mathbb{E} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right] &= \mathbb{E} \left[ \sum_i R_i \right] =  \sum_i \mathbb{E} \left[ R_i \right] = \sum_i (1-f_i) w_i \\ \operatorname{Var} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right] &= \operatorname{Var} \left[ \sum_i R_i \right] =  \sum_i \operatorname{Var} \left[ R_i \right] = \sum_i f_i (1-f_i) w_i^2 \end{align*}
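These closed-form expressions are easy to sanity-check with a quick Monte Carlo simulation. A sketch in Python (the helper name, seed, and example inputs are mine):

```python
import random

def simulate_remaining(w, f, trials=200_000, seed=0):
    # Draw independent bank failures and record the remaining money.
    rng = random.Random(seed)
    samples = [
        sum(wi for wi, fi in zip(w, f) if rng.random() >= fi)
        for _ in range(trials)
    ]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / trials
    return mean, var

w, f = [0.5, 0.3, 0.2], [0.1, 0.11, 0.12]
mean, var = simulate_remaining(w, f)
exact_mean = sum((1 - fi) * wi for wi, fi in zip(w, f))           # ≈ 0.893
exact_var = sum(fi * (1 - fi) * wi ** 2 for wi, fi in zip(w, f))  # ≈ 0.0355
```

With 200,000 trials the simulated mean and variance should land within a fraction of a percent of the exact values.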

Let’s summarize: given failure probabilities \(\boldsymbol{f}\), we are free to choose a weight vector \(\boldsymbol{w}\). Such a vector gives us a probability distribution \(R(\boldsymbol{w} \mid \boldsymbol{f})\), with an associated expected value and variance. This is shown in the figure below. Notice how one choice of \(\boldsymbol{w}\) gives higher expected value, but also higher variance.

This leads us to the Markowitz model. We’re given \(\boldsymbol{f}\), then we choose \(\alpha\) to control our risk tolerance and solve the following Quadratic Program (QP) for the optimal \(\boldsymbol{w}\).

\begin{align} & \text{maximize}   && \alpha \underbrace{\sum_i (1-f_i) w_i }_{\mathbb{E} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right]} - (1-\alpha) \underbrace{\sum_i f_i (1-f_i) w_i^2  }_{\operatorname{Var} \left[ R(\boldsymbol{w} \mid \boldsymbol{f}) \right]}    \\ & \text{subject to} && \sum_i w_i  = 1    &&  \\ & && w_i \in \mathbb{R}_+ && \end{align}

For instance, given \(\boldsymbol{f} = (0.1, 0.11, 0.12)\) and choosing \(\alpha = 1/2\), we obtain the solution \(\boldsymbol{w}^{\star} \approx (0.413, 0.329, 0.258)\). The objective function, along with the optimal solution, is plotted on a simplex in the figure below. Varying \(\alpha\) over the range \([0, 1]\), we obtain a family of solutions. These are shown in the figure below, both on the simplex and as components of the solution vector \(\boldsymbol{w}^{\star}\). As \(\alpha\) goes to zero we become more risk-averse and the solution is regularized towards the center of the simplex. QPs are easy to solve: a problem with \(n=1000\) banks takes about one second if we include the time spent executing the Python code, while the actual numerical solver (OSQP, which is written in C) handles \(n=1000\) in milliseconds.
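As a sketch of how this particular solution can be reproduced even without a QP solver: if we assume the nonnegativity constraints are inactive at the optimum (true here, since all weights come out positive), setting the gradient of the Lagrangian of the equality-constrained problem to zero gives a closed form. In plain Python (for larger or degenerate problems, a solver like OSQP is the right tool):

```python
def markowitz_weights(f, alpha):
    # Maximize alpha * sum((1-f_i) w_i) - (1-alpha) * sum(f_i (1-f_i) w_i^2)
    # subject to sum(w_i) = 1, assuming every w_i > 0 at the optimum.
    # Stationarity gives w_i = (a_i - lam) / b_i with the coefficients below,
    # where lam is chosen so that the weights sum to one.
    a = [alpha * (1 - fi) for fi in f]
    b = [2 * (1 - alpha) * fi * (1 - fi) for fi in f]
    lam = (sum(ai / bi for ai, bi in zip(a, b)) - 1) / sum(1 / bi for bi in b)
    return [(ai - lam) / bi for ai, bi in zip(a, b)]

w = markowitz_weights([0.1, 0.11, 0.12], alpha=0.5)
print([round(wi, 3) for wi in w])  # [0.413, 0.329, 0.258]
```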

Maximizing expected remaining utils

Consider now a slightly different objective function. Let’s maximize the expected logarithm of the remaining money we’ll end up with. The logarithm of money measures utility (“utils”), and its concavity captures the idea that going from \(0\) to \(1\) units of money represents more value than going from \(100\) to \(101\) units of money.

Instead of valuing an outcome by its remaining money \(w\), we’ll value it by \(\log(\epsilon + w)\). This introduces a concave utility function of money, with \(w=0\) mapping to zero utils when \(\epsilon = 1\).

Let \(\{0, 1\}^n\) be the set of all zero-one vectors with \(n\) elements. Each such vector maps to a specific outcome (failures and non-failures of banks), where \(c_i = 1\) means that bank \(i\) survives. Let \(\boldsymbol{c} \in \{0, 1\}^n\) be such an outcome. For instance, by \(\boldsymbol{c} = (0, 1, 0)\) we mean that the first and third banks failed, while the second survived.

Given an outcome \(\boldsymbol{c}\), the probability of the outcome is given by the function \(p: \{0, 1\}^n \to [0, 1]\) and the remaining money is given by \(g : \{0, 1\}^n \to [0, 1]\). The functions are given by

\begin{align*} p(\boldsymbol{c}) &= \prod_{i=1}^{n} f_i^{1-c_i} (1-f_i)^{c_i} \\ g(\boldsymbol{c}) &= \sum_{i=1}^{n} c_i w_i \end{align*}

The expected logarithm of the remaining money we’ll end up with is given by:

\begin{equation*} U(\boldsymbol{w} \mid \boldsymbol{f})=\sum_{\boldsymbol{c} \in \{0, 1\}^n } \prod_{i=1}^{n} f_i^{1-c_i} (1-f_i)^{c_i} \log \left ( \epsilon + \sum_{i=1}^{n} c_i w_i \right ) \end{equation*}

Our task now is to maximize this function, subject to \(\sum_i w_i = 1\) and each \(w_i \geq 0\).
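For small \(n\), the objective can be evaluated directly by enumerating all \(2^n\) outcomes. A minimal sketch in Python (the helper name is mine; the first set of weights is a rounded version of the optimum reported below, which indeed scores higher than the uniform allocation):

```python
from itertools import product
from math import log

def expected_log_utility(w, f, eps=1.0):
    # Enumerate all 2^n outcomes c in {0,1}^n (c_i = 1 means bank i survives)
    # and weight log(eps + remaining money) by the outcome's probability.
    total = 0.0
    for c in product((0, 1), repeat=len(f)):
        p = 1.0
        for ci, fi in zip(c, f):
            p *= (1 - fi) if ci else fi
        total += p * log(eps + sum(ci * wi for ci, wi in zip(c, w)))
    return total

f = [0.10, 0.11, 0.12]
u_star = expected_log_utility([0.498, 0.332, 0.170], f)
u_uniform = expected_log_utility([1/3, 1/3, 1/3], f)
print(u_star > u_uniform)  # True
```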


There are efficient algorithms for maximizing concave functions, which is equivalent to minimizing convex functions. Since \(\log ( \epsilon + \sum_{i=1}^{n} c_i w_i )\) is concave in \(\boldsymbol{w}\), and a sum of concave functions weighted by nonnegative probabilities is concave, the function \(U(\boldsymbol{w} \mid \boldsymbol{f})\) is concave.

Given a vector of failure probabilities, for instance \(\boldsymbol{f} = (0.10, 0.11, 0.12)\), we can solve the problem efficiently. We set \(\epsilon = 1\). I used CVXPY v1.1.17 and ECOS 2.0.7; the problem was compiled and solved in a few milliseconds.

The solution is \(\boldsymbol{w}^{\star} \approx (0.498, 0.332, 0.169)\), and is shown in the figure below. Varying \(\epsilon\) in the range \([10^{-3/2}, 10^{3/2}]\), we obtain a family of solutions. These solutions are shown in the figure below, on the simplex and as components of the solution vector \(\boldsymbol{w}^{\star}\). Notice that, while not identical, decreasing \(\epsilon\) has the same regularizing effect as decreasing \(\alpha\) in the Markowitz problem.

Approximations for large \(n\)

The function \(U(\boldsymbol{w} \mid \boldsymbol{f})\) has \(2^n\) terms. If \(n=14\), it takes close to a minute to solve the problem, measured end-to-end in Python (the numerical work itself is done more quickly).

We can improve this by dropping small terms in the function. Assume that \(f_i \ll 1\) and let \(\epsilon\) be a small number of the same order of magnitude as the \(f_i\). There is one constant term corresponding to zero failures. There are \(\binom{n}{1} = n\) terms corresponding to one failure, each of order \(\mathcal{O}(\epsilon)\). There are \(\binom{n}{2} = n(n-1)/2\) terms corresponding to two failures, each of order \(\mathcal{O}(\epsilon^2)\).

Let \(U(\boldsymbol{w} \mid \boldsymbol{f})_k\) be the function where terms corresponding to \(k\) or more failures are dropped. The difference between the full function and its truncation can be bounded by \(U(\boldsymbol{w} \mid \boldsymbol{f}) - U(\boldsymbol{w} \mid \boldsymbol{f})_k = \mathcal{O}(\epsilon^k)\). Although we haven’t bounded the difference between the solutions, we can observe that with a few terms the solution appears stable. In the figure below, I set \(\boldsymbol{f}\) as follows:

failures = [0.1 + i**0.5 / 100 for i in range(14)]

The figure below shows the error, which decreases the more terms we include. When all terms are included the function has \(2^{14} = 16384\) terms, and no error. Solving with \(\binom{14}{0} + \binom{14}{1} + \binom{14}{2} = 1 + 14 + 91 = 106\) terms took less than a second. As shown in the figure, the solutions obtained are very similar.
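The truncation itself is straightforward to implement: instead of enumerating all \(2^n\) outcomes, we enumerate only the subsets of at most \(k-1\) failed banks. A sketch (the function name and small example inputs are mine; the exact value comes from full enumeration, obtained here by setting \(k = n + 1\)):

```python
from itertools import combinations
from math import log

def truncated_log_utility(w, f, k, eps=1.0):
    # Keep only the outcomes with fewer than k failures; the dropped
    # terms are O(eps^k) when each f_i is of order eps.
    n = len(f)
    total = 0.0
    for m in range(min(k, n + 1)):
        for failed in combinations(range(n), m):
            p, money = 1.0, 0.0
            for i in range(n):
                if i in failed:
                    p *= f[i]
                else:
                    p *= 1 - f[i]
                    money += w[i]
            total += p * log(eps + money)
    return total

w, f = [0.5, 0.3, 0.2], [0.1, 0.11, 0.12]
exact = truncated_log_utility(w, f, k=4)   # all 2^3 outcomes
approx = truncated_log_utility(w, f, k=2)  # 1 + 3 = 4 terms
print(exact - approx)  # ≈ 0.009, on the order of f_i^2
```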


Summary

Maximizing the expected remaining money is equivalent to maximizing a linear function \(\sum_i (1-f_i) w_i\) subject to \(\sum_i w_i = 1\) and \(w_i \geq 0\) for every bank \(i\). Even though there are \(2^n\) distinct outcomes, the objective only has \(n\) terms. This is a trivial problem.

Balancing our greed with risk reduction leads to the Markowitz model. We maximize the expected value while minimizing the variance, balancing the two objectives with a factor \(\alpha\). The associated optimization problem turned out to be a Quadratic Program, which is easy to solve in practice. We saw that \(n\) terms were sufficient to describe the variance.

Finally, we maximized the logarithm of the remaining money. Changing \(\epsilon\) in \(\log(\epsilon + w)\) had the same effect as changing \(\alpha\) in the Markowitz model—lower values correspond to higher risk aversion. We were unable to get exact results without one term per outcome. The \(2^n\) terms would quickly render the problem intractable, so we investigated an approximation scheme.


  • Chapter 3 in the book “Convex Optimization” by Stephen Boyd and Lieven Vandenberghe explains how to determine whether a function is convex, and which operations preserve convexity. In practice, CVXPY will let you know if your function is convex or not. However, software like CVXPY would not be able to determine that the expected value and variance can be described by \(n\) terms instead of \(2^n\), so pen and paper is still useful.