Sampling a discrete distribution with a Markov chain

Suppose we want to do long-term planning for a non-competing strength athlete. Each week in the training program is assigned a state that indicates how heavy the workouts are. For instance, a week could be light, medium, heavy, or very heavy. We could create a training program with a repeating cycle such as:

\begin{equation*} \cdots \to \mathrm{light} \to \mathrm{medium} \to \mathrm{heavy} \to \mathrm{very\,heavy} \to \cdots \end{equation*}

But this approach might be boring for the athlete and too predictable. To remedy this, we want to add some randomness, but it has to be controlled.

We imbue each state with a probability, for instance:

\begin{align*} \mathrm{light}&: 20 \, \% \\ \mathrm{medium}&: 30 \, \% \\ \mathrm{heavy}&: 35 \, \% \\ \mathrm{very\,heavy}&: 15 \, \% \end{align*}

At this point we could do weighted sampling from the state space. But that could produce a sequence such as

\begin{equation*} \cdots \to \mathrm{light} \to \mathrm{light} \to \mathrm{very\,heavy} \to \cdots, \end{equation*}

which might be too much of a shock. We don’t mind going from heavy to light, but going the other way does not seem ideal. It would be better to build up to the heavy week.
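For completeness, the independent weighted sampling we just ruled out is a one-liner in Python. This is a small sketch of my own; the state labels are only for illustration.

import random

states = ["light", "medium", "heavy", "very heavy"]
weights = [20, 30, 35, 15]

# Draw ten independent samples according to the weights
print(random.choices(states, weights=weights, k=10))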

Instead of drawing independent random samples, we define transition probabilities from each state to neighboring states. Then we can sample from a Markov chain, in which we are able to choose which state transitions are possible.

Mathematical problem

Assume we have a finite set of \(n\) states labeled \(1, 2, \ldots, n\). We are given the structure of a Markov chain over the states and a desired stationary distribution \(\boldsymbol{\pi}\). How can we construct transition probabilities \(p_{ij}\), i.e. the probability of going from state \(i\) to state \(j\), such that \(\boldsymbol{\pi}\) is a stationary distribution?

Let’s focus on the loop structure, where we either stay in the current state with a certain probability, or move to the next state. The figure below shows such a Markov chain, where \(p_{11}\) is the probability of staying in state \(1\), \(p_{12}\) is the probability of transitioning from state \(1\) to state \(2\), and so forth.

Examining a concrete problem instance

Consider the problem instance shown above, with \(n=4\) and \(\boldsymbol{\pi} = (1, 2, 3, 4)\) (scaling does not matter). Assume the loop structure shown in the figure above, where we can either stay in a state or move to the next state. How can we sample from this Markov chain?

We can’t use the Metropolis-Hastings algorithm, since it requires a proposal distribution that is reversible in the following sense: if \(p_{ij} > 0\), then \(p_{ji}\) cannot be \(0\). The loop structure does not have this property, so we abandon the idea of using Metropolis-Hastings for this problem.

Let \(P\) be a Markov matrix of size \(n \times n\). The entries in each row must sum to \(1\), and all entries must be non-negative. We seek a matrix \(P\) such that \(\boldsymbol{\pi} P = \boldsymbol{\pi}\), since if this holds, \(\boldsymbol{\pi}\) is a stationary distribution.

Given the loop structure, the non-zero entries of \(P\) are

\begin{equation*} P = \begin{bmatrix} p_{11} & p_{12} &  &  \\  & p_{22} & p_{23}  &  \\  &  & p_{33} &  p_{34} \\ p_{41} &  &  & p_{44} \\ \end{bmatrix}. \end{equation*}

We can find the values of the entries in \(P\) by solving a set of linear equations. There are \(8\) unknowns. There are \(4\) linear equations for the row sums, e.g. \(p_{11} + p_{12} = 1\), \(p_{22} + p_{23} = 1\), and so forth. We also obtain \(4\) linear equations from \(\boldsymbol{\pi} P = \boldsymbol{\pi}\), e.g. \(\pi_1 p_{11} + \pi_4 p_{41} = \pi_1\) for the first column.

The system of linear equations we must solve is

\begin{equation*} \begin{bmatrix} 1 & 1 &  &  &  &  &  &  \\  &  & 1 & 1 &  &  &  &  \\  &  &  &  & 1 & 1 &  &  \\  &  &  &  &  &  & 1 &  1\\ 1 &  &  &  &  &  & 4 &  \\  & 1 & 2 &  &  &  &  &  \\  &  &  & 2 & 3 &  &  &  \\  &  &  &  &  & 3 &  & 4 \\ \end{bmatrix} \begin{bmatrix} p_{11} \\ p_{12} \\ p_{22} \\ p_{23} \\ p_{33} \\ p_{34} \\ p_{41} \\ p_{44} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}, \end{equation*}

with the additional constraint that every \(p_{ij} \geq 0\).
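Numerically, one way to find a feasible solution is to hand the equality constraints and the non-negativity bounds to a linear-programming solver with a zero objective. Here is a minimal sketch using NumPy and SciPy; the solver choice is my own and not something the problem requires.

import numpy as np
from scipy.optimize import linprog

# Unknowns ordered as [p11, p12, p22, p23, p33, p34, p41, p44]
A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],  # row sums equal one
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 4, 0],  # columns of pi P = pi, with pi = (1, 2, 3, 4)
    [0, 1, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 3, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 0, 4],
])
b_eq = np.array([1, 1, 1, 1, 1, 2, 3, 4])

# Zero objective: any feasible point with 0 <= p_ij <= 1 will do
result = linprog(c=np.zeros(8), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print(result.x)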

Solving the system and placing the unknowns back into the matrix \(P\), we obtain the solution below. The nullspace of the coefficient matrix is one-dimensional, so the solution is not unique: we get a one-parameter family of solutions, parametrized by a scalar \(\alpha\).

\begin{equation*} \begin{bmatrix} 0 & 1 &  &  \\  & 1/2 & 1/2  &  \\  &  & 2/3 &  1/3 \\ 1/4 &  &  & 3/4 \\ \end{bmatrix} + \alpha \begin{bmatrix} 1 & -1 &  &  \\  & 1/2 & -1/2  &  \\  &  & 1/3 &  -1/3 \\ -1/4 &  &  & 1/4 \\ \end{bmatrix} \end{equation*}

where \(\alpha \in [0, 1)\). After staring at this concrete problem for a while, and testing some other problem instances, I was able to uncover a general solution to the problem.
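Before stating the general result, here is a quick numerical check of the family above (a sketch of my own): pick some \(\alpha \in [0, 1)\) and confirm that the resulting matrix has non-negative entries, rows summing to one, and \(\boldsymbol{\pi} = (1, 2, 3, 4)\) as a stationary distribution.

import numpy as np

pi = np.array([1.0, 2.0, 3.0, 4.0])

# Particular solution and nullspace direction from the equation above
P0 = np.array([
    [0,    1,    0,    0],
    [0,  1/2,  1/2,    0],
    [0,    0,  2/3,  1/3],
    [1/4,  0,    0,  3/4],
])
N = np.array([
    [1,    -1,    0,     0],
    [0,   1/2, -1/2,     0],
    [0,     0,  1/3,  -1/3],
    [-1/4,  0,    0,   1/4],
])

alpha = 0.5
P = P0 + alpha * N

print(np.all(P >= 0))                    # entries are non-negative
print(np.allclose(P.sum(axis=1), 1.0))   # rows sum to one
print(np.allclose(pi @ P, pi))           # pi is stationary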

Transition probabilities for a discrete Markov chain, given a stationary distribution

Theorem. Given \(n\) states and a desired stationary distribution \(\boldsymbol{\pi}\), as well as a Markov chain structure that allows jumping up to \(k\) states ahead in a loop, the transition probabilities can be chosen as

\begin{equation*} p_{ij} = \begin{cases}   \left( \pi_i - \sum_{\ell=1}^{k} m_\ell \right) / \pi_i  & \text{ if $i=j$} \\   m_{(j-i) \bmod n} / \pi_i & \text{ if $i \neq j$} \end{cases} \end{equation*}

where \(a \bmod b\) denotes the modulo operation, each \(m_\ell \geq 0\), and \(0 < \sum_{\ell=1}^{k} m_\ell < \min \left( \boldsymbol{\pi} \right)\). We define \(m_\ell = 0\) for every \(\ell > k\).

Proof. We must show that each row sums to unity, and that \(\boldsymbol{\pi} P = \boldsymbol{\pi}\) holds. Instead of showing the general case, we show that it holds for \(n=4\) and \(k=2\). This keeps the notation simple, and the same argument carries over to any \(n\) and \(k\).

When \(n=4\) and \(k=2\) the matrix becomes

\begin{equation*} P = \begin{bmatrix} \frac{\pi_1 - m_1 - m_2}{\pi_1} & \frac{m_1}{\pi_1} & \frac{m_2}{\pi_1} &  \\  & \frac{\pi_2 - m_1 - m_2}{\pi_2} & \frac{m_1}{\pi_2} & \frac{m_2}{\pi_2}  \\  \frac{m_2}{\pi_3} &  & \frac{\pi_3 - m_1 - m_2}{\pi_3} & \frac{m_1}{\pi_3} \\ \frac{m_1}{\pi_4} & \frac{m_2}{\pi_4} &  &  \frac{\pi_4 - m_1 - m_2}{\pi_4}\\ \end{bmatrix} \end{equation*}

Clearly each row sums to \(1\). For each column \(j\) we see that \(\sum_i \pi_i P_{ij} = \pi_j\) also holds.
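For instance, for the first column,

\begin{equation*} \pi_1 \frac{\pi_1 - m_1 - m_2}{\pi_1} + \pi_3 \frac{m_2}{\pi_3} + \pi_4 \frac{m_1}{\pi_4} = \pi_1 - m_1 - m_2 + m_2 + m_1 = \pi_1, \end{equation*}

and the remaining columns work out the same way.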

Simulating draws from the looping Markov chain

Let’s go back to the example in the beginning, with \(n=4\) and \(\boldsymbol{\pi} = (20, 30, 35, 15)\). We allow jumping \(k=1\) step ahead in the loop, and set \(m = m_1 = 0.95 \min \left( \boldsymbol{\pi} \right)\).

import random
from itertools import islice


def simulate_markov_loop(probabilities, m):
    """Draw samples from a Markov chain in the form of a loop,
    where state probabilities are given by `probabilities` and
    in each state we can either stay or move to the next state."""
    state = 0
    yield state

    while True:

        # Move to the next state with probability m / probabilities[state]
        if random.random() < m / probabilities[state]:
            state = (state + 1) % len(probabilities)

        yield state


probabilities = [20, 30, 35, 15]
m = min(probabilities) * 0.95

# Print the first 25 samples (the generator itself never terminates)
for state in islice(simulate_markov_loop(probabilities, m), 25):
    print(state)
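As a sanity check (my own addition, reusing the simulate_markov_loop generator and the values defined above), we can count how often each state is visited over a long run and compare against the normalized target distribution:

from collections import Counter
from itertools import islice

# Empirical state frequencies over 100,000 samples
counts = Counter(islice(simulate_markov_loop(probabilities, m), 100_000))
total = sum(probabilities)

for state, weight in enumerate(probabilities):
    print(state, counts[state] / 100_000, weight / total)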

Transition probabilities for similar Markov chain structures

Generalizing a bit, here is a full \(5 \times 5\) matrix that “wraps around” the chain structure and produces a stationary distribution \(\boldsymbol{\pi}\). In the matrix below we set \(M = m_1 + m_2 + m_3 + m_4\).

\begin{equation*} \begin{bmatrix} \frac{\pi_1 - M}{\pi_1} & \frac{m_1}{\pi_1} & \frac{m_2}{\pi_1} & \frac{m_3}{\pi_1} & \frac{m_4}{\pi_1} \\ \frac{m_4}{\pi_2} & \frac{\pi_2 - M}{\pi_2}  & \frac{m_1}{\pi_2} & \frac{m_2}{\pi_2} & \frac{m_3}{\pi_2} \\ \frac{m_3}{\pi_3} & \frac{m_4}{\pi_3} & \frac{\pi_3 - M}{\pi_3}  & \frac{m_1}{\pi_3} & \frac{m_2}{\pi_3} \\ \frac{m_2}{\pi_4} & \frac{m_3}{\pi_4} & \frac{m_4}{\pi_4} & \frac{\pi_4 - M}{\pi_4}  & \frac{m_1}{\pi_4} \\ \frac{m_1}{\pi_5} & \frac{m_2}{\pi_5} & \frac{m_3}{\pi_5} & \frac{m_4}{\pi_5} & \frac{\pi_5 - M}{\pi_5}  \\ \end{bmatrix} \end{equation*}

We require that each \(m_i \geq 0\) and \(0 < M < \min \left( \boldsymbol{\pi} \right)\), so the diagonal entries are always non-negative. Some values of \(m_i\) can be set to zero, but not so many that states become unreachable.

The matrix above opens up for several interesting structures:

  • Jump one state forward. Set \(m_1 > 0\) and all other values of \(m_i\) to zero.
  • Jump one or two states forward. Set \(m_1 > 0\) and \(m_2 > 0\)  and all other values of \(m_i\) to zero.
  • Jump one state forward or back. Set \(m_1 > 0\) and \(m_4 > 0\) and all other values of \(m_i\) to zero (see the sketch after this list).
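To illustrate the last structure, here is a small sketch (my own NumPy code, with example values for \(\boldsymbol{\pi}\) and the \(m_i\)) that builds the wrap-around matrix from the formula in the theorem with \(m_1 > 0\) and \(m_4 > 0\), and checks that \(\boldsymbol{\pi}\) is stationary.

import numpy as np

pi = np.array([20.0, 30.0, 35.0, 15.0, 10.0])  # example target distribution
n = len(pi)

# Jump one state forward or back: m_1 > 0 and m_4 > 0, the rest zero
m = np.zeros(n)  # m[l] is the weight for jumping l states ahead; m[0] is unused
m[1], m[4] = 4.0, 4.0
assert 0 < m.sum() < pi.min()

P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = m[(j - i) % n] / pi[i]
    P[i, i] = (pi[i] - m.sum()) / pi[i]

print(np.allclose(P.sum(axis=1), 1.0))  # rows sum to one
print(np.allclose(pi @ P, pi))          # pi is stationary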

Other structures that do not loop (“wrap around”) are also possible. In the structure below, we can either stay in state \(i\) or move to a neighboring state, but unlike in the Markov chain above, we cannot wrap around from the last state back to the first.

\begin{equation*} \begin{bmatrix} \frac{\pi_1 - m_1 - m_2}{\pi_1} & \frac{m_1}{\pi_1} & \frac{m_2}{\pi_1} &   &  \\ \frac{m_1}{\pi_2} & \frac{\pi_2- 2m_1 - m_2}{\pi_2}  & \frac{m_1}{\pi_2} & \frac{m_2}{\pi_2} &\\ \frac{m_2}{\pi_3} & \frac{m_1}{\pi_3} & \frac{\pi_3 - 2m_1 - 2m_2}{\pi_3}  & \frac{m_1}{\pi_3} & \frac{m_2}{\pi_3} \\   & \frac{m_2}{\pi_4} & \frac{m_1}{\pi_4} & \frac{\pi_4 - 2m_1 - m_2}{\pi_4}  & \frac{m_1}{\pi_4} \\   &   & \frac{m_2}{\pi_5} & \frac{m_1}{\pi_5} & \frac{\pi_5 - m_1 - m_2}{\pi_5}  \\ \end{bmatrix} \end{equation*}

Summary

Given the structure of a Markov chain over a finite state space and a desired stationary distribution \(\boldsymbol{\pi}\), we’ve seen how to construct transition probabilities \(p_{ij}\). This lets us sample from a Markov chain instead of drawing independent samples. One advantage of this approach is that we can ensure that the next sample is always “near” the previous sample, which is what we want in the strength training application. The novelty of the approach here is that we can have non-reversible structures such as loops, which is not possible with the Metropolis-Hastings algorithm.

Code: An example implementation of these algorithms can be found in the GitHub gist markov_sampling.py.