Optimal pupil-classroom assignments

A question from Stack Overflow roughly reads as follows:

Every year teachers organize classes for the following year. Pupils get to select a number of friends they would like to be with. School policy is that each pupil should be with at least one friend – obviously the more preferred that friend is, the better. Given pupils with preferences of classmates, how can pupils be optimally assigned into classes?

This is an interesting problem. Solving it would liberate teachers from boring manual work. Good solutions might make pupils more happy with their classrooms and maybe even their education. In this article we show that the pupil-classroom assignment problem is easily solved in practice.

We first demonstrate that the problem can be formulated as a Mixed Integer Program (MIP). Then we present an effective stochastic optimization approach using simulated-annealing. Finally we show that simulated annealing is able to easily find the optimal solution on a fictional problem instance.

The figure above shows how stochastic optimization works to find a solution to a problem with \(100\) pupils and \(5\) classrooms. Green dots are pupils. The darker the color, the happier the pupils are with their assignment.

Table of contents

Translating the problem to mathematical notation

Binary decision variables. Before we can get to work on the problem, we have to formalize it. To start, we define a matrix \(X\) consisting of binary decision variables \(x_{ic}\) as

\begin{align} x_{ic}= \begin{cases} 1 \quad \text{ if pupil } i \text{ is assigned to classroom } c\\ 0 \quad \text{ otherwise}. \end{cases} \end{align}

The binary matrix \(X\) has one row per pupil and one column per classroom.

Pupil preferences. We assume that pupil preferences are given in a matrix \(P\) with entries \(p_{ij}\). The definition is

\begin{align} p_{ij} = \text{ the preference pupil } i \text{ has for pupil } j. \end{align}

A high preference \(p_{ij}\) will encourage the optimization model to group pupils \(i\) and \(j\) together. The preference entries in the matrix need not be symmetric: in general \(p_{ij} \neq p_{ji}\). However, in practice the preferences of both pupils \(i\) and \(j\) are realized when \(i\) and \(j\) are in the same classroom. The definition of \(p_{ij}\) allows for an arbitrary number of preferences, weighted preferences and negative preferences.

  • If pupils provide three preferences, one approach is to set each preference to \(1\).
  • Weighted preferences could mean mapping a first choice to \(3\), a second choice to \(2\) and a third choice to \(1\).
  • Negative preferences signify that pupils should not be assigned to the same classroom.

Pupil utilities. We define the utility \(U_i\) of pupil \(i\) as the sum of the preferences in favor of \(i\) in an assignment. Formally, the utility score for pupil \(i\) becomes:

\begin{align} U_i = \sum_c x_{ic} \sum_j x_{jc} p_{ij} = \sum_c \sum_j x_{ic} x_{jc} p_{ij} \end{align}

The product \(x_{ic} x_{jc}\) equals \(1\) if and only if pupils \(i\) and \(j\) are assigned to the same class. For each pupil \(i\), the attained preference utility \(p_{ij}\) is added to \(U_i\).

A Mixed Integer Program formulation

We now present a MIP formulation of the pupil-classroom assignment problem.

Hard constraints

To express “every pupil must be assigned to one class” we add the constraints:

\begin{align} \sum_c x_{ic} = 1 \qquad \forall \, i \end{align}

Similarly, we express constraints on the classroom sizes as:

\begin{align} \text{min class size} \leq \sum_i x_{ic} \leq \text{max class size} \qquad \forall \, c \end{align}

Linearizing the pupil utility function

The utility score \(U_i\) for pupil \(i\) is the sum of the pupil’s preferences:

\begin{align} U_i = \sum_c \sum_j x_{ic} x_{jc} p_{ij} \end{align}

There’s an issue with this formulation: the product \(x_{ic} x_{jc}\) is quadratic in the decision variables. Quadratic functions are not allowed in the MIP framework; they must be linearized. Products of binary decision variables are linearized by introducing new binary variables \(y_{ijc}\) and the constraints

\begin{align} y_{ijc} \leq x_{ic} \qquad y_{ijc} \leq x_{jc} \qquad x_{ic} + x_{jc} \leq y_{ijc} + 1 \qquad \forall \, (i, j, c) \end{align}

Having linearized the problem, we are one step closer to giving it to a general purpose MIP solver. Unfortunately, linearization introduces \(\mathcal{O} \left( |I|^2 |C| \right)\) binary variables – this makes large problem instances intractable in practice. A solver would still do well on small and medium-sized problems, and on large problems a time limit could be set instead of running the solver to completion. The solver would then return the best solution found within the time limit.

Choosing an objective function

The original problem statement problem asks:

School policy is that each pupil should be with at least one friend – obviously the more preferred that friend is, the better.

Policy-makers should not make guarantees like these, since it might be impossible to create an assignment where every pupil is with at least one friend.

As an example, consider the case of \(|I|\) pupils where each pupil \(i\) has \(K\) preferences and prefers pupils \(\{ (i + k) \, \% \, |I| + 1 \mid k = 0, 1, \ldots, K\}\). The preferences are cyclic and no feasible assignment exists (unless there’s only one classroom, or \(K\) is very large). The preference matrix \(P\) becomes circulant. Such a matrix is shown in the figure below.

Let’s go back to the issue at hand of making pupils happy in practice. We have many options for how we can use the utilities \(U_i\) to produce good assignments. The following approaches can all be linearized and can therefore be formulated in the MIP framework:

  • Maximize the minimum utility, i.e., \(\text{maximize} \, \min_i U_i\). This focuses on making the least happy pupil as happy as possible, but once the objective is accomplished the model does little for overall utility.
  • Maximize the sum of utilities, i.e., \(\text{maximize} \, \sum_i U_i\). This focuses on making the pupils happy in sum, but some individual pupils might end up unhappy.
  • Maximize a weighted sum of both of the objectives above, but with a small weight \(\epsilon\) on the sum of utilities. The model will focus on making the least happy pupil as happy as possible. Once that objective is achieved, it focuses on overall utility. In other words, we solve the problem: \(\text{maximize} \, \min_i U_i + \epsilon \sum_i U_i\).
  • A two-stage process. First we solve the problem \(\text{maximize} \, f(X) = \min_i U_i\) and obtain a solution \(X^\star\). Then we solve \(\text{maximize} \, \sum_i U_i\) with the constraint \(\min_i U_i \geq f(X^\star)\).

Below we choose the third option: maximizing minimum utility and the sum of the utilities, but with a small weight \(\epsilon\) on the sum of utilities.

The full Mixed Integer Program

We’re now in a position to present a full MIP for the problem. The complete model is:

\begin{align} & \text{maximize} && U_{\text{min}} + \epsilon \sum_i U_i \\ & \text{subject to} && \sum_c x_{ic} = 1 && \forall \, i \\ & && \sum_i x_{ic} \leq \text{max class size} && \forall \, c \\ & && \sum_i x_{ic} \geq \text{min class size} && \forall \, c \\ & && U_i = \sum_c \sum_j y_{ijc} p_{ij} && \forall \, i \\ & && y_{ijc} \leq x_{ic} && \forall \, (i, j, c) \\ & && y_{ijc} \leq x_{jc} && \forall \, (i, j, c) \\ & && x_{ic} + x_{jc} \leq y_{ijc} + 1 && \forall \, (i, j, c) \\ & && U_{\text{min}} \leq U_i && \forall \, i \\ & && x_{ic} \in \{0, 1\} && \forall \, (i, c) \\ \end{align}

A MIP solver would make quick work of small problem instances. It will likely struggle to solve the problem to optimality if there are hundreds of pupils, due to the amount of binary decision variables.

Stochastic optimization

We’ll sketch how stochastic optimization can be used to find high-quality solutions to the pupil-classroom assignment problem quickly, without needing a MIP solver. Simulated annealing works well on integer problems that are under-constrained, such as this one.

In the following we assume that class sizes are predetermined and constant. This justifies the use of a single, simple neighborhood operator that switches the assignment of two pupils at random. If class sizes were not fixed, an additional operator that moves a single pupil could be added as well.

Since simulated annealing will evaluate many assignments, we need a fast evaluation of the objective function. The utilities \(U_i\) can be computed naively as a matrix product, since:

\begin{align} U_i = \sum_c \sum_j x_{ic} x_{jc} p_{ij} = \left( X X^{\operatorname{T}} P^{\operatorname{T}} \right)_{ii} = \operatorname{diag} \left( X X^{\operatorname{T}} P^{\operatorname{T}} \right)_{i}. \end{align}

Matrix products are fast in general, so this computation would work reasonably well. However, we can speed up computations further by observing that:

  • The binary, symmetric matrix \(X X^{\operatorname{T}}\) has entry \((i,j)\) equal to \(1\) if and only if pupils \(i\) and \(j\) are in the same classroom. There’s no need to explicitly compute \(X X^{\operatorname{T}}\) as a matrix product to obtain this information.
  • The full matrix \(X X^{\operatorname{T}} P^{\operatorname{T}}\) does not need to be formed, since we’re only interested in the diagonal entries.

Applying these tricks speeds up the computation by one or two orders of magnitude on my computer – not too bad!

We use a simple simulated annealing algorithm:

1. Create a random initial assignment. Set an iteration counter to zero.
2. Swap two random pupils.
3. Evaluate the objective function.
4. If the new solution is better or the temperature condition kicks in,
   keep the new solution.
5. If many iterations are performed, return.
   If not, increase the iteration counter and go to step 2.

Results on fictional data

We create a problem instance with \(100\) pupils and \(5\) classrooms. We structure the problem so that it has an optimal solution, and we know beforehand what the optimal solution will be. The figure below shows a preference matrix for a problem with a known solution. The optimal solution is to group the first \(20\) pupils in one class, the following \(20\) pupils in another class, and so forth. Of course, the algorithm does not have this knowledge and starts with a random assignment.

We let simulated annealing run for one hundred thousand iterations. In the objective function, we set \(\epsilon = 1/300\). The sudden jumps in the objective function in the figure below happen when the least happy pupil becomes more happy.

The figure below shows how the solution matrix \(X\) changes as the algorithm progresses, and should be read from left to right. Recall that \(x_{ic} =1\) if and only if pupil \(1\) is assigned to classroom \(c\). The algorithm finds the optimal assignment with relative ease.

Additional real-life objectives

The original question specifies some additional objectives, which we now address.

Class sizes should be as similar as possible”

Option 1. The total number of pupils in class \(c\) is \(\sum_i x_{ic}\), and hard constraints can be used to enforce a predetermined range of allowed sizes.

\begin{align} \text{min class size} \leq \sum_i x_{ic} \leq \text{max class size} \qquad \forall \, c \end{align}

Option 2. Define \(S_c = \sum_i x_{ic}\) as the size of classroom \(c\). Instead of explicit, hard constraints we can add the following to the objective function to minimize the difference between the largest and smallest classroom in a MIP:

\begin{align} \text{minimize} \, \max_j S_c - \min_j S_c \end{align}

Option 3. We can add the variance of the classroom sizes, i.e., \(\operatorname{Var} \left( S_1, S_2, \ldots, S_c, \ldots \right)\), to the objective function in simulated annealing. Variance is not a linear function, but it need not be linear if we use simulated annealing instead of a MIP.

Genders should be reasonably evenly distributed”

Option 1. Let’s set \(g_i = 1\) if and only if pupil \(i\) is a girl. The total number of girls in class \(c\) is \(\sum_i x_{ic} g_i\), and the total number of pupils in class \(j\) is \(\sum_i x_{ic}\). To force the percentage of girls to be between \(40 \, \%\) and \(60 \, \%\) we can add hard constraints. Beware that the constraints must be chosen so a feasible solution exists. Hard constraints could look like:

\begin{align} 0.4 \sum_i x_{ic} \leq \sum_i x_{ic} g_i \leq 0.6 \sum_i x_{ic} \qquad \forall \, c \end{align}

Option 2. We can compute the girl-to-total-pupil ratios for each class, and minimize the variance of the ratios. Again, this goes into the objective function in simulated annealing.

How would any algorithm be modified to include ‘cannot be with pupil \(i\)?’“

If pupils \(i\) and \(j\) cannot be in the same class, set either one of \(p_{ij}\) or \(p_{ji}\) to a large negative value, for instance \(-99\). An explicit constraint can also be used to model “pupils \(i\) and \(j\) should never both be in the same class”. The constraints would be:

\begin{align} x_{ic} + x_{jc} \leq 1 \qquad \forall \, c \end{align}


The pupil-classroom assignment is probably solved by hand by almost every school in the world, every single year. Problem instances of moderate size are easily solved as a MIP. Larger instances, or variations where objectives cannot easily be linearized, are easily solved by stochastic optimization using for instance simulated annealing.

The stochastic optimization can account for many real-world preferences, such as:

  • Pupil preferences \(f_{1}(X)\)
  • Variable classroom size \(f_{2}(X)\)
  • Gender distribution \(f_{3}(X)\)
  • Distribution of grades \(f_{4}(X)\)

These objectives can be scalarized as a weighted sum

\begin{align} f(X) = \sum_k w_k f_k (X) \qquad \sum_k w_k = 1. \end{align}

Alternatively, algorithms for multi-objective optimization can be used to produce a set of solutions and a teacher can choose trade-offs between the objectives post-hoc.