How do Voting Advice Applications work?

This is a translation of my Norwegian article about VAAs. The figures have not been translated, but are hopefully understandable from the context.


This article presents the mathematics behind voting advice applications (VAAs). VAAs are essentially questionnaires: a user expresses agreement or disagreement with statements about politics, and the VAA tells the user which parties they align with.

This article consists of three main parts:

  • Part 1: VAAs and geometry
  • Part 2: How does a VAA explain the results?
  • Part 3: How should a VAA present questions to the user?

The article begins at a basic mathematical level and becomes progressively more challenging. The advantage of this approach is that there is something to learn for everyone regardless of level. The disadvantage is that the article does not have a single clear target audience.

PS: In 2023, I visualized the parties in the Norwegian municipal election. The focus of that article was insight into party politics. This article focuses more on the design of VAAs.

Part 1: VAAs and geometry

Opinions and matrices

To create a VAA, we ask political parties to what extent they agree with various statements. The results are quantified by translating responses like “Strongly agree” into numbers. The answers can then be visualized in a table with one row per statement and one column per party. Below, “Strongly agree” is translated to \(2\) and “Strongly disagree” to \(-2\).

The table forms a matrix \(X\). The rows and columns have a geometric interpretation as vectors. A vector is a list of numbers, e.g., \((1, 2)\), and can be interpreted geometrically as an arrow or as a point.
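As a minimal sketch of this quantification (the statements, parties, and responses below are made up for illustration), the translation from answers to a matrix could look like this:

import numpy as np

# Hypothetical mapping from response labels to numbers
SCALE = {"Strongly agree": 2, "Agree": 1, "Neutral": 0,
         "Disagree": -1, "Strongly disagree": -2}

# Hypothetical responses: one inner list per statement (row),
# with one entry per party (column)
responses = [
    ["Strongly agree", "Strongly disagree"],
    ["Disagree", "Agree"],
]
X = np.array([[SCALE[r] for r in row] for row in responses])
print(X)
# [[ 2 -2]
#  [-1  1]]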

Statements and parties

Statements as vectors

To visualize statements and parties as vectors, we must limit ourselves to two or three dimensions. Higher dimensions cannot be visualized.

Let’s study geometrically what the parties SV and Sp think about a selection of statements.

In the upper right of the figure, we find statements that both parties agree with. In the lower right, we find statements that SV agrees with, but Sp disagrees with. We can make similar observations for the other two quadrants.

Which statements are good and which are bad?

A good statement allows us to distinguish parties from each other:

  • In the figure above, “Build railway northward to Tromsø” (Norwegian: “Bygg jernbane nordover til Tromsø.”) is a poor statement. Regardless of whether you agree or disagree, the statement does not clarify which of the two parties you are politically closest to. Trivial statements, such as “A just society is good,” are agreed upon by all parties. Therefore, they provide no information. The same applies to trivial statements that all parties disagree with, such as “The county municipality should be abolished” (Norwegian: “Fylkeskommunen bør avskaffes.”) in the figure above.
  • “Drop the ferry-free E39 along the west coast” (Norwegian: “Dropp ferjefri E39 langs Vestlandet.”) is a good statement because it discriminates between the parties. If you agree, you are politically closer to SV. If you disagree, you are closer to Sp.

In two dimensions (two parties), statements are good if they are far from the subspace spanned by the vector \(\boldsymbol{e} = (1, 1)\), i.e., the diagonal line \(y=x\). At the end of this article, we will generalize this idea to construct an algorithm that finds a good ordering of questions.
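To make this concrete, here is a small sketch. The coordinates are hypothetical stand-ins for the statements in the figure, with the axes being (SV, Sp):

import numpy as np

# Hypothetical (SV, Sp) coordinates for the three statements
statements = {
    "Build railway northward to Tromsø": np.array([2, 2]),
    "The county municipality should be abolished": np.array([-2, -2]),
    "Drop the ferry-free E39 along the west coast": np.array([2, -2]),
}

e = np.array([1, 1]) / np.sqrt(2)  # Unit vector along the diagonal y = x
for name, v in statements.items():
    distance = np.linalg.norm(v - (v @ e) * e)  # Distance from span({e})
    print(f"{distance:.2f}  {name}")
# 0.00  Build railway northward to Tromsø
# 0.00  The county municipality should be abolished
# 2.83  Drop the ferry-free E39 along the west coast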

Parties as vectors

Above, we considered the rows (statements) in the table as vectors. Let’s now look at the complementary vector space where the columns (parties) are vectors. We choose two random statements: one about wolves (“Det skytes for mye ulv her i landet.” / “Too many wolves are shot in this country.”) and one about state ownership in companies (“Staten bør eie flere bedrifter.” / “The state should own more companies.”).

Two parties are similar if they are close to each other in the space of opinions in the figure above. For example, we can measure distance as the distance on the horizontal axis plus the distance on the vertical axis (a mathematician might call this the “Manhattan distance” or the \(p=1\) norm).

  • Parties R and AP are close to each other. The distance is \(0 + 1 = 1\).
  • Parties R and V are far from each other. The distance is \(3 + 4 = 7\).

The distance in this space is the key to how a VAA determines which parties you most agree with when you take the test. This is what we will cover in Part 2.

Part 2: How does a VAA explain the results?

Explanation: ranking parties by distance

Above, we saw that party similarity can be quantified by looking at the distance between parties. We visualized this in two dimensions, but the calculation is the same in any number of dimensions.

Here is a matrix with the numbers from the first figure in this article:

\begin{equation*} X = \begin{bmatrix} \text{AP} & \text{Sp} & \text{Frp} & \text{H}\\ -1 & 1 & -2 & -1\\ 2 & 1 & -2 & -2\\ -2 & -2 & 2 & 2\\ 1 & -1 & -2 & -2\\ 1 & 2 & 2 & -2\\ -1 & -2 & -2 & 2 \end{bmatrix} \end{equation*}

If you take the VAA and your opinions are the vector \(\boldsymbol{y} = (-1, -2, 2, -2, 0, 0)\), then your distance to AP is \(13\), to Sp \(14\), to Frp \(5\), and to H \(4\). In other words, you are likely on the political right and should consider voting for Frp or H.

The distance \(d_j\) between your opinions \(\boldsymbol{y}\) and a party \(j\)’s opinions is given by the sum of the distances for each statement \(i\):

\begin{equation*} d_j = \sum_i | X_{ij} - y_i | \end{equation*}
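As a quick sketch, we can verify these numbers with NumPy and also rank the parties (the matrix is the one above, with columns AP, Sp, Frp, H):

import numpy as np

X = np.array([[-1,  1, -2, -1],
              [ 2,  1, -2, -2],
              [-2, -2,  2,  2],
              [ 1, -1, -2, -2],
              [ 1,  2,  2, -2],
              [-1, -2, -2,  2]])  # Columns: AP, Sp, Frp, H
y = np.array([-1, -2, 2, -2, 0, 0])

distances = np.abs(X - y[:, None]).sum(axis=0)  # d_j = sum_i |X_ij - y_i|
print(distances)  # [13 14  5  4]

parties = np.array(["AP", "Sp", "Frp", "H"])
print(parties[np.argsort(distances)])  # ['H' 'Frp' 'AP' 'Sp'], closest first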

Distances, \(p\)-norms, and weighting

We can generalize to an arbitrary \(p\)-norm and add a weighting \(w_i\):

\begin{equation*} d_j = \left( \sum_i w_i | X_{ij} - y_i |^p \right)^{1/p} \end{equation*}

The weights \(w_i\) are typically equal to \(1\) for most statements, while \(w_i = 2\) for core issues or issues that are particularly important to you. For issues where you have no opinion, we can set \(w_i = 0\). The value chosen for \(p\) depends on how you want to measure distance.
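Here is a sketch of this generalized distance as a function (the tiny example data is made up):

import numpy as np

def weighted_p_distance(X, y, w, p=1):
    """Distance d_j from the opinions y to each party (column of X)."""
    return (w[:, None] * np.abs(X - y[:, None]) ** p).sum(axis=0) ** (1 / p)

# Tiny example: two statements, two parties
X = np.array([[2, -2],
              [1, -1]])
y = np.array([2, 0])
w = np.array([2, 1])  # The first statement matters twice as much to you
print(weighted_p_distance(X, y, w, p=1))  # [1. 9.]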

If you believe that \(\boldsymbol{y}\) is closer to party \(A\) in the example below, you should choose \(p=2\). If you think \(\boldsymbol{y}\) is closer to \(B\), you should choose \(p=1\).

\begin{align*} \boldsymbol{y} &= (0, 0, 0, 0) \\ A &= (1, 1, 1, 1) \\ B &= (0, 0, 0, 3) \end{align*}

With \(p=1\), the distances are \(\|\boldsymbol{y}-A\|_1 = 4\) and \(\|\boldsymbol{y}-B\|_1 = 3\), so \(B\) is closer; with \(p=2\), the distances are \(2\) and \(3\), so \(A\) is closer. This is a subjective assessment, and personally, I lean toward \(p=1\). NRK and VG do the same; both use \(p=1\) in their VAAs.

The calculations ultimately result in a list of parties, sorted by distance:

1. H   (distance 4 )
2. Frp (distance 5 )
3. AP  (distance 13)
4. Sp  (distance 14)

Such a list, sorted by distance, is one possible way to explain the user’s results. But it’s not the only option; the next section covers another type of explanation.

Explanation: Your position as a combination of party positions

The parties’ vectors set directions in the space of opinions. After taking a VAA, you also get a position in this space. (“Ditt standpunkt” means “Your standpoint” or “Your position” in English.)

In the figure above, your position is \(\boldsymbol{y} = (1, -2)\).

One way to reach your position with vector arithmetic is to take MDG’s position and add KrF’s. This can be observed geometrically in the figure above, or algebraically:

\begin{align} \text{MDG} + \text{KrF} &= \boldsymbol{y} \\ (2, -1) + (-1, -1) &= (1, -2) \end{align}

Drafting an optimization model

Here is an alternative to a distance-based explanation: we express your position as a combination of party positions. We look for a combination of parties that describes you by taking us as close to your position as possible. In other words, we search for a vector \(\boldsymbol{\beta}\) that solves the optimization problem

\begin{align} & \underset{\boldsymbol{\beta}}{\text{minimize}} && \| X \boldsymbol{\beta} - \boldsymbol{y} \|. \end{align}
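With the \(2\)-norm, this draft is ordinary least squares, so a quick sketch (reusing the matrix and \(\boldsymbol{y}\) from the ranking example) shows what kind of answer it produces:

import numpy as np

X = np.array([[-1,  1, -2, -1],
              [ 2,  1, -2, -2],
              [-2, -2,  2,  2],
              [ 1, -1, -2, -2],
              [ 1,  2,  2, -2],
              [-1, -2, -2,  2]])  # Columns: AP, Sp, Frp, H
y = np.array([-1, -2, 2, -2, 0, 0])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # Minimize ||X beta - y||_2
print(beta, beta.sum())
# Nothing forces the entries of beta to be non-negative or to sum to 1,
# which is exactly the interpretability problem discussed below.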

This model has some practical challenges:

  • It allows complicated explanations in the form of long vectors \(\boldsymbol{\beta}\). For example, we can reach \(\boldsymbol{y}\) through the linear combination
    \begin{align} \text{V} + \text{R} + \text{H} + \text{Ap} + \text{MDG} = \boldsymbol{y}. \end{align}
  • The model allows negative directions because \(\boldsymbol{\beta}\) can consist of negative numbers. This is difficult to interpret. For example, \(-\text{R} = \boldsymbol{y}\) in the example above, but you naturally want to know who you agree with, rather than being told that you disagree with the party \(\text{R}\).
  • The model allows explanations where the sum of the parties’ contributions \(\boldsymbol{\beta}\) exceeds \(1\). This is the case in \(\text{MDG} + \text{KrF} = \boldsymbol{y}\) and in the long explanation above. Again, it is difficult to interpret such an answer.

All these problems can be solved if we limit ourselves to convex combinations.

Final optimization model

We fix the problems in the above model by adding some constraints. Rather than looking for any linear combination of parties, we look for the shortest convex combination:

\begin{align} & \underset{\boldsymbol{\beta}}{\text{minimize}} && \| X \boldsymbol{\beta} - \boldsymbol{y} \| + \epsilon \| \boldsymbol{\beta} \| \\ & \text{subject to} && \beta_j \geq 0 && \\ & && \sum_j \beta_j = 1. && \end{align}

The regularization term \(\epsilon \| \boldsymbol{\beta} \|\) means that we prefer simple (short) explanations. The two constraints on \(\boldsymbol{\beta}\) are the definition of a convex combination, and ensure that we avoid negative directions and sums that exceed \(1\). The sum \(\sum_j \beta_j\) must equal \(1\), rather than be less than \(1\), because if \(\boldsymbol{y} = (0, 1)\), \(A = (0, 1)\), and \(B = (0, 2)\), we expect \(\boldsymbol{\beta} = (1, 0)\) as the answer, not \(\boldsymbol{\beta} = (0, 0.5)\).

With the \(2\)-norm on both terms in the objective function, the solution to the optimization problem becomes:

\begin{align} 0.69 \, \text{V} + 0.23 \, \text{H} + 0.08 \, \text{Frp} = \boldsymbol{y}. \end{align}

If you go back to the figure and compare the geometry with this solution, you’ll see that the solution makes sense. That said, there is often no single correct approach in mathematical modeling, and other variations of such a model might be appropriate.

The optimization model on a larger example

Let’s run the optimization model on the same example as before:

\begin{align*} [X \mid \boldsymbol{y}] = \begin{bmatrix} \text{AP} & \text{Sp} & \text{Frp} & \text{H} & \boldsymbol{y} \\ -1 & 1 & -2 & -1 & -1\\ 2 & 1 & -2 & -2 & -2\\ -2 & -2 & 2 & 2 & 2\\ 1 & -1 & -2 & -2 & -2 \\ 1 & 2 & 2 & -2 & 0\\ -1 & -2 & -2 & 2 & 0 \end{bmatrix} \end{align*}
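Here is a minimal sketch of how one might solve the model numerically. The use of scipy.optimize.minimize with SLSQP and the value \(\epsilon = 0.01\) are my implementation choices for illustration, not fixed parts of the model:

import numpy as np
from scipy.optimize import minimize

def convex_explanation(X, y, epsilon=0.01):
    """The convex combination of party columns closest to y."""
    n = X.shape[1]

    def objective(beta):
        return np.linalg.norm(X @ beta - y) + epsilon * np.linalg.norm(beta)

    result = minimize(
        objective,
        x0=np.full(n, 1 / n),  # Start from the uniform combination
        bounds=[(0, 1)] * n,  # beta_j >= 0
        constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1}],
        method="SLSQP",
    )
    return result.x

X = np.array([[-1,  1, -2, -1],
              [ 2,  1, -2, -2],
              [-2, -2,  2,  2],
              [ 1, -1, -2, -2],
              [ 1,  2,  2, -2],
              [-1, -2, -2,  2]])  # Columns: AP, Sp, Frp, H
y = np.array([-1, -2, 2, -2, 0, 0])
print(convex_explanation(X, y))  # Should roughly match the answer below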

The answer is given below and corresponds well with the ranking we obtained earlier.

1. H   (51%)
2. Frp (45%)
3. Sp  (4%)
4. AP  (0%)

The difference between distance-based and combination-based explanation

We have seen two ways to explain a user’s results in a VAA:

  1. Distance-based explanation - ranks the parties by their distance from the user
  2. Combination-based explanation - expresses the user as the nearest convex combination of parties

The difference between the explanations is illustrated in the figure below.

  • The distance-based explanation says that \(\boldsymbol{y}\) is closest to \(B\), then \(A\), and finally \(C\). The distances are mutually independent; the distance to \(B\) does not affect the distance to \(A\). If \(B\) disappeared, the distance to \(A\) would remain unchanged.
  • The combination-based explanation expresses the position \(\boldsymbol{y}\) as a convex combination. The answer in the figure above becomes \(\boldsymbol{\beta} = (\beta_A, \beta_B, \beta_C) = (0, 0.7, 0.3)\). The parties’ contributions are mutually dependent. If \(B\) disappeared, the explanation would rely on \(A\), but as long as \(B\) exists, \(A\) is not needed at all to explain \(\boldsymbol{y}\).

The figure below shows an example where the explanations become completely different.

Measured by distance, \(C\) is closer to \(\boldsymbol{y}\) than both \(A\) and \(B\). Yet \(\boldsymbol{y}\) is explained using \(\boldsymbol{\beta} = (\beta_A, \beta_B, \beta_C) = (0.5, 0.5, 0)\) without using \(C\). Shortest distance tells you which single party is closest. The optimization model tells you which coalition of parties is closest to you.

Part 3: How should a VAA present questions to the user?

In this section, we present an algorithm that selects statements that discriminate well between parties. There are two main reasons for wanting this:

  • We want to use a selection of statements. If parties have answered many questions, we may want to limit ourselves to a smaller selection in the VAA. How can we choose a small subset of good statements that differentiate between parties?
  • Early termination. We want the user to be able to end the VAA after any number of questions. How do we ensure that the user has been presented with a diverse set of statements even if they do not answer every question?

To motivate the algorithm, we start by looking at the same example as earlier in the article.

If we ask about a railway to Tromsø or whether the county municipality should be abolished, we gather no information because the user’s answer does not discriminate between parties. The subspace spanned by the vector \(\boldsymbol{e} = (1, 1)\) consists of such trivial statements. The best statement to present to the user is therefore “Drop the ferry-free E39 along the west coast,” since this statement lies furthest from \(\operatorname{span} (\{ \boldsymbol{e} \} )\).

Algorithm for selecting statements in sequence

Let’s generalize from two dimensions to an arbitrary number of dimensions:

  • Start with the trivial subspace spanned by the statement \(\boldsymbol{e} = (1, 1, \ldots, 1)\).
  • The first statement selected is the one that is furthest from \(\operatorname{span} (\{ \boldsymbol{e} \} )\). This is \(\boldsymbol{v}_1 = \arg \max_j \| P_S \boldsymbol{v}_j - \boldsymbol{v}_j \|_2\), where \(P_S\) is a projection matrix that projects a vector onto \(S\). The formula for such a projection matrix is \(P_S = S S^{+} = S (S^T S)^{-1} S^T\), where the columns of the matrix \(S\) are the vectors that span the space we want to project onto (here just \(\boldsymbol{e}\)).
  • The next statement selected should be far from both the trivial subspace \(\boldsymbol{e}\) and far from the previously selected statement \(\boldsymbol{v}_1\). So we choose the statement that is furthest from \(\operatorname{span} (\{ \boldsymbol{e}, \boldsymbol{v}_1 \} )\). In mathematical notation, this becomes \(\boldsymbol{v}_2 = \arg \max_j \| P_S \boldsymbol{v}_j - \boldsymbol{v}_j \|_2\), where \(S\) in this iteration is a matrix with columns \(\boldsymbol{e}\) and \(\boldsymbol{v}_1\).
  • We continue like this, selecting statements that are furthest from those we have already chosen. If \(\dim \operatorname{span} (\{ \boldsymbol{e}, \boldsymbol{v}_1, \ldots, \boldsymbol{v}_n \} )\) becomes equal to the dimensionality of the space (the number of parties), then all remaining statements lie in the space, and the distance \(\| P_S \boldsymbol{v}_j - \boldsymbol{v}_j \|_2 \approx 0\) for all remaining statements \(\boldsymbol{v}_j\). The solution is to treat the selected statements as a FIFO queue: we remove the first selected statement \(\boldsymbol{v}_1\) and try again. Then the next statement we select is always far from the most recently selected statements.
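Before the full implementation, here is a small sketch of the first two selection steps, applied to the statement matrix from Part 2 (statement vectors live in party space, so they appear as columns of \(X^T\)):

import numpy as np

X = np.array([[-1,  1, -2, -1],
              [ 2,  1, -2, -2],
              [-2, -2,  2,  2],
              [ 1, -1, -2, -2],
              [ 1,  2,  2, -2],
              [-1, -2, -2,  2]])  # Columns: AP, Sp, Frp, H
V = X.T.astype(float)  # Statement vectors as columns, in party space

def distance_from_span(S, v):
    P_S = S @ np.linalg.inv(S.T @ S) @ S.T  # P_S = S (S^T S)^{-1} S^T
    return np.linalg.norm(P_S @ v - v)

S = np.ones((4, 1))  # The trivial subspace span({e})
print(np.round([distance_from_span(S, v) for v in V.T], 2))
# [2.18 3.57 4.   2.45 3.28 3.28] -> statement 2 is selected first

S = np.hstack([S, V[:, [2]]])  # The subspace span({e, v_1})
print(np.round([distance_from_span(S, v) for v in V.T], 2))
# [1.58 0.71 0.   1.41 2.92 2.92] -> statements 4 and 5 tie: they are
# negations of each other, so they are equally far from any subspace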

Python code for the full algorithm is included at the end of the article. The figure below illustrates the concept in three dimensions. The next statement selected is always the one that is furthest from the space spanned by the trivial direction and the previously selected statements. (In English, the first figure says “Start with (1, 1, 1)”, the next says “Choose first statement” and the third says “Choose second statement”.)

Results - bad statements

Let’s look at an example with statements that are similar to each other.

The first two statements are very similar. The third statement, about Erna, sits on the opposite side of the vector space, but it still provides little new information because it lies close to the space spanned by the first two statements.

Results - good statements

Here is a selection of statements that are different from each other, identified by the algorithm above.

The first statement is far from trivial because half of the parties disagree and the other half agree. The second statement is answered differently by six parties and differs from both the trivial statement and the first statement. The algorithm continues to select statements that are consistently different from what the VAA user has already been presented with.

Summary and references

Voting advice applications quantify political opinions. Party responses can be represented as a matrix, where the rows (statements) and columns (parties) can be interpreted as vectors. Such a geometric interpretation gives us valuable insight: we can define what it means for a person to be close to a party, and we can formalize what it means for a statement to discriminate between parties.

The idea of viewing parties as vectors and ranking them by proximity to the user is obvious, and for the 2019 municipal election, NRK used \(p=1\) as a metric. (It is sad that NRKbeta’s article in the link above describes middle school mathematics with phrases like “explain the calculation to me who doesn’t have math anxiety” and “I get a headache!”) The idea of describing a user as a convex combination of parties is my own. In practice, it is more difficult to implement in code and more challenging for an average user to understand. Simple algorithms have inherent value, and distance calculation provides a simple and understandable explanation.

The idea of viewing statements as vectors and selecting statements that are far from each other is my own. The inspiration comes from QR decomposition with pivoting, which is an algorithm for orthogonalizing a matrix. The pivot columns in QR can be seen as a greedy approximation of D-optimality in optimal experimental design, which maximizes the volume of a high-dimensional parallelepiped. The differences between my algorithm and QR decomposition are that (1) we always stay away from the trivial statement \(\boldsymbol{e} = \boldsymbol{1}\) and that (2) we use a FIFO queue. Again, I believe simplicity has inherent value: I like that this algorithm does not adapt the sequence of questions to the user’s responses, since I do not like the idea of a personalized VAA.

This article focused mostly on mathematics, but a VAA must of course contain politically relevant statements, good graphic design, potentially the ability to skip statements or weight statements up, subjective analyses of the results, and so on.

Code

Here is a simple implementation of the algorithm that finds statements that are far from each other. I have not emphasized runtime, numerical stability, caching of calculations, etc. The code is certainly good enough to showcase the idea, but for larger datasets, it should be rewritten.

import collections  # Python 3.12
import numpy as np  # NumPy 2.1.3


def away_from_last_k(A, k=None, verbose=False):
    """Yields the column indices of A that are as far away from a continually
    updated subspace spanned by (1, 1, 1, ...) and a First-In-First-Out (FIFO)
    queue of length k. When a column index is yielded, it is added to the queue.

    Parameters
    ----------
    A : np.ndarray
        A matrix (2D array) with columns [v1 | v2 | ... ].
    k : int, optional
        Number of elements to keep in the FIFO queue. None means no limit.
    verbose : bool, optional
        Whether to print information. The default is False.

    Examples
    --------
    >>> A = np.array([[3, 0, 0], [0, 2, 0], [0, 0, 1], [2.5, 0, 0]]).T
    >>> list(away_from_last_k(A, k=None))
    [0, 1, 3, 2]
    >>> list(away_from_last_k(A, k=0)) #  Distance from span({(1, 1, 1)}) only
    [0, 3, 1, 2]
    >>> list(away_from_last_k(A, k=1))
    [0, 1, 3, 2]
    """
    subspace = collections.deque([], maxlen=k)  # FIFO queue of subspace vectors
    ones = np.ones(A.shape[0])  # Vector of ones, always in subspace
    remaining = dict(enumerate(A.T))  # Remaining column indices of A

    def distance_from_subspace(tuple_):
        idx, v = tuple_
        S = np.vstack([ones] + list(subspace)).T  # Full space as col. mat.
        P_S = S @ np.linalg.pinv(S)  # Projection matrix onto span(S)
        return np.linalg.norm(P_S @ v - v)  # Distance between v and span(S)

    while remaining:
        if verbose:
            inds = list(remaining.keys())
            print(f"\nCol idx of {len(remaining)} remaining vectors: {inds}")
            print(f"Vectors in subspace (excl. ones): {len(subspace)}")

        # Get idx of col vector furthest from S = [ones | subspace]
        idx, _ = max(remaining.items(), key=distance_from_subspace)
        maxdist = distance_from_subspace((idx, remaining[idx]))
        if np.isclose(maxdist, 0):
            if not subspace:
                # Remaining vectors lie in span({ones}): yield them as-is
                yield from remaining
                return
            if verbose:
                print("Distance was 0. Popping from subspace.")
            subspace.popleft()
            continue

        if verbose:
            print(f"Col idx {idx} furthest from subspace. Dist: {maxdist:.6f}")
        subspace.append(remaining.pop(idx))
        yield idx
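
As a usage sketch, the generator can order the statements from Part 2. Statements are rows of \(X\), so we pass the transpose; the choice \(k=2\) is arbitrary:

X = np.array([[-1,  1, -2, -1],
              [ 2,  1, -2, -2],
              [-2, -2,  2,  2],
              [ 1, -1, -2, -2],
              [ 1,  2,  2, -2],
              [-1, -2, -2,  2]])  # Columns: AP, Sp, Frp, H

# The generator expects candidate vectors as columns, so we transpose
order = list(away_from_last_k(X.T, k=2))
print(order)  # Statement indices, the most discriminating first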