Relative strength - Wilks, IPF GL and allometry

In powerlifting, competitors test their maximal strength in three lifts: the back squat, the bench press and the deadlift. The Wilks formula was introduced in 1995 as a way to compare competitors across weight classes. For instance, if lifter A weighs 80 kg and benches 140 kg, and lifter B weighs 100 kg and benches 170 kg, who is stronger? In this article we'll look at several approaches to this problem.



The basic idea for comparing lifters of different weights is to correct for bodyweight by finding a regression function \(f(w; \boldsymbol{\theta})\) that maps from bodyweight \(w\) to expected weight lifted. The vector \(\boldsymbol{\theta}\) consists of model parameters, estimated by training a model on a dataset. The final score used to compare lifters is the lifter's result divided by the expected weight lifted at that bodyweight:

\begin{align*} \text{score}(w, \text{result}) = \frac{\text{result}}{f(w; \hat{\boldsymbol{\theta}})} \end{align*}
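As a minimal sketch (with a made-up linear \(f\) standing in for any of the fitted models below), the score is just a ratio:

def score(bodyweight, result, f):
    """Relative-strength score: actual result divided by the expected result."""
    return result / f(bodyweight)


# A made-up regression function for illustration, not a fitted model.
f_hat = lambda w: 2.5 * w

score(80, 140, f_hat)   # 0.70 for lifter A
score(100, 170, f_hat)  # 0.68 for lifter B, so A scores higher under this f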

It seems natural to require that the regression function \(f(w)\) obeys the following properties:

  • It should pass through the origin.
  • It should be increasing.
  • It should be concave (diminishing returns).

As equations, these requirements are:

\begin{align*} f(0) = 0 \quad f'(w) \geq 0 \quad f''(w) \leq 0 \end{align*}
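As a quick sketch, these properties can be checked numerically for any candidate function on a grid of bodyweights:

import numpy as np


def satisfies_constraints(f, w_max=200.0, n=2001, tol=1e-9):
    """Check f(0) = 0, f non-decreasing, and f concave on a grid of bodyweights."""
    w = np.linspace(0.0, w_max, n)
    y = f(w)
    return abs(y[0]) < tol and np.all(np.diff(y) >= -tol) and np.all(np.diff(y, n=2) <= tol)


satisfies_constraints(np.sqrt)    # True: passes the origin, increasing, concave
satisfies_constraints(np.square)  # False: convex rather than concave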

Candidate regression functions

Before we consider regression functions, it’s instructive to look at some real-world data so we know what we’re modeling. Here are raw (unequipped) lifts from Norwegian lifters in national and international competitions from 2014 to 2023. Lifters from all age groups are included, and the median age is 28. Only results where the lifter ended up with a valid total are included in this dataset. The OpenPowerlifting project contains a lot of powerlifting data, and they are also on GitHub.

Note that we include all lifts, not just the records and not just the best lifts per weight class and meet. There are many options for filtering the data, depending on the goal of the analysis. My reason for fitting the entire data set, and not just the best lifts, is that estimating the expected value has less variability than estimating the maximum, since more data is available. This approach also lets us evaluate the posterior predictive distribution by fitting a probability density function to the residuals.

Let’s now look at concrete regression functions \(f(w; \hat{\boldsymbol{\theta}})\).

(1) Wilks

The famous Wilks formula is simply a 5th order polynomial. It was introduced in 1995, and several sets of coefficients exist. See the Wikipedia entry Wilks coefficient for the specific values of the coefficients.

\begin{align*} f(w) = \theta_0 + \theta_1 w + \theta_2 w^2 + \theta_3 w^3 + \theta_4 w^4 + \theta_5 w^5 \end{align*}

Having plotted the data and seen it, my immediate reaction is that fitting a 5th order polynomial is overkill: there are too many parameters and too much freedom in this formula.

(2) IPF GL

The International Powerlifting Federation (IPF) currently uses the IPF GL formula. It’s a three-parameter exponential function of the form:

\begin{align*} f(w) = \theta_0 - \theta_1 \exp \left( -\theta_2 w \right) \end{align*}

The coefficients may be found in the document The IPF GL Coefficients for Relative Scoring. This formula seems sensible: it has fewer parameters and models the “diminishing returns” that weight gain has on strength.

(3) Allometric formula

In a 2006 paper titled Adjusting powerlifting performances for differences in body mass, the author recommends the so-called allometric model:

\begin{align*} f(w) = \theta_0 - \theta_1 w^{-\theta_2} \end{align*}

This is also a three-parameter model, which seems sensible. Greg Nuckols wrote a nice article on strongerbyscience.com titled Who’s The Most Impressive Powerlifter?, in which he calls it the “perfect intersection of theory and practice.”
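For reference, here is a sketch of the three candidate forms (1) to (3) as Python functions. No coefficient values are given here; the official Wilks and IPF GL coefficients can be found in the references above.

import numpy as np


def wilks_poly(w, *theta):
    """(1) Wilks: a 5th order polynomial in bodyweight, with six coefficients."""
    return sum(t * w ** k for k, t in enumerate(theta))


def ipf_gl(w, theta0, theta1, theta2):
    """(2) IPF GL: theta0 - theta1 * exp(-theta2 * w)."""
    return theta0 - theta1 * np.exp(-theta2 * w)


def allometric(w, theta0, theta1, theta2):
    """(3) Allometric: theta0 - theta1 * w**(-theta2)."""
    return theta0 - theta1 * w ** (-theta2)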

(4) Probability density based formulas

Let \(\text{pdf}(w)\) be a probability density function defined on \([0, \infty)\). The cumulative distribution function \(\text{cdf}(w)\) passes through the origin \((0, 0)\) and eventually reaches \(1\). Any sensible cumulative distribution function can be used to construct a regression function, such as one of these:

\begin{align*} f(w) = \theta_0 \text{cdf}(w \mid \boldsymbol{\theta}) \quad \quad f(w) = \theta_0 + \theta_1 \text{cdf}(w \mid \boldsymbol{\theta}) \end{align*}

Two of the three previous formulas are actually special cases of this approach: IPF GL is the form \(f(w) = \theta_0 + \theta_1 \, \text{cdf}(w)\) with an exponential distribution, and the allometric formula is the same form with a Pareto distribution.

There are plenty of distributions to choose from, but we will pursue the generalized Pareto distribution as an alternative to IPF GL and the allometric equation, since it's a generalization of both the exponential distribution (used in IPF GL) and the Pareto distribution (used in the allometric model).
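As a sketch, the generalized Pareto construction is easy to express with scipy. The parameter values below are the fitted male bench press values from the code at the end of this article:

from scipy import stats


def f_genpareto(w, theta0, shape, scale):
    """f(w) = theta0 * cdf(w) with a generalized Pareto CDF; loc = 0 ensures f(0) = 0."""
    return theta0 * stats.genpareto(shape, loc=0, scale=scale).cdf(w)


# Fitted male bench press parameters (see the code at the end of the article).
f_genpareto(104, theta0=209.585, shape=-0.438, scale=106.662)  # roughly 151 kg expected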

Evaluation of regression functions

Let’s fit some regression functions to our dataset. The Wilks formula overfits like crazy and needs subjective regularization of the coefficients to behave nicely, so we exclude it. In the figure below, the following functions were fitted to the dataset:

  • Linear function \(f(w) = \theta_0 + \theta_1 w\)
  • Quadratic function \(f(w) = \theta_0 + \theta_1 w + \theta_2 w^2\)
  • IPF GL \(f(w) = \theta_0 - \theta_1 \exp \left( -\theta_2 w \right)\)
  • Allometric equation \(f(w) = \theta_0 - \theta_1 w^{-\theta_2}\)
  • Generalized Pareto \(f(w) = \theta_0 \, \text{cdf}_{\text{GP}}(w \mid \theta_1, \theta_2)\)

They all look pretty similar to the naked eye, especially in the bodyweight range where most lifters reside. Using out-of-sample \(R^2\) from 10-fold cross validation to compare the fits, we obtain the following results (values are \(R^2 \times 100\)):

model       bench-female  bench-male  deadlift-female  deadlift-male  squat-female  squat-male  geometric_avg     avg
gl                 3.975      30.878           11.905         26.850        13.515      32.924         16.106  19.450
linear             3.981      30.895           11.924         26.868        13.531      32.941         16.123  19.466
allometric         4.512      31.829           13.717         28.209        14.821      33.799         17.409  20.614
quadratic          4.482      31.531           15.237         28.135        15.140      33.474         17.695  20.814
pareto             5.246      31.554           16.153         28.217        15.733      33.539         18.478  21.274

The generalized Pareto appears to be the best fit. While the quadratic comes second, notice that it often curves down at higher bodyweights, which is not a realistic model theoretically even if it fits this particular data set well (even out-of-sample). This could be remedied by putting constraints on the parameters, but we will not pursue this.
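Here is a sketch of how such a fit and evaluation might be set up. This is not the author's exact code; it assumes numpy arrays bodyweight and lifted for one sex and exercise combination, and reports \(R^2\) multiplied by 100 to match the table above.

import numpy as np
from scipy import stats
from scipy.optimize import curve_fit
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score


def f_genpareto(w, theta0, shape, scale):
    """Regression function: theta0 times the generalized Pareto CDF of bodyweight."""
    return theta0 * stats.genpareto(shape, loc=0, scale=scale).cdf(w)


def cv_r2(bodyweight, lifted, n_splits=10, p0=(300.0, -0.3, 100.0)):
    """Average out-of-sample R^2 across K folds, times 100 to match the table above."""
    scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(bodyweight):
        params, _ = curve_fit(f_genpareto, bodyweight[train], lifted[train], p0=p0, maxfev=10_000)
        scores.append(r2_score(lifted[test], f_genpareto(bodyweight[test], *params)))
    return 100 * np.mean(scores)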

Residuals

Let’s choose the generalized Pareto model and look at the residuals:

There appears to be some heteroscedasticity (the spread of the residuals depends on the predicted value), especially for women. If we ignore the heteroscedasticity, we can fit a Student's t-distribution to the residuals. The fits all seem reasonable.
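Here is a sketch of such a residual fit with scipy; the synthetic residuals below are only a stand-in for the real ones.

from scipy import stats

# Stand-in residuals; in practice these are observed lift minus model prediction.
residuals = stats.t.rvs(df=20, loc=0, scale=30, size=5000, random_state=0)

# Fit a Student's t-distribution to the residuals by maximum likelihood.
df, loc, scale = stats.t.fit(residuals)
residual_dist = stats.t(df=df, loc=loc, scale=scale)

# Probability that a residual is below +30 kg, i.e. the percentile of a lifter
# who lifts 30 kg more than the model's expected value.
residual_dist.cdf(30.0)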

Now we're in a position to say which percentile a lifter is in, given bodyweight and result. At the bottom of this article you'll find a Python code snippet that does exactly this. For instance, I have benched 182.5 kg at a bodyweight of 104 kg, which puts me in the 88th percentile:

>>> percentile(bodyweight=104, result=182.5, sex="male", exercise="bench")
0.8880470537426874

Adding age and experience

Let’s also add age and experience to the regression model. Here experience means years since first powerlifting meet. This is a crude estimate of actual training experience, but it’s what we have in the dataset.

Looking at the data, I believe the function \(f(x) = x \exp(-x)\) provides a good starting point as a regression function to correct for age. Shifting and scaling this function, we obtain

\begin{align*} f(x) = \theta_0 (\theta_1 + \theta_2 x) \exp\left( - \theta_1 - \theta_2 x \right) + \theta_3. \end{align*}

Fitting the data, it seems to be a decent fit indeed. Other functions could also be considered here, but let’s stick with this one.

To model experience I shifted and scaled \(f(x) = \exp (-x)\), i.e., used \(f(x) = \theta_0 - \theta_1 \exp \left( -\theta_2 x \right)\). This is the same functional form as IPF GL, but used here to model the effect of experience, not bodyweight.
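As a sketch, the two correction terms look like this in Python (parameter names follow the equations above; the fitted values appear in the code at the end of the article):

import numpy as np


def age_correction(age, theta0, theta1, theta2, theta3):
    """Shifted and scaled x * exp(-x): strength rises to a peak, then declines with age."""
    x = theta1 + theta2 * age
    return theta0 * x * np.exp(-x) + theta3


def experience_correction(experience, theta0, theta1, theta2):
    """Shifted and scaled exponential decay: rapid gains early on, diminishing later."""
    return theta0 - theta1 * np.exp(-theta2 * experience)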

The full additive model takes into account bodyweight (bw), age and experience:

\begin{align*} f(\text{bw}, \text{age}, \text{experience} \mid \boldsymbol{\theta}) &= \theta_1 \left( 1 - \left( 1 + \frac{\theta_2  \text{bw}}{\theta_3}  \right)^{- 1 /\theta_2}  \right)\\ &+ \theta_4 (\theta_5 + \theta_6 \text{age}) \exp\left( - \theta_5 - \theta_6 \text{age} \right)\\ &+ \theta_7 \exp \left( -\theta_8 \text{experience} \right) + \theta_9 \end{align*}

Fitting this nine-parameter model to data was challenging. The function has several local minima and no parameter symmetries. Choosing a good starting value for the optimization routine (L-BFGS with automatic differentiation) by hand was crucial for finding the global minimum. You can find code for this model, with fitted parameter values, at the end of this article. Again I used a Student's t-distribution to model the residual distribution.
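A rough sketch of how such a fit might be set up with scipy's L-BFGS-B follows. The author used automatic differentiation for the gradient; this sketch falls back on scipy's numerical gradient approximation, assumes arrays bw, age, exp_ and y that have already been normalized as in the code at the end of the article, and uses an illustrative starting point.

import numpy as np
from scipy.optimize import minimize


def predict(theta, bw, age, exp_):
    """The nine-parameter additive model: bodyweight + age + experience terms."""
    t1, t2, t3, t4, t5, t6, t7, t8, t9 = theta
    bodyweight_term = t1 * (1 - (1 + t2 * bw / t3) ** (-1 / t2))
    age_term = t4 * (t5 + t6 * age) * np.exp(-(t5 + t6 * age))
    experience_term = t7 * np.exp(-t8 * exp_)
    return bodyweight_term + age_term + experience_term + t9


def loss(theta, bw, age, exp_, y):
    """Mean squared error of the additive model."""
    return np.mean((y - predict(theta, bw, age, exp_)) ** 2)


# A hand-picked starting point (crucial, per the discussion above); values are illustrative.
theta_init = np.array([2.0, 1.2, 0.27, 1.0, -0.5, 4.7, -0.15, 0.15, -0.75])
# res = minimize(loss, theta_init, args=(bw, age, exp_, y), method="L-BFGS-B")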

Here is a little sensitivity analysis based on the following base case:

>>> percentile(100, 180, age=32, experience=16, sex="male", exercise="bench")
0.9008126632486967

Changing the bodyweight, weight lifted, age and experience, we can examine which percentiles are achieved. The higher the percentile, the more impressive the lifter is. All else being equal, lower bodyweight is better, more weight lifted is better, and less experience is better. The effect of age is not monotonic: being a strong 30-year-old is less impressive than being younger or older and equally strong.
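For instance, here is a small sketch that varies one input at a time around the base case, using the percentile function defined at the end of this article:

base = dict(bodyweight=100, result=180, age=32, experience=16, sex="male", exercise="bench")

for key, values in [("bodyweight", [90, 100, 110]),
                    ("result", [170, 180, 190]),
                    ("age", [22, 32, 42]),
                    ("experience", [8, 16, 24])]:
    for value in values:
        p = percentile(**{**base, key: value})
        print(f"{key}={value}: percentile {p:.3f}")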

Finally, here is a comparison of the additive model above with (1) scikit-learn's PolynomialFeatures(degree=3) followed by Ridge regression and (2) a GradientBoostingRegressor. The values are \(R^2 \times 100\), as in the earlier table. It's comforting that the nine-parameter model above is competitive with these more complex models: it validates both the modeling work and the belief that the optimization routine found the global minimum.

model      bench-female  bench-male  deadlift-female  deadlift-male  squat-female  squat-male
additive         46.495      53.601           47.891         53.310        51.386      60.162
polyridge        47.347      53.341           48.430         52.059        53.126      60.006
boosting         47.681      56.364           49.805         54.773        55.621      62.340

The boosting model is more or less an upper bound on what we can achieve if we use a non-linear, complex model with interactions between the features. However, boosting models are not smooth, so they are not suited to our purposes—we might want to compare lifters who are very similar, and boosting might return the exact same output for such lifters. It’s also pleasing to be able to understand the model and not use a black box approach, as well as being able to implement the model in Excel or in SQL.

The polynomial expansion followed by Ridge regression has too many parameters: the interaction terms are likely not needed, and the cubic terms might not approximate the data well enough along the individual feature dimensions, which likely introduces bias.
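Here is a sketch of how the two baselines might be set up with scikit-learn. The hyperparameters are illustrative, not necessarily those behind the table above; X is assumed to be a feature matrix with bodyweight, age and experience columns, and y the weight lifted.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score


def baseline_scores(X, y):
    """Out-of-sample R^2 for the two reference models, averaged over 10 folds."""
    polyridge = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1.0))
    boosting = GradientBoostingRegressor(random_state=0)
    return {
        "polyridge": cross_val_score(polyridge, X, y, cv=10, scoring="r2").mean(),
        "boosting": cross_val_score(boosting, X, y, cv=10, scoring="r2").mean(),
    }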

Summary

We’ve compared four different models for determining relative strength: (1) Wilks, (2) IPF GL, (3) the allometric equation (power equation) and (4) my own approach using the generalized Pareto distribution. The generalized Pareto distribution is an extension of methods (2) and (3), and produces a better fit as measured by out-of-sample \(R^2\). We also added age and years of lifting experience to the model, which raises the \(R^2\) score significantly.

Code

Here are two pieces of code for Python 3.9.4, using scipy 1.10.1 and numpy 1.24.3.

Correcting for bodyweight

from scipy import stats


def percentile(bodyweight, result, sex="male", exercise="squat"):
    """Estimate where in the population a lifter is."""
    assert sex in ("male", "female")
    assert exercise in ("squat", "bench", "deadlift")
    bw = bodyweight

    if sex == "male" and exercise == "squat":
        expected_value = 352.429 * stats.genpareto(-0.271, 0, 120.058).cdf(bw)
        distribution = stats.t(df=30.489, loc=expected_value, scale=35.436)

    if sex == "male" and exercise == "bench":
        expected_value = 209.585 * stats.genpareto(-0.438, 0, 106.662).cdf(bw)
        distribution = stats.t(df=53.601, loc=expected_value, scale=25.743)

    if sex == "male" and exercise == "deadlift":
        expected_value = 304.798 * stats.genpareto(-0.364, 0, 80.54).cdf(bw)
        distribution = stats.t(df=16.004, loc=expected_value, scale=35.677)

    if sex == "female" and exercise == "squat":
        expected_value = 137.604 * stats.genpareto(-0.699, 0, 58.368).cdf(bw)
        distribution = stats.t(df=16.963, loc=expected_value, scale=23.484)

    if sex == "female" and exercise == "bench":
        expected_value = 78.235 * stats.genpareto(-0.544, 0, 45.897).cdf(bw)
        distribution = stats.t(df=2584566.275, loc=expected_value, scale=16.213)

    if sex == "female" and exercise == "deadlift":
        expected_value = 156.412 * stats.genpareto(-0.712, 0, 55.284).cdf(bw)
        distribution = stats.t(df=10.454, loc=expected_value, scale=21.588)

    return distribution.cdf(result)


percentile(105, 215, sex="male", exercise="squat") # 0.416
percentile(75, 140, sex="female", exercise="squat") # 0.625

For instance, a man lifting 215 kg in the squat at a bodyweight of 105 kg is estimated to be in the 41st percentile of results (from national and international powerlifting competitions, not regional and local competitions). A woman lifting 140 kg in the squat at a bodyweight of 75 kg is estimated to be in the 62nd percentile.

Correcting for bodyweight, age and experience

import numpy as np
from scipy import stats


def percentile(bodyweight, result, age=28, experience=0, sex="male", exercise="squat"):
    """Estimate where in the population a lifter is."""
    assert sex in ("male", "female")
    assert exercise in ("squat", "bench", "deadlift")

    # Rescale the inputs using dataset-specific normalization constants.
    result = result / 302.5
    age = np.maximum(14, age) / 82
    bodyweight = bodyweight / 175.94
    experience = experience / 13.3361

    # In each branch below: expected value = bodyweight term (generalized Pareto CDF)
    #   + age term + experience term + intercept, i.e. the additive model above,
    #   with Student-t distributed residuals.
    # r2 = 0.6016
    if sex == "male" and exercise == "squat":
        shifted_age = 4.73248 * age + -0.564304
        expected_value = (
            1.962127 * (1 - (1 + 1.22139 * (bodyweight / 0.273659)) ** (-(1 / 1.22139)))
            + 1.007862 * shifted_age * np.exp(-shifted_age)
            + -0.174818 * np.exp(-0.174818 * experience)
            + -0.745055
        )
        distribution = stats.t(df=16.266596, loc=expected_value, scale=0.088748)

    # r2 = 0.536
    if sex == "male" and exercise == "bench":
        shifted_age = 4.638334 * age + -0.648128
        expected_value = (
            1.552141
            * (1 - (1 + 1.063958 * (bodyweight / 0.143613)) ** (-(1 / 1.063958)))
            + 0.652744 * shifted_age * np.exp(-shifted_age)
            + -0.09782 * np.exp(-0.09782 * experience)
            + -0.868995
        )
        distribution = stats.t(df=17.800214, loc=expected_value, scale=0.067757)

    # r2 = 0.5331
    if sex == "male" and exercise == "deadlift":
        shifted_age = 4.731745 * age + -0.618388
        expected_value = (
            2.010131
            * (1 - (1 + 1.298737 * (bodyweight / 0.179062)) ** (-(1 / 1.298737)))
            + 1.025128 * shifted_age * np.exp(-shifted_age)
            + -0.141842 * np.exp(-0.141842 * experience)
            + -0.844498
        )
        distribution = stats.t(df=12.0948, loc=expected_value, scale=0.093314)

    # r2 = 0.5139
    if sex == "female" and exercise == "squat":
        shifted_age = 4.595259 * age + -0.511785
        expected_value = (
            1.162607
            * (1 - (1 + 1.117215 * (bodyweight / 0.109907)) ** (-(1 / 1.117215)))
            + 0.692023 * shifted_age * np.exp(-shifted_age)
            + -0.149254 * np.exp(-0.149254 * experience)
            + -0.589273
        )
        distribution = stats.t(df=25.38014, loc=expected_value, scale=0.060948)

    # r2 = 0.465
    if sex == "female" and exercise == "bench":
        shifted_age = 4.601688 * age + -0.489447
        expected_value = (
            1.120982
            * (1 - (1 + 1.160129 * (bodyweight / 0.027002)) ** (-(1 / 1.160129)))
            + 0.433714 * shifted_age * np.exp(-shifted_age)
            + -0.107659 * np.exp(-0.107659 * experience)
            + -0.846403
        )
        distribution = stats.t(df=38.954515, loc=expected_value, scale=0.039665)

    # r2 = 0.4789
    if sex == "female" and exercise == "deadlift":
        shifted_age = 2.726854 * age + 0.014638
        expected_value = (
            1.579237
            * (1 - (1 + 0.000283 * (bodyweight / 0.116393)) ** (-(1 / 0.000283)))
            + 1.214288 * shifted_age * np.exp(-shifted_age)
            + -0.122571 * np.exp(-0.122571 * experience)
            + -1.377743
        )
        distribution = stats.t(df=12.507959, loc=expected_value, scale=0.057911)

    return distribution.cdf(result)


percentile(100, 180, age=32, experience=16, sex="male", exercise="bench")  # 0.901