
# Relative strength - Wilks, IPF GL and allometry

- 11 October 2023
- #datascience, #strength

In powerlifting, competitors test their maximal strength in three strength exercises: the back squat, the bench press and the deadlift. The Wilks formula was introduced in 1995 as a way to compare competitors across weight classes. For instance, if lifter A weighs 80 kg and benches 140 kg, and lifter B weighs 100 kg and benches 170 kg, who is stronger? In this article we’ll look at several approaches to solving this problem.

The basic idea for comparing lifters of different weights is to correct for bodyweight by finding a regression function \(f(w; \boldsymbol{\theta})\) that maps from bodyweight \(w\) to expected weight lifted. The vector \(\boldsymbol{\theta}\) consists of model parameters, estimated by training a model on a dataset. The final score used to compare lifters is the strength of the lifter divided by the expected weight lifted, at a given bodyweight:

\[ \text{score} = \frac{\text{weight lifted}}{f(w; \hat{\boldsymbol{\theta}})} \]

It seems natural to require that the regression function \(f(w)\) obeys the following properties:

- It should pass through the origin.
- It should be increasing.
- It should be concave (diminishing returns).

As equations, these requirements are:

\[ f(0) = 0, \qquad f'(w) > 0, \qquad f''(w) < 0 \quad \text{for } w > 0. \]

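These requirements are easy to check numerically for a candidate function. A minimal sketch, using a scaled generalized Pareto CDF as the candidate \(f\) (the parameter values here are made up for illustration, not fitted values):

```
import numpy as np
from scipy import stats

# Candidate regression function: a scaled generalized Pareto CDF.
# The parameter values below are illustrative, not fitted values.
def f(w):
    return 300.0 * stats.genpareto(-0.3, loc=0, scale=100.0).cdf(w)

w = np.linspace(1.0, 200.0, 500)
fw = f(w)

assert abs(f(0.0)) < 1e-12           # passes through the origin
assert np.all(np.diff(fw) > 0)       # increasing
assert np.all(np.diff(fw, n=2) < 0)  # concave (diminishing returns)
print("All three shape requirements hold on [1, 200].")
```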
## Candidate regression functions

Before we consider regression functions, it’s instructive to look at some real-world data so we know what we’re modeling. Here are raw (unequipped) lifts from Norwegian lifters in national and international competitions from 2014 to 2023. Lifters from all age groups are included, and the median age is 28. Only results where the lifter ended up with a valid total are included in this dataset. The OpenPowerlifting project contains a lot of powerlifting data, and they are also on GitHub.

Note that we include all lifts, not just the records and not just the best lifts per weight class and meet. There are many options for filtering the data, depending on what the goal of the analysis is. My reason for fitting the entire data set, and not just the best lifts, is that there is less variability when estimating the expected value rather than the maximum value, since there is more data available. This approach also lets us evaluate the posterior predictive distribution by fitting a probability density function to the residuals.

Let’s now look at concrete regression functions \(f(w; \hat{\boldsymbol{\theta}})\).

### (1) Wilks

The famous Wilks formula uses a 5th-order polynomial \(f(w) = \theta_0 + \theta_1 w + \theta_2 w^2 + \theta_3 w^3 + \theta_4 w^4 + \theta_5 w^5\) as the regression function. It was introduced in 1995, and several sets of coefficients exist. See the Wikipedia entry Wilks coefficient for the specific values of the coefficients.

Having plotted the data and seen it, my immediate reaction is that fitting a 5th order polynomial is overkill—there are too many parameters and too much freedom in this formula.
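For reference, a Wilks-style score is straightforward to compute. The male coefficients below are the commonly cited 1995 values, quoted from memory of the Wikipedia entry the article references, so verify them before relying on them:

```
def wilks_coefficient(bodyweight):
    """Male Wilks coefficient: 500 divided by a 5th-order polynomial in bodyweight (kg).

    Coefficients are the commonly cited 1995 male values, quoted from memory --
    verify against the Wikipedia entry before use.
    """
    a, b, c, d, e, f = (
        -216.0475144, 16.2606339, -0.002388645,
        -0.00113732, 7.01863e-06, -1.291e-08,
    )
    w = bodyweight
    return 500.0 / (a + b * w + c * w**2 + d * w**3 + e * w**4 + f * w**5)

def wilks_score(bodyweight, total):
    """Wilks points: total multiplied by the bodyweight coefficient."""
    return total * wilks_coefficient(bodyweight)

print(round(wilks_coefficient(80.0), 4))  # 0.6827
```

An 80 kg male lifter has a coefficient of roughly 0.68, so a 500 kg total maps to about 341 Wilks points.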

### (2) IPF GL

The International Powerlifting Federation (IPF) currently uses the IPF GL formula. It's a three-parameter exponential function of the form:

\[ f(w) = \theta_0 - \theta_1 \exp \left( -\theta_2 w \right) \]

The coefficients may be found in the document The IPF GL Coefficients for Relative Scoring. This formula seems sensible: it has fewer parameters and models the “diminishing returns” that weight gain has on strength.
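As a concrete illustration, here is the GL score in code. The coefficients below are for men's raw powerlifting totals and are quoted from memory of the IPF coefficients document, so treat them as an assumption to verify:

```
import math

def ipf_gl_points(bodyweight, total):
    """IPF GL points: the total divided by an exponential regression function.

    Coefficients are for men's raw powerlifting totals, quoted from memory of
    the IPF GL coefficients document -- verify before relying on them.
    """
    A, B, C = 1199.72839, 1025.18162, 0.00921
    return total * 100.0 / (A - B * math.exp(-C * bodyweight))

# Heavier lifters need bigger totals for the same points (diminishing returns).
print(ipf_gl_points(80, 700) > ipf_gl_points(100, 700))  # True
```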

### (3) Allometric formula

In a 2006 paper titled Adjusting powerlifting performances for differences in body mass, the author recommends the so-called allometric model:

\[ f(w) = \theta_0 - \theta_1 w^{-\theta_2} \]

This is also a three-parameter model, which seems sensible. Greg Nuckols wrote a nice article on strongerbyscience.com titled Who’s The Most Impressive Powerlifter?, in which he calls it the “perfect intersection of theory and practice.”

### (4) Probability density based formulas

Let \(\text{pdf}(w)\) be a probability density function defined on \([0, \infty)\). The cumulative distribution function \(\text{cdf}(w)\) goes through the origin \((0, 0)\) and eventually reaches \(1\). Any sensible cumulative distribution function can be used to construct a regression function, for instance by scaling it:

\[ f(w) = \theta_0 \, \text{cdf}(w; \theta_1, \theta_2) \]

Two of the three previous formulas are actually special cases of this approach:

- The **IPF GL** formula is of this form; the distribution used is the exponential distribution.
- The **allometric equation** also has this form; it uses the Pareto distribution.

There are plenty of distributions to choose from, but we will pursue the generalized Pareto distribution as an alternative to IPF GL and the allometric equation, since it's a generalization of both the exponential distribution (which is used in IPF GL) and the Pareto distribution (which is used in the allometric model).
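The claim that the generalized Pareto contains both special cases is easy to verify with scipy: with shape parameter \(c = 0\) its CDF reduces to the exponential CDF, and with \(c > 0\) it is the Pareto-type CDF \(1 - (1 + cw)^{-1/c}\):

```
import numpy as np
from scipy import stats

w = np.linspace(0.0, 5.0, 50)

# Shape c = 0: the generalized Pareto CDF equals the exponential CDF.
assert np.allclose(stats.genpareto(0.0).cdf(w), stats.expon().cdf(w))

# Shape c > 0: the CDF matches the Pareto-type form 1 - (1 + c*w)^(-1/c).
c = 0.5
assert np.allclose(stats.genpareto(c).cdf(w), 1 - (1 + c * w) ** (-1 / c))
print("Generalized Pareto reduces to the exponential and Pareto-type CDFs.")
```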

## Evaluation of regression functions

Let’s fit some regression functions to our dataset. The Wilks formula overfits like crazy and needs subjective regularization of the coefficients to behave nicely, so we exclude it. In the figure below, the following functions were fitted to the dataset:

- Linear function \(f(w) = \theta_0 + \theta_1 w\)
- Quadratic function \(f(w) = \theta_0 + \theta_1 w + \theta_2 w^2\)
- IPF GL \(f(w) = \theta_0 - \theta_1 \exp \left( -\theta_2 w \right)\)
- Allometric equation \(f(w) = \theta_0 - \theta_1 w^{-\theta_2}\)
- Generalized Pareto \(f(w) = \theta_0 \text{CDFpareto}(w \mid \theta_1, \theta_2)\)

They all look pretty similar to the naked eye, especially in the bodyweight range where most lifters reside. Using out-of-sample \(R^2\) across a 10-fold cross validation to compare the fits, we obtain the following results:

| model | bench-female | bench-male | deadlift-female | deadlift-male | squat-female | squat-male | geometric_avg | avg |
|---|---|---|---|---|---|---|---|---|
| gl | 3.975 | 30.878 | 11.905 | 26.850 | 13.515 | 32.924 | 16.106 | 19.45 |
| linear | 3.981 | 30.895 | 11.924 | 26.868 | 13.531 | 32.941 | 16.123 | 19.466 |
| allometric | 4.512 | 31.829 | 13.717 | 28.209 | 14.821 | 33.799 | 17.409 | 20.614 |
| quadratic | 4.482 | 31.531 | 15.237 | 28.135 | 15.14 | 33.474 | 17.695 | 20.814 |
| pareto | 5.246 | 31.554 | 16.153 | 28.217 | 15.733 | 33.539 | 18.478 | 21.274 |

The generalized Pareto appears to be the best fit. While the quadratic comes second, notice that it often curves down at higher bodyweights, which is not a realistic model theoretically even if it fits this particular data set well (even out-of-sample). This could be remedied by putting constraints on the parameters, but we will not pursue this.
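The cross-validation procedure above can be sketched as follows. Since the OpenPowerlifting dataset is not bundled here, this uses synthetic data with a concave trend; the out-of-sample \(R^2\) is computed per fold and averaged. This is an assumed setup for illustration, not the exact pipeline behind the table:

```
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)

# Synthetic "lift vs bodyweight" data: a concave trend plus noise.
w = rng.uniform(50, 150, size=2000)
y = 250 * (1 - np.exp(-0.015 * w)) + rng.normal(0, 25, size=2000)

def cv_r2(degree):
    """Average out-of-sample R^2 of a polynomial fit across 10 folds."""
    scores = []
    for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(w):
        coefs = np.polyfit(w[train], y[train], deg=degree)
        resid = y[test] - np.polyval(coefs, w[test])
        ss_res = np.sum(resid**2)
        ss_tot = np.sum((y[test] - np.mean(y[train])) ** 2)
        scores.append(1 - ss_res / ss_tot)
    return np.mean(scores)

print(f"linear:    {cv_r2(1):.3f}")
print(f"quadratic: {cv_r2(2):.3f}")
```

On concave data like this, the quadratic fit scores higher out of sample than the linear fit, mirroring the ordering in the table.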

## Residuals

Let’s choose the generalized Pareto model and look at the residuals:

There appears to be some heteroscedasticity (residuals being a function of the predicted value), especially for women. If we ignore the heteroscedasticity, we can fit a student-t distribution to the residuals. The fits all seem reasonable:

Now we’re in a position to say in which percentile a lifter resides, given his bodyweight and results. At the bottom of this article you’ll find a Python code snippet that does exactly this. For instance, I have benched 182.5 kg at 104 kg bodyweight, which puts me in the 88th percentile:

```
>>> percentile(bodyweight=104, result=182.5, sex="male", exercise="bench")
0.8880470537426874
```

## Adding age and experience

Let’s also add age and experience to the regression model.
Here experience means *years since first powerlifting meet*.
This is a crude estimate of actual training experience, but it’s what we have in the dataset.

Looking at the data, I believe the function \(f(x) = x \exp(-x)\) provides a good starting point as a regression function to correct for age. Shifting and scaling this function, we obtain

\[ g(a) = \theta_0 \, s(a) \exp(-s(a)), \qquad s(a) = \theta_1 a + \theta_2. \]
Fitting the data, it seems to be a decent fit indeed. Other functions could also be considered here, but let’s stick with this one.
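As a sanity check on the fitted age term: \(x e^{-x}\) is maximized at \(x = 1\), so the most favorable age solves \(\theta_1 a + \theta_2 = 1\) on the normalized age scale. Using the male squat coefficients from the code at the end of this article (where age is divided by 82):

```
# The age term peaks where shifted_age = 1, since x * exp(-x) is maximized at x = 1.
# Coefficients for male squat, taken from the fitted model in the code section;
# that model normalizes age by dividing by 82.
theta_1, theta_2 = 4.73248, -0.564304
peak_age_normalized = (1 - theta_2) / theta_1
peak_age_years = 82 * peak_age_normalized
print(round(peak_age_years, 1))  # 27.1
```

So the fitted model places peak squat strength around age 27, which is consistent with the dataset's median age of 28.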

To model experience I shifted and scaled \(f(x) = \exp(-x)\), i.e., used \(f(x) = \theta_0 - \theta_1 \exp \left( -\theta_2 x \right)\). This is the same functional form as IPF GL, but used to model the effect of experience this time, not bodyweight.

The full additive model takes into account bodyweight (\(w\)), age (\(a\)) and experience (\(x\)):

\[ f(w, a, x) = \theta_1 \left( 1 - \left( 1 + \theta_2 \frac{w}{\theta_3} \right)^{-1/\theta_2} \right) + \theta_4 \, s(a) e^{-s(a)} + \theta_5 e^{-\theta_6 x} + \theta_7, \qquad s(a) = \theta_8 a + \theta_9. \]

Fitting this nine-parameter model to data was challenging. The function has several local minima and no parameter symmetries. Choosing a good starting value for the optimization routine (L-BFGS with automatic differentiation) by hand was crucial to be able to find the global minimum. You can find code for this model, with fitted parameter values, at the end of this article. Again I used the student-t distribution to model the residual distribution.
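A sketch of the kind of fitting involved, restricted to the bodyweight term and run on synthetic data (the article's actual fit used automatic differentiation; here scipy's default finite-difference gradients stand in for it):

```
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data generated from a known scaled generalized Pareto CDF.
true_theta = (320.0, -0.3, 110.0)  # (amplitude, shape, scale)
w = rng.uniform(45, 160, size=3000)
y = true_theta[0] * stats.genpareto(true_theta[1], 0, true_theta[2]).cdf(w)
y += rng.normal(0, 25, size=3000)

def sse(theta):
    """Sum of squared errors of the scaled generalized Pareto regression."""
    amp, shape, scale = theta
    return np.sum((y - amp * stats.genpareto(shape, 0, scale).cdf(w)) ** 2)

# A good starting value matters: the loss surface has local minima.
x0 = np.array([300.0, -0.2, 100.0])
result = minimize(sse, x0, method="L-BFGS-B",
                  bounds=[(1, None), (-0.95, 0.95), (1, None)])
print(result.x)  # fitted (amplitude, shape, scale)
```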

Here is a little sensitivity analysis based on the following base case:

```
>>> percentile(100, 180, age=32, experience=16, sex="male", exercise="bench")
0.9008126632486967
```

Changing the bodyweight, weight lifted, age and experience, we can examine which percentiles are achieved. The higher the percentile, the more impressive the lifter is. All else being equal, lower bodyweight is better, more weight lifted is better, and less experience is better. The effect of age is not monotonic: being a strong 30-year-old is not as impressive as being younger or older and equally strong.

Finally, here is a comparison of the additive model above with (1) a `PolynomialFeatures(degree=3)` expansion followed by a `Ridge` regression, and (2) a `GradientBoostingRegressor`. It's comforting that the nine-parameter model above is so competitive with these more complex models, both in terms of the modeling work and in terms of trusting that the optimization routine found the global minimum.

| model | bench-female | bench-male | deadlift-female | deadlift-male | squat-female | squat-male |
|---|---|---|---|---|---|---|
| additive | 46.495 | 53.601 | 47.891 | 53.310 | 51.386 | 60.162 |
| polyridge | 47.347 | 53.341 | 48.430 | 52.059 | 53.126 | 60.006 |
| boosting | 47.681 | 56.364 | 49.805 | 54.773 | 55.621 | 62.340 |

The boosting model is more or less an upper bound on what we can achieve with a non-linear, complex model with interactions between the features. However, boosting models are not smooth, so they are not suited to our purposes: we might want to compare lifters who are very similar, and boosting might return the exact same output for both. It's also pleasing to understand the model rather than use a black-box approach, and to be able to implement the model in Excel or SQL.

The polynomial expansion followed by Ridge regression has too many parameters, and the interaction terms are likely unnecessary. It is also likely biased along the non-interaction dimensions, since cubic terms might not approximate the data well enough there.
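For reference, the two baseline models compared above can be sketched as follows; the features and synthetic data here are placeholders for the real dataset:

```
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

# Placeholder features: bodyweight, age, experience (synthetic stand-ins).
X = np.column_stack([
    rng.uniform(45, 160, 1000),  # bodyweight
    rng.uniform(14, 70, 1000),   # age
    rng.uniform(0, 20, 1000),    # experience
])
y = 2 * X[:, 0] - 0.05 * (X[:, 1] - 30) ** 2 + 3 * X[:, 2] + rng.normal(0, 20, 1000)

# (1) Cubic polynomial expansion followed by Ridge regression.
polyridge = make_pipeline(PolynomialFeatures(degree=3), Ridge())
polyridge.fit(X, y)

# (2) Gradient boosting: flexible, but piecewise constant, hence not smooth.
boosting = GradientBoostingRegressor(random_state=0)
boosting.fit(X, y)

print(polyridge.score(X, y), boosting.score(X, y))
```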

## Summary

We’ve compared four different models for determining relative strength: (1) Wilks, (2) IPF GL, (3) the allometric equation (power equation) and (4) my own approach using the generalized Pareto distribution. The generalized Pareto distribution is an extension of methods (2) and (3), and produces a better fit as measured by out-of-sample \(R^2\). We also added age and years of lifting experience to the model, which raises the \(R^2\) score significantly.

## Notes and references

- LifterCalc implements IPF GL, DOTS and WILKS. You can find the source code on GitHub. The DOTS formula is a 4th-degree polynomial, i.e., a special case of the Wilks form.
- A paper from 1999 titled Validation of the Wilks powerlifting formula might interest some readers.
- A more recent paper titled Efficiency of the Wilks and IPF Formulas from 2020 might also be of interest.
- Read more about IPF GL on the page IPF GL Formula.
- The article The Confusing History of Strength Coefficients might be of interest.

## Code

Here are two pieces of code for Python 3.9.4, using scipy 1.10.1 and numpy 1.24.3.

### Correcting for bodyweight

```
from scipy import stats


def percentile(bodyweight, result, sex="male", exercise="squat"):
    """Estimate where in the population a lifter is."""
    assert sex in ("male", "female")
    assert exercise in ("squat", "bench", "deadlift")
    bw = bodyweight
    if sex == "male" and exercise == "squat":
        expected_value = 352.429 * stats.genpareto(-0.271, 0, 120.058).cdf(bw)
        distribution = stats.t(df=30.489, loc=expected_value, scale=35.436)
    if sex == "male" and exercise == "bench":
        expected_value = 209.585 * stats.genpareto(-0.438, 0, 106.662).cdf(bw)
        distribution = stats.t(df=53.601, loc=expected_value, scale=25.743)
    if sex == "male" and exercise == "deadlift":
        expected_value = 304.798 * stats.genpareto(-0.364, 0, 80.54).cdf(bw)
        distribution = stats.t(df=16.004, loc=expected_value, scale=35.677)
    if sex == "female" and exercise == "squat":
        expected_value = 137.604 * stats.genpareto(-0.699, 0, 58.368).cdf(bw)
        distribution = stats.t(df=16.963, loc=expected_value, scale=23.484)
    if sex == "female" and exercise == "bench":
        expected_value = 78.235 * stats.genpareto(-0.544, 0, 45.897).cdf(bw)
        distribution = stats.t(df=2584566.275, loc=expected_value, scale=16.213)
    if sex == "female" and exercise == "deadlift":
        expected_value = 156.412 * stats.genpareto(-0.712, 0, 55.284).cdf(bw)
        distribution = stats.t(df=10.454, loc=expected_value, scale=21.588)
    return distribution.cdf(result)


percentile(105, 215, sex="male", exercise="squat")  # 0.416
percentile(75, 140, sex="female", exercise="squat")  # 0.625
```

For instance, a man lifting 215 kg in the squat at a bodyweight of 105 kg is estimated to be in the 41st percentile of results (from national and international powerlifting competitions, not regional and local competitions). A woman lifting 140 kg in the squat at a bodyweight of 75 kg is estimated to be in the 62nd percentile.

### Correcting for bodyweight, age and experience

```
import numpy as np
from scipy import stats


def percentile(bodyweight, result, age=28, experience=0, sex="male", exercise="squat"):
    """Estimate where in the population a lifter is."""
    assert sex in ("male", "female")
    assert exercise in ("squat", "bench", "deadlift")
    result = result / 302.5
    age = np.maximum(14, age) / 82
    bodyweight = bodyweight / 175.94
    experience = experience / 13.3361
    # r2 = 0.6016
    if sex == "male" and exercise == "squat":
        shifted_age = 4.73248 * age + -0.564304
        expected_value = (
            1.962127 * (1 - (1 + 1.22139 * (bodyweight / 0.273659)) ** (-(1 / 1.22139)))
            + 1.007862 * shifted_age * np.exp(-shifted_age)
            + -0.174818 * np.exp(-0.174818 * experience)
            + -0.745055
        )
        distribution = stats.t(df=16.266596, loc=expected_value, scale=0.088748)
    # r2 = 0.536
    if sex == "male" and exercise == "bench":
        shifted_age = 4.638334 * age + -0.648128
        expected_value = (
            1.552141
            * (1 - (1 + 1.063958 * (bodyweight / 0.143613)) ** (-(1 / 1.063958)))
            + 0.652744 * shifted_age * np.exp(-shifted_age)
            + -0.09782 * np.exp(-0.09782 * experience)
            + -0.868995
        )
        distribution = stats.t(df=17.800214, loc=expected_value, scale=0.067757)
    # r2 = 0.5331
    if sex == "male" and exercise == "deadlift":
        shifted_age = 4.731745 * age + -0.618388
        expected_value = (
            2.010131
            * (1 - (1 + 1.298737 * (bodyweight / 0.179062)) ** (-(1 / 1.298737)))
            + 1.025128 * shifted_age * np.exp(-shifted_age)
            + -0.141842 * np.exp(-0.141842 * experience)
            + -0.844498
        )
        distribution = stats.t(df=12.0948, loc=expected_value, scale=0.093314)
    # r2 = 0.5139
    if sex == "female" and exercise == "squat":
        shifted_age = 4.595259 * age + -0.511785
        expected_value = (
            1.162607
            * (1 - (1 + 1.117215 * (bodyweight / 0.109907)) ** (-(1 / 1.117215)))
            + 0.692023 * shifted_age * np.exp(-shifted_age)
            + -0.149254 * np.exp(-0.149254 * experience)
            + -0.589273
        )
        distribution = stats.t(df=25.38014, loc=expected_value, scale=0.060948)
    # r2 = 0.465
    if sex == "female" and exercise == "bench":
        shifted_age = 4.601688 * age + -0.489447
        expected_value = (
            1.120982
            * (1 - (1 + 1.160129 * (bodyweight / 0.027002)) ** (-(1 / 1.160129)))
            + 0.433714 * shifted_age * np.exp(-shifted_age)
            + -0.107659 * np.exp(-0.107659 * experience)
            + -0.846403
        )
        distribution = stats.t(df=38.954515, loc=expected_value, scale=0.039665)
    # r2 = 0.4789
    if sex == "female" and exercise == "deadlift":
        shifted_age = 2.726854 * age + 0.014638
        expected_value = (
            1.579237
            * (1 - (1 + 0.000283 * (bodyweight / 0.116393)) ** (-(1 / 0.000283)))
            + 1.214288 * shifted_age * np.exp(-shifted_age)
            + -0.122571 * np.exp(-0.122571 * experience)
            + -1.377743
        )
        distribution = stats.t(df=12.507959, loc=expected_value, scale=0.057911)
    return distribution.cdf(result)


percentile(100, 180, age=32, experience=16, sex="male", exercise="bench")  # 0.901
```