# The COVID-19 TRAiN-study and its questionable statistics

- 16. August 2020
- #statistics

Could visiting training facilities lead to increased COVID-19 spread? This is the central question the Norwegian TRAiN-study attempts to answer. Despite the confident conclusion in the paper and the media coverage, the study is unable to provide a statistically meaningful answer.

The Norwegian TRAiN-study resulted in a paper preprint titled *“Randomized Re-Opening of Training Facilities during the COVID-19 pandemic”*.
The 16 authors conclude that:

“Provided good hygiene and social distancing measures, there was no increased COVID-19 spread at training facilities.”

Both the treatment group and the control group had zero infections (except one person in the gym group who never went to the gym). The conclusion above is obviously true for the sample, but there is no proof that it will generalize to the population.

Without any infections in either group, it’s strange to make a statistical claim of “no increased COVID-19 spread.” It’s also impossible to measure the effects of “good hygiene and social distancing measures.” Hygiene hardly matters if there is no disease in the first place.

## In the media

News of the study quickly spread. The New York Times repeated the conclusion of the paper:

“Are people who work out at gyms with modest restrictions at greater risk of infection from the coronavirus than those who do not? The tentative answer after two weeks: no.”

They also interviewed an epidemiologist, and he disapproves: “these findings don’t tell me that going to the gym isn’t riskier than not going to the gym, even in Oslo.” I agree with him.

The study made the rounds in Norwegian channels such as Aftenposten and NRK. Here, one of the authors made a more modest claim: it’s safe to open gyms. This claim is supported by the data, since with so few infections we can at least say that life is generally safe, whether you hit the gym or not.

## The data from the study

A total of \(3764\) individuals were randomized and included in the study.

- \(1896\) individuals were allowed to go to the gym. One person in this group tested positive after two weeks. This person never went to the gym and will be treated as an observed negative.
- \(1868\) individuals were not allowed to go to the gym. No one tested positive in this group after two weeks.

The authors assumed that \(1\, \%\) of individuals in each group would test positive for COVID-19 at the end of the intervention. Had this happened, the results would have had much more statistical power. But it did not happen. (Not every participant returned their test kits, see the paper for details.)
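
To get a feeling for how far the outcome was from that assumption, here is a small back-of-the-envelope sketch. It assumes a simple binomial model and uses the randomized group sizes as \(n\) (a simplification, since not every test kit was returned). The variable names are mine, not the paper's.

```python
# Assumption taken from the paper: 1 % of each group tests positive.
assumed_rate = 0.01

# Randomized group sizes reported above (used as n for simplicity).
n_gym, n_no_gym = 1896, 1868

# Expected number of positives under the assumed rate.
expected_gym = assumed_rate * n_gym
expected_no_gym = assumed_rate * n_no_gym

# Binomial probability of observing zero positives in a group of size n.
p_zero_gym = (1 - assumed_rate) ** n_gym
p_zero_no_gym = (1 - assumed_rate) ** n_no_gym

print(f"expected positives: {expected_gym:.1f} (gym), {expected_no_gym:.1f} (no gym)")
print(f"P(zero positives):  {p_zero_gym:.1e} (gym), {p_zero_no_gym:.1e} (no gym)")
```

Under the assumed rate we would expect just under twenty positives per group, and seeing none at all would be extremely unlikely. The observed zeros therefore suggest the true rate was far below \(1\, \%\).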

## Probability of infection given data

We denote the probability of infection within two weeks when going to a gym by \(\theta_{\textrm{G}}\), and the probability of infection when not going to a gym by \(\theta_{\textrm{NG}}\). We assume that the number of people who tested positive is given by a binomial distribution. Given the data and a uniform prior on each \(\theta\), we can then plot the posterior Beta distributions.

As seen in the figure above, the Beta densities overlap almost perfectly. They are not exactly identical due to the different group sizes. The authors assumed that \(1\, \%\) of individuals would test positive in both groups, but the median of the posterior distribution is \(0.04 \%\) – much lower.
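
The posterior median can be checked by hand. With a uniform prior, i.e. \(\textrm{Beta}(1, 1)\), and zero positives out of \(n\), the posterior is \(\textrm{Beta}(1, n + 1)\), whose CDF is \(1 - (1 - x)^{n+1}\). A minimal sketch using only the standard library (the function name is hypothetical):

```python
def posterior_median_zero_positives(n):
    """Median of Beta(1, n + 1): the posterior after 0 positives
    in n trials, starting from a uniform Beta(1, 1) prior."""
    b = n + 1
    # Solve 1 - (1 - x)**b = 0.5 for x.
    return 1 - 0.5 ** (1 / b)

median_gym = posterior_median_zero_positives(1896)
median_no_gym = posterior_median_zero_positives(1868)

print(f"gym:    {100 * median_gym:.3f} %")
print(f"no gym: {100 * median_no_gym:.3f} %")
```

Both medians round to \(0.04 \%\). The smaller group gets a slightly higher median because it carries slightly less evidence against infection.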

Contingent on the data, it appears that \(\theta_{\textrm{G}}\) is more or less equal to \(\theta_{\textrm{NG}}\) in the absolute sense – both numbers are very low. In the relative sense (if we divided one number by the other), it could very well be that one of them is much greater than the other. This is shown in the figure below.
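
The contrast between the absolute and the relative view can be illustrated by sampling. The sketch below is not the code behind the figures. It assumes the \(\textrm{Beta}(1, n + 1)\) posteriors described above and draws from them with Python's built-in `random.betavariate`. The number of draws, the seed and the helper `quantile` are arbitrary choices of mine.

```python
import random

random.seed(0)
draws = 100_000

# Posteriors after zero positives under a uniform prior: Beta(1, n + 1).
theta_g = [random.betavariate(1, 1896 + 1) for _ in range(draws)]
theta_ng = [random.betavariate(1, 1868 + 1) for _ in range(draws)]

# Absolute comparison (difference) and relative comparison (ratio).
diff = sorted(g - ng for g, ng in zip(theta_g, theta_ng))
ratio = sorted(g / ng for g, ng in zip(theta_g, theta_ng))

def quantile(sorted_values, q):
    """Crude empirical quantile of an already sorted list."""
    return sorted_values[int(q * (len(sorted_values) - 1))]

diff_lo, diff_hi = quantile(diff, 0.005), quantile(diff, 0.995)
ratio_lo, ratio_hi = quantile(ratio, 0.005), quantile(ratio, 0.995)

print(f"99 % interval for the difference: [{diff_lo:.5f}, {diff_hi:.5f}]")
print(f"99 % interval for the ratio:      [{ratio_lo:.3f}, {ratio_hi:.1f}]")
```

The interval for the difference is tiny, well under one percentage point on either side of zero. The interval for the ratio spans several orders of magnitude, so the data cannot rule out that one group is many times riskier than the other.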

## Conclusion

Gyms are safe because the transmission of disease was low during the study (two weeks) in general: *any* normal activity was safe for most people.
If the infection rate were higher, it could very well be that gyms would propagate the spread of disease.
The conclusion of the paper leads with *“Provided good hygiene and social distancing measures”*, but there is not enough data to say anything about the effect of preventive measures in this context.

The study comprises useful data points about the spread of COVID-19 in general, but it does not say much about:

- Whether gyms propagate the spread of disease if \(\theta\) is higher. Of course gyms do not increase the risk of infection if \(\theta\) is low, since then there is no sickness to spread.
- To what degree preventive measures help reduce infections in gyms. Preventive measures obviously do help, but the general population are not health care professionals and gyms are not hospitals. The study says little about how well preventive measures work in practice in gyms.

All we can say is that if the baseline probability of contracting COVID-19 is low, and the time frame is short, and there are a couple of thousand people in the study, then the probability of getting sick does not increase by going to the gym. But we already know this, since people do not contract COVID-19 by merely going to the gym. There must be disease to spread in the first place. I’m glad few participants got sick, but the data coming from this study would have been more interesting if more of them had.

## Appendix: What if?

Let’s assume a uniform prior on \(\theta\), but different outcomes for the number of people who tested positive in the groups. We can sample the posterior Beta distributions to obtain credible intervals on the posterior distributions of the ratio \(P(\theta_{\mathrm{G}} / \theta_{\mathrm{NG}})\) and the difference \(P(\theta_{\mathrm{G}} - \theta_{\mathrm{NG}})\).

### What if \(20\) people in each group contracted COVID-19?

This is in line with what the authors thought would happen. They wrote:

“We assumed that 1% of individuals in in each group would test positive for SARS-CoV-2 at the end of the intervention.”

If this happened, then \(99\,\%\) of the probability mass of \(P(\theta_{\mathrm{G}} / \theta_{\mathrm{NG}})\) ends up in the interval \([0.44, 2.197]\). It would be likely that going to the gym does not increase the spread of disease, and in the worst case the probability of contracting COVID-19 is approximately twice as high for gym-goers.

### What if \(40\) gym goers and \(20\) non-gym goers contracted COVID-19?

Then \(99\,\%\) of the probability mass of \(P(\theta_{\mathrm{G}} / \theta_{\mathrm{NG}})\) is in the interval \([0.996, 4.01]\), and the median is \(1.939\). It would be likely that going to the gym does increase the probability of contracting COVID-19, both in the relative and absolute sense.
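
For readers who want to try other scenarios, here is a rough sketch of the sampling approach described at the start of this appendix. It assumes a uniform prior, so \(k\) positives out of \(n\) give a \(\textrm{Beta}(1 + k, 1 + n - k)\) posterior. The function name, the number of draws and the seed are my own choices for illustration, and Monte Carlo noise means the output will only approximately match the intervals quoted above.

```python
import random

def ratio_summary(k_g, n_g, k_ng, n_ng, draws=50_000, seed=0):
    """Return the (0.5 %, 50 %, 99.5 %) quantiles of theta_G / theta_NG,
    estimated by sampling both posterior Beta distributions."""
    rng = random.Random(seed)
    ratios = sorted(
        rng.betavariate(1 + k_g, 1 + n_g - k_g)
        / rng.betavariate(1 + k_ng, 1 + n_ng - k_ng)
        for _ in range(draws)
    )

    def pick(q):
        # Crude empirical quantile of the sorted draws.
        return ratios[int(q * (draws - 1))]

    return pick(0.005), pick(0.5), pick(0.995)

# Scenario 1: 20 positives in each group.
equal = ratio_summary(20, 1896, 20, 1868)
# Scenario 2: 40 positives among gym goers, 20 among non-gym goers.
double = ratio_summary(40, 1896, 20, 1868)

for name, (lo, med, hi) in [("20 vs 20", equal), ("40 vs 20", double)]:
    print(f"{name}: 99 % interval [{lo:.2f}, {hi:.2f}], median {med:.2f}")
```

Replacing the counts with the observed zeros shows how little the actual data constrains the ratio.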