**Need help with data science or mathematical modeling?**I do consulting work in Norway. Read about my previous work experience and reach out to me for more information.

# Group testing

- 18. April 2022
- #statistics, #optimization

Suppose there are \(n=2\) people in your household, and you wish to determine whether each one has COVID-19. You have many tests available, but would like to use as few tests as possible. How can you accomplish this goal?

The obvious answer is to use \(2\) individual tests, one for each of you.

However, you could also group the tests by mixing your samples. Suppose the test is perfect, and that the independent probabilities that each of you has COVID-19 is \(p\). If you group your tests, the probability that the mixed sample is negative is \((1 - p)^2\) and the probability that the mixed sample is positive is \(1 - (1-p)^2\).

If the mixed sample is negative, then both people are negative and you used only \(1\) test to determine this. If the mixed sample is positive, then you test each person individually and spend \(3\) tests. (This is not strictly necessary, since if the group tests positive and the first person tests negative, then you know the second person is positive. But it simplifies the math, so we’ll assume that we test each person if the group tested positive.)

- If you do not group samples, you use \(2\) tests.
- If you group samples, you use \(1 + 2 \left [ 1 - (1-p)^2 \right]\) tests on average.

The figure below shows the expected number of tests \(T\) used, as a function of the probability of a positive test \(p\), for both strategies listed above. You should group the tests if \(1 + 2 \left [ 1 - (1-p)^2 \right] < 2\). If we solve for \(p\), we find that you should group samples if \(p < 1 - 1 / \sqrt{2} \approx 0.293\).

**Comment.** Do not actually group samples in practice. Standard testing kits are not meant to be used on mixed samples. However, many countries have actually used group testing schemes for large-scale COVID-19 testing, so this problem is not fictitious. See the Wikipedia entry List of countries implementing pool testing strategy against COVID-19 for more information.

## Grouping \(n\) samples

Let’s generalize the result above. Assume that your household has \(n\) members.

- If you do not group samples, you use \(n\) tests.
- If you group samples, you use \(1 + n \left [ 1 - (1-p)^n \right]\) tests on average.

You should group the tests if \(1 + n \left [ 1 - (1-p)^n \right] < n\). Solving for \(p\), you should group samples if

The figure below shows whether or not you should group samples. Pick a value of \(n\) and \(p\), then look at the figure. If your point \((n, p)\) falls below the line, you should group the tests.

Let’s examine the function \(f(n) = 1 - \left ( 1 - \frac{n-1}{n} \right)^{1/n}\) plotted above. We can write the function as \(f(n) = 1 - \exp \left( \frac{1}{n} \ln \left( 1 - \frac{n-1}{n} \right) \right)\) and differentiate it using the rule \(\exp(f(x))' = f'(x) \exp(f(x))\). The result is

which has a root at \(n=e\), and \(f'(n) < 0\) for all \(n > e\).

We learn two things from drawing the graph and inspecting it:

- When \(p > f(e) \approx 0.308\), there is no point in grouping tests.
- For all \(n \geq 3\), if we test individually at \(n\), then we should also test individually at \(n+1\).

## A recursive equation

So far we assumed that we either test all \(n\) people individually, or we group all the samples. If we group the samples and get a negative test, we’re done. If we get a positive test, we test each individual in the group.

But thesee are only the extremes: we group all samples or none. We could also use small groups.

As an example, consider the case \(n=6\). We could:

- Perform \(6\) individual tests.
- Do a grouped test, then individual tests if it’s positive.
- Perform a two-way split of \(6\) into \((1, 5), (2, 4)\) or \((3, 3)\) and deal with those sub-groups.

Let \(T(n)\) be the expected number of tests for a group of size \(n\), given a probability \(p\). The three options listed above can be put into a recursive equation:

This equation can be solved using dynamic programming in \(\mathcal{O}(n)\) time. On \(n=100\) people, dramatic savings are possible for low values of \(p\). The figure below shows \(T(n=100)\) for various values of \(p\).

When \(p=0.02\), the recursive equation partitions the \(n=100\) tests as:

- \(9\) groups each containing \(8\) tests (\(9 \times 8 = 72\) tests in total)
- \(4\) groups each containing \(7\) tests (\(4 \times 7 = 28\) tests in total)

Tests in each of these groups are mixed. If the mixed tests give a negative result, we are done with the group. If not, each test in the group is tested individually. The expected number of tests is \(T(n=100) = 27.438\).

## Summary and references

The idea of group testing is attributed to Dorfman, who in 1943 published a paper titled “*The detection of defective members of large populations*”.
The methods have since been refined and used for blood screening for HIV, hepatitis and other infectious diseases, as well as quality control in product testing, drug discovery, DNA screening and experimental physics.

This article merely scratches the surface. More results and extensions are available in the literature. How can we do better than what we sketched in this article? For instance, what if the value of \(p\) depends on each person? What if the tests are not perfect, meaning that there is a chance of a false positive or false negative test result? These are some of the questions that are answered in the literature.

More information can be found in:

- Dorfman, R. (1943). “
*The detection of defective members of large populations*”. The Annals of Mathematical Statistics 14, 436–440. - Malinovsky, Yaakov & Albert, Paul. (2016). “
*Revisiting Nested Group Testing Procedures: New Results, Comparisons, and Robustness*”. The American Statistician. 73. - Malinovsky, Yaakov. (2019) “
*Sterrett Procedure for the Generalized Group Testing Problem*”. Methodology and Computing in Applied Probability. 21. - The book “
*Combinatorial Group Testing and Its Applications*” by Ding-Zhu Du and Frank Kwang-Ming Hwang