Swarajya


How Many People Should Be Tested To Estimate The Prevalence Of Covid-19?  

  • One of the dominant criticisms of India’s strategy to cope with the Covid-19 pandemic is that we are not testing enough. But is widespread testing really the panacea for the problem? We investigate.

M. Vidyasagar and Sunil D. Sherlekar | Apr 21, 2020, 01:47 PM | Updated 01:47 PM IST
Covid-19 testing 



At the moment, the COVID-19 pandemic is occupying the minds of everyone, from policymakers to ordinary citizens.

Policymakers are concerned with questions such as: How many Indians are infected at present? How will this number progress over time? And, perhaps most importantly, what quantum of critical-care equipment is required, and when?

An earlier popular article published in this magazine and written by Santosh Ansumali and Aloke Kumar addresses the last set of questions. A more precise version of their reasoning, aimed at a scientific audience, can be found here.

As per the ICMR website, as of 18 April, a total of 372,123 samples from 354,969 individuals had been tested, of whom 16,365 had been found to be COVID-19-positive. This works out to about 4.6 per cent of the individuals tested, and testing has been limited to “suspected cases and contacts of known positive cases.”

One of the dominant criticisms of India’s strategy to cope with the pandemic is that we are not testing enough. So, the question naturally arises: Is India conducting enough tests? One of the arguments that keeps being made is that, in terms of “tests per million population”, India has a very low figure.

There is a simple counter to this: While it may be easy to hide patients, it is impossible to hide deaths in an open society such as India (China, of course, has been thoroughly opaque about the number of deaths as well).

So, if we were to test more widely, the fraction of positive tests would come down, because the number of deaths, which cannot be hidden, would stay near the currently reported figure.

Until now, those tested in India have been limited to specific target groups. If we wish to estimate the fraction of infected persons in the population at large, is “tests per million population” a valid criterion to assess the accuracy of the estimate?

The main purpose of this article is to debunk this argument. A one-sentence summary of our article is this: In order to estimate a ratio (e.g. the ratio of infected persons to the total population), the number of samples that need to be tested does not depend on the size of the underlying population.

This phenomenon is well-known to statisticians. Yet people keep pushing this ridiculous idea of “tests per million,” either out of ignorance or out of malice.

We cannot do anything about the malicious ones, but we hope to educate those who wish to know how sampling works.

Suppose someone gives us a two-sided coin. We wish to know whether it is a “fair” coin, in the sense that the probability of getting heads is 0.5 (or 50 per cent). So, we toss the coin a few times and observe the outcomes.

Suppose we toss 10 times and observe 4 heads. That ratio 4/10 = 0.4 is known as the empirical probability of getting heads.

It should not be confused with the true (but unknown) probability of heads.

So, one can turn the question around and ask: If indeed the coin is fair (probability of heads = 0.5), what is the likelihood that out of 10 tosses, we would get only 4 or fewer heads?

It is easy to compute the answer using the well-known binomial distribution. We compute the likelihood of 0, 1, 2, 3, and 4 heads, and add them up.

The answer turns out to be 0.377, or 37.7 per cent. Now suppose we toss the coin 100 times and get 40 heads. Note that the empirical probability is the same as before, namely 0.4.

But now the likelihood of a fair coin producing only 40 or fewer heads out of 100 tosses is 0.0284, or 2.84 per cent. So, we can be 1 – 0.0284 = 0.9716 (or about 97.2 per cent) sure that the coin is not fair.
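The two tail probabilities above can be checked with a few lines of Python. This is a minimal sketch using only the standard library (`math.comb`); the function name `prob_at_most` is our own, chosen for illustration. It does exactly what the text describes: it adds up the binomial probabilities of 0, 1, ..., k heads.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p): sum the probabilities of 0, 1, ..., k successes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


small = prob_at_most(4, 10)    # 4 or fewer heads in 10 tosses of a fair coin
large = prob_at_most(40, 100)  # 40 or fewer heads in 100 tosses of a fair coin

print(f"{small:.3f}")      # 0.377
print(f"{large:.4f}")      # 0.0284
print(f"{1 - large:.4f}")  # 0.9716
```

The empirical ratio is 0.4 in both cases; only the number of tosses changes, and with it how unlikely the outcome is for a fair coin.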

That is how statistics works: one can never say that a statement is absolutely true, only that it is true with a certain confidence level.

In most applications involving the life sciences, a confidence level of 95 per cent (0.95) is the widely accepted threshold for accepting a statement as true.

So, in the 100-toss example above, we would accept that the coin is not fair.

What does this have to do with testing for the presence of COVID-19? Obviously, everyone is interested in knowing the answer to the question: What fraction of the Indian population is infected?

Until now, India has been testing individuals on the basis of possible exposure. But let us leave that aside and examine possible approaches to estimating the fraction of the infected in society at large.

Once we have a fairly reliable estimate (however defined) of the fraction, we can just multiply by the base population to estimate the number of infected persons.

This logic can be applied at various levels of granularity: nationwide, state-wide, citywide, et cetera.

In doing this exercise, we must keep in mind that we are trying to estimate a very small number. If the infection rate among those at risk is hovering around 4 to 5 per cent, it would be much lower in the public at large, perhaps 0.001 (0.1 per cent) or even lower.

Therefore, the quality of the estimate must be judged by its relative error, not its absolute error.

To illustrate, suppose we estimate that a fraction 0.001 (1 out of 1,000) of persons is infected, whereas the true rate is 0.005 (5 out of 1,000).

We cannot escape by saying “Well, I was only off by 0.004.” In reality, we were off by a factor of five, which is simply unacceptable.

With this background, we can ask the following question: How many persons should be tested to ensure that, with 95 per cent confidence, the true infection rate is not more than five times the estimated rate?

That number can be computed using a standard inequality from statistics known as the “multiplicative Chernoff bound,” named after Herman Chernoff, whose 1952 paper introduced the underlying technique.

Once again, the answer depends on the empirically estimated probability of infection. The lower the empirical estimate (call it p), the greater the number of samples required.

Using the multiplicative Chernoff bound, it can be shown that the true infection rate is not higher than 5p with 95 per cent confidence if we take 2.3055/p samples.
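The article does not spell out how the constant 2.3055 was obtained (the note at the end says the details are available from the first author). One derivation that reproduces it, and also the 6.1897 quoted for 99.99 per cent confidence, is sketched below. It is our reconstruction, not necessarily the authors' computation: it starts from the lower-tail multiplicative Chernoff bound for the number X of positives among n people tested when the true infection rate is q, and it assumes a conservative factor of 2 on the failure probability α (confidence level 1 − α). Here K is the error multiple (5 in the text) and p is the empirical infection rate.

```latex
\begin{aligned}
&\Pr\bigl[X \le (1-\delta)\,\mu\bigr] \le e^{-\delta^{2}\mu/2}, \qquad \mu = nq,\quad 0 < \delta < 1 \\
&q = Kp,\quad (1-\delta)\,nq = np \;\Longrightarrow\; \delta = 1 - \tfrac{1}{K} \\
&2\,e^{-\delta^{2} K n p/2} \le \alpha \;\Longleftrightarrow\; n \ge \frac{2\ln(2/\alpha)}{\delta^{2} K\, p} \\
&K = 5,\ \delta = 0.8:\quad n \ge \frac{\ln(2/\alpha)}{1.6\, p}; \qquad
\alpha = 0.05 \Rightarrow n \ge \frac{\ln 40}{1.6\,p} \approx \frac{2.3055}{p}, \qquad
\alpha = 10^{-4} \Rightarrow n \ge \frac{\ln 20000}{1.6\,p} \approx \frac{6.1897}{p}
\end{aligned}
```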

So, if we test 2,306 persons and the fraction of infected persons in the sample does not exceed 0.001, then we can be 95 per cent sure that the true fraction of infected persons does not exceed 0.005.

This formula can easily be adjusted for different error levels (the multiple of 5 can be changed to any other value) and confidence levels (95 per cent can become something else).

For example, if we wish to be 99.99 per cent sure and not merely 95 per cent sure, then 2.3055 changes to 6.1897.
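The adjustment can be captured in a small calculator. This is a hedged sketch: the function names `chernoff_constant` and `samples_needed` are hypothetical, and the formula inside them (the lower-tail multiplicative Chernoff bound with an assumed factor of 2 on the failure probability) is our reconstruction, chosen because it reproduces the article's 2.3055 and 6.1897. The authors' own computation may differ in detail.

```python
from math import ceil, log


def chernoff_constant(factor: float = 5.0, confidence: float = 0.95) -> float:
    """Constant c such that c / p samples suffice, where p is the empirical rate.

    ASSUMPTION: lower-tail multiplicative Chernoff bound exp(-delta**2 * mu / 2)
    with a factor of 2 on the failure probability. This is a reconstruction that
    matches the published constants, not the authors' own code.
    """
    alpha = 1.0 - confidence      # allowed probability of being wrong
    delta = 1.0 - 1.0 / factor    # observed rate p versus a true rate of factor * p
    return 2.0 * log(2.0 / alpha) / (delta**2 * factor)


def samples_needed(p_hat: float, factor: float = 5.0, confidence: float = 0.95) -> int:
    """Number of people to test. Note that population size is not an input."""
    return ceil(chernoff_constant(factor, confidence) / p_hat)


print(f"{chernoff_constant():.4f}")                   # 2.3055
print(f"{chernoff_constant(confidence=0.9999):.4f}")  # 6.1897
print(samples_needed(0.001))                          # 2306
```

Changing `factor` or `confidence` changes the constant, and a smaller empirical rate `p_hat` raises the sample size, but nothing in the calculation refers to how many people live in the region being sampled.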

But the key point is that the size of the underlying population does not appear anywhere.

This phenomenon is well-known to pollsters. To estimate what fraction of the people would vote for a particular candidate, they use roughly the same sample size in large states as in small states.
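A standard textbook result makes the same point from a different angle. When n people are drawn without replacement from a population of N, the standard error of the sample proportion is the usual binomial figure multiplied by the finite-population correction sqrt((N − n)/(N − 1)), which is essentially 1 once N is much larger than n. The sketch below uses the article's illustrative rate of 0.001 and sample of 2,306; the three population sizes are our own illustrative choices, not figures from the article.

```python
from math import sqrt


def standard_error(p: float, n: int, population: int) -> float:
    """Standard error of a sample proportion when n people are drawn without
    replacement from a population of the given size: the binomial variance
    p(1 - p)/n times the finite-population correction (N - n)/(N - 1)."""
    fpc = (population - n) / (population - 1)
    return sqrt(p * (1 - p) / n * fpc)


p, n = 0.001, 2306  # illustrative infection rate and sample size from the text
for population in (100_000, 10_000_000, 1_300_000_000):  # illustrative sizes
    se = standard_error(p, n, population)
    print(f"N = {population:>13,}: standard error = {se:.6f}")
```

Going from a population of one lakh to 130 crore changes the standard error by only about one per cent, because the sample size n, not N, drives the precision.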

The main caveat in the above analysis is the assumption that the infection rate is uniform within the population being tested.

The analysis is only as good as this assumption. So, sampling in Himachal Pradesh (HP) and extrapolating to Maharashtra (MH) would be meaningless, as would sampling in rural Maharashtra and extrapolating to Mumbai.

Policymakers would have to identify broad clusters of population within which the infection rate can reasonably be assumed to be uniform.

Within each cluster, the above analysis would be applicable. The delineation of the clusters is a matter of judgement and lies outside the realm of statistics.

However, within each cluster, the number of persons to be tested would again not depend on its population.

So, in short, the next time someone says, “tests per million population,” simply retort “multiplicative Chernoff bound.”

Note: The details of the computations can be obtained from the first author.
