COVID testing affects positivity rates

In my circle of family and friends, there has been a lot of discussion about India's poor COVID management. I do not disagree with them: given how late COVID arrived in India relative to the rest of the world, the situation could have been managed better.

But where I disagree with them is on India's relative performance. I have heard that the great USA is improving, and that even neighbouring countries like Pakistan have improved, so why not India? COVID seems to have been wiped out from the rest of the world, so why does India keep struggling for so long?

The charts usually cited for comparison show daily cases and sometimes positivity rates. The charts (shown below), obtained from (www.ourworldindata.com), also show the world's improving status and India's worsening situation. Great news for the government's armchair critics: they got a few more shells for their tanks!





However, these data show only the front-end breaking headlines, while the back end is controlled by another variable: the number of tests, or tests per thousand people. So, let's look at the tests-per-thousand data for these countries.



Some interesting things pop up upon closer inspection of this chart. Firstly, since late July, testing in the United States has almost stagnated; in fact, it follows a slightly declining trend. Perhaps the upcoming elections might be driving this. If so, this should raise serious doubts about the USA's case data. Next, our dear neighbour Pakistan, whose quick revival was widely discussed. Its testing data, again, raise suspicion. Of these four countries, I believe the United Kingdom has been the most transparent. Its testing rose exponentially, and its positivity rates with it. Owing to the UK's smaller population and the scale of the graph, its peak in the daily-cases chart looks minute. However, Pakistan, with more than thrice the population of the UK but a similar peak, raises doubts. Some researchers have used positivity rates to comment on the real scenario, since it is understood in some circles (not all) that daily-case data have their limitations. But I believe testing is the major parameter among the diverse set of available variables.

To get an idea of the mechanism behind this, I used Python (covid-positivity-rates.py) to build a simple static simulation. The methodology adopted is given below.

1. I constructed a population (a list of either 1 million or 100,000 elements), with each element representing an individual.

2. Each individual can be healthy (H), asymptomatically infected (A) or symptomatically infected (I). The probability of being in each of these states was assumed to be 1/3. The total numbers of H, A and I individuals are recorded.

3. The positivity rate for the population, that is, I/population-size, is calculated.

4. Next, the test sample is drawn from the population: a specific number of individuals (equal to the testing sample size) are randomly selected. The total numbers of H, A and I individuals in the test sample are recorded.

5. The positivity rate for the test sample, that is, I/testing-sample-size, is calculated.

6. To understand the effect of testing rate on positivity rates, testing rates from 0.1 to 100 tests per thousand people were considered. To estimate the deviation between the population and test positivity rates, the RMSE (root-mean-square error) was calculated.

7. Steps 1 to 6 were repeated 20 times to check the consistency of the findings.
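The steps above can be sketched in Python roughly as follows. This is not the author's covid-positivity-rates.py, only a minimal reconstruction under stated assumptions: the helper names (make_population, positivity, simulate) are hypothetical, positivity counts only symptomatic (I) individuals as in steps 3 and 5, the test sample is drawn without replacement, and the RMSE is assumed to be taken over the 20 repetitions for each testing rate.

```python
import math
import random

# Health states from step 2: healthy, asymptomatic, symptomatic.
STATES = ("H", "A", "I")


def make_population(size, rng):
    """Steps 1-2: each individual is H, A or I with equal probability (1/3)."""
    return rng.choices(STATES, k=size)


def positivity(group):
    """Steps 3 and 5: share of symptomatically infected (I) individuals."""
    return group.count("I") / len(group)


def simulate(pop_size, tests_per_thousand, iterations=20, seed=0):
    """Steps 4-7 (hypothetical helper): RMSE between test-sample and
    population positivity rates, assumed to be taken over the repetitions."""
    rng = random.Random(seed)
    # Number of people tested; at least one so the sample rate is defined.
    sample_size = max(1, round(pop_size * tests_per_thousand / 1000))
    squared_errors = []
    for _ in range(iterations):
        population = make_population(pop_size, rng)  # fresh population each run
        sample = rng.sample(population, sample_size)  # random, without replacement
        error = positivity(sample) - positivity(population)
        squared_errors.append(error ** 2)
    return math.sqrt(sum(squared_errors) / iterations)


if __name__ == "__main__":
    # Testing rates spanning the 0.1 to 100 per thousand range in step 6.
    for rate in (0.1, 1, 10, 100):
        rmse = simulate(100_000, rate)
        print(f"{rate:>5} tests per thousand -> RMSE {rmse:.4f}")
```

Running it prints one RMSE per testing rate for a 100,000-person population; passing 1_000_000 instead mimics the larger setting at the cost of more time and memory.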

The results are given in this file. 

It can be clearly observed that higher testing rates (tests per thousand individuals) reduce the RMSE irrespective of population size; that is, higher testing rates come closer to representing reality. At lower testing rates, the positivity rates obtained from testing deviate widely from the population's (look at how the y-axis span evolves as testing rates change). These distortions suggest further inaccuracies in many related variables, such as tests conducted per case and daily cases. Many of these variables therefore cannot be trusted as proxies for the impact of COVID on the real population.
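One standard way to see why the RMSE shrinks as testing grows: if the n people tested are a simple random sample (without replacement) from a population of N in which a share p is symptomatic, the sample positivity rate has standard error sqrt(p(1 - p) / n * (N - n) / (N - 1)). This textbook formula is not part of the original script; the sketch below simply evaluates it with p = 1/3 from step 2 as a rough stand-in for the expected RMSE, and the function name is hypothetical.

```python
import math


def expected_rmse(pop_size, tests_per_thousand, p=1 / 3):
    """Standard error of a sample proportion under simple random sampling
    without replacement, used here as a rough stand-in for the RMSE."""
    n = pop_size * tests_per_thousand / 1000  # number of people tested
    fpc = (pop_size - n) / (pop_size - 1)  # finite-population correction
    return math.sqrt(p * (1 - p) / n * fpc)


for size in (100_000, 1_000_000):
    for rate in (0.1, 1, 10, 100):
        print(f"N={size:>9,} rate={rate:>5}: ~{expected_rmse(size, rate):.4f}")
```

The error falls roughly as 1/sqrt(n): going from 0.1 to 100 tests per thousand in a 100,000-person population shrinks it about 33-fold, and at the same rate the 1-million population has ten times as many tests, so its error is smaller by roughly the square root of 10 (about 3.2).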

It is also interesting to compare the number of infected individuals detected in the tests with the number of infected individuals in the population. Even with testing rates reaching 100 per thousand, the testing data under-report infected individuals by a factor of 10. This is what random sampling implies: at 100 per thousand, only 10% of the population is tested, so only about a tenth of the infected can be found.
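The factor of 10 follows from the sampling fraction alone. A small sketch, assuming the tested individuals are a uniform random sample of the population as in step 4; the function names and the 66,667 figure (about two thirds of a 100,000 population being A or I under step 2's 1/3 probabilities) are illustrative only.

```python
def underreporting_factor(tests_per_thousand):
    """Expected ratio of infected people in the population to infected
    people found by testing, when the tested are a uniform random sample.

    Each person is tested with probability tests_per_thousand / 1000,
    so on average only that fraction of the infected are found.
    """
    return 1000 / tests_per_thousand


def expected_found(infected, tests_per_thousand):
    """Expected number of infected individuals that appear in the test data."""
    return infected * tests_per_thousand / 1000


infected = 66_667  # illustrative: roughly two thirds of a 100,000 population
for rate in (1, 10, 100):
    print(rate, expected_found(infected, rate), underreporting_factor(rate))
```

The ratio does not depend on how many people are infected, only on what fraction of the population is tested.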

Although this analysis has several limitations, it tells us that merely looking at case headlines in the news might not be in our own interest. Even if the channels show a decline in daily cases, take it with a pinch of salt. There's a lot of politics that might be driving those figures. The best way to protect ourselves and our loved ones is to follow the instructions and protocols suggested by governments and health agencies.

The reported daily cases are like the speed of a moving car. We, sitting in front of the idiot box reading headlines, only see how fast this car is moving. But don't forget that this car is not being driven by an invisible hand. There's a driver behind those tinted windows with a foot on the gas pedal.

