Estimating SARS-CoV-2 Infections from Deaths, Confirmed Cases, Tests, and Random Surveys
Many sources of data give information about the number of SARS-CoV-2 infections in the population, but all have major drawbacks, including biases and delayed reporting. For example, the number of confirmed cases substantially underestimates the number of infections, deaths lag infections by a considerable interval, and test positivity rates tend to greatly overestimate prevalence. Representative random prevalence surveys, the only putatively unbiased source, are sparse in time and space, and their results arrive after a long delay. Reliable estimates of population prevalence are necessary for understanding the spread of the virus and the effects of mitigation strategies. We develop a simple Bayesian framework to estimate viral prevalence by combining the main available data sources. It is based on a discrete-time SIR model with a time-varying reproductive parameter. Our model includes likelihood components that incorporate data on deaths due to the virus, confirmed cases, and the number of tests administered on each day. We anchor our inference with data from random sample testing surveys in Indiana and Ohio. We use the results from these two states to calibrate the model on positive test counts and proceed to estimate the infection fatality rate and the number of new infections on each day in each state in the USA. We estimate the extent to which reported COVID-19 cases have underestimated true infection counts; the undercount was large, especially in the first months of the pandemic. We explore the implications of our results for progress towards herd immunity.
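As an illustration of the kind of model named above, the following is a minimal sketch of a discrete-time SIR recursion with a time-varying reproductive parameter. It is not the paper's implementation: the function name `simulate_sir`, the daily time step, the `recovery_rate` value, and the parameterization of the transmission rate as the reproductive parameter times the recovery rate are all assumptions made for this example, and the Bayesian likelihood components are omitted.

```python
def simulate_sir(population, initial_infected, reproduction_numbers, recovery_rate=0.1):
    """Run a deterministic discrete-time SIR model, one step per day.

    reproduction_numbers holds one (hypothetical) reproductive parameter
    value per day; recovery_rate is an assumed daily removal probability.
    Returns the daily new infections and the final (S, I, R) state.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    removed = 0.0
    new_infections = []

    for r_t in reproduction_numbers:
        # Assumed parameterization: transmission rate = R_t * recovery rate.
        beta_t = r_t * recovery_rate
        # New infections on this day, capped by the remaining susceptibles.
        new = min(susceptible, beta_t * susceptible * infected / population)
        recovered = recovery_rate * infected

        susceptible -= new
        infected += new - recovered
        removed += recovered
        new_infections.append(new)

    return new_infections, (susceptible, infected, removed)


if __name__ == "__main__":
    # Hypothetical scenario: 30 days of growth, then 30 days of mitigation.
    r_path = [2.5] * 30 + [0.8] * 30
    daily, final_state = simulate_sir(1_000_000, 100, r_path)
    print(f"cumulative infections: {sum(daily):.0f}")
    print("final (S, I, R):", tuple(round(x) for x in final_state))
```

In a full Bayesian treatment, the daily new-infection series produced by such a recursion would be the latent quantity linked to deaths, confirmed cases, and test counts through the likelihood; that linkage is outside this sketch.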