Some covid-19 study thoughts

This is Joseph

This study needs context:
A seroprevalence study led by Stanford researchers estimates that the number of COVID-19 cases in Santa Clara County was 50 to 85 times higher than the number of confirmed cases by early April — meaning that the true case numbers could range from 48,000 to 81,000 people infected. The county has reported 1,870 confirmed cases as of Friday. 
Medicine professor and study co-lead Jay Bhattacharya said in a Friday press conference that the study results put coronavirus’ fatality rate “about on par with the flu,” but he warned that the lack of a vaccine means the two situations aren’t equivalent. 
Out of the 3,330 samples analyzed, 50 came back positive, indicating a crude prevalence rate of 1.5%. The researchers adjusted the initial results both by demographics — to account for the zip code, sex and race of study participants — and by test accuracy. The antibody test misses between 10 and 30% of those who have COVID-19 antibodies, according to Bendavid.  
The problem, of course, is the specificity of the test. The authors estimate it:
A combination of both data sources provides us with a combined sensitivity of 80.3% (95 CI 72.1-87.0%) and a specificity of 99.5% (95 CI 98.3-99.9%)
But with 3,330 samples, that specificity implies about 17 expected false positives at the point estimate, about 57 at the bottom of the interval, and about 3 at the top. At the bottom of the interval, all 50 of the observed positives could be false positives, which would make the true prevalence ZERO. A low-prevalence population like this is a dangerous basis for drawing conclusions.
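The false positive arithmetic is easy to check. Here is a rough sketch; the function name is my own, and the only inputs are the sample size, the 50 positives, and the specificity figures quoted above. It assumes the worst case where nobody in the sample was truly infected.

```python
def expected_false_positives(n_samples, specificity):
    """Expected false positives if nobody in the sample were truly infected."""
    return n_samples * (1.0 - specificity)

N_SAMPLES = 3330         # samples analyzed in the study
OBSERVED_POSITIVES = 50  # positives reported

# Specificity point estimate and 95% CI bounds quoted from the paper.
scenarios = {
    "bottom of interval": 0.983,
    "point estimate": 0.995,
    "top of interval": 0.999,
}

for label, spec in scenarios.items():
    fp = expected_false_positives(N_SAMPLES, spec)
    print(f"{label} ({spec:.1%}): {fp:.1f} expected false positives "
          f"vs {OBSERVED_POSITIVES} observed")
```

At the bottom of the interval the expected count (3,330 × 1.7% ≈ 56.6) is larger than the 50 positives actually observed.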

It also makes no sense. Look at New York as of Sunday:

There are 12.6 cases per thousand and 0.93 deaths per thousand. That is already at a flu level of mortality: at a flu-like fatality rate of roughly 0.1%, it would suggest that nearly 100% of New Yorkers are infected, across the entire state. But a fatality rate that low is exactly what this statement requires:
A hundred deaths out of 48,000-81,000 infections corresponds to an infection fatality rate of 0.12-0.2%.  
A 50 to 85 times under-count would mean 63% to 107% of New York residents are infected. That is for the entire state, not just NYC. If the Stanford researchers think this level of under-count is plausible, then it should be immediately apparent from a quick NY-based study.
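Here is the New York sanity check as a small sketch. The function names are mine; the inputs are the per-thousand figures and the under-count and fatality ranges quoted above. It takes the state numbers at face value and ignores any lag between infection and death.

```python
CASES_PER_THOUSAND = 12.6    # New York State cases per 1,000 residents
DEATHS_PER_THOUSAND = 0.93   # New York State deaths per 1,000 residents

def share_from_undercount(cases_per_thousand, undercount_factor):
    """Infected share if true infections are a multiple of confirmed cases."""
    return cases_per_thousand / 1000.0 * undercount_factor

def share_from_ifr(deaths_per_thousand, ifr):
    """Infected share needed to produce the observed deaths at a given IFR."""
    return deaths_per_thousand / 1000.0 / ifr

# Under-count range claimed for Santa Clara, applied to New York.
for factor in (50, 85):
    share = share_from_undercount(CASES_PER_THOUSAND, factor)
    print(f"{factor}x under-count: {share:.1%} of residents infected")

# Infection fatality rates implied by the Santa Clara estimate.
for ifr in (0.0012, 0.002):
    share = share_from_ifr(DEATHS_PER_THOUSAND, ifr)
    print(f"IFR {ifr:.2%}: {share:.1%} of residents infected")
```

The under-count route gives 63% and 107.1%, and the second is impossible. The fatality route gives 46.5% at an IFR of 0.2% and 77.5% at 0.12%, so either way a very large share of the whole state would already have to be infected.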

So with infections this rare, you need an extremely high specificity, or else you get huge confidence intervals that make everything else difficult to interpret. The scenarios in the paper don't seem to incorporate the uncertainty in the test specificity; if they do, I am surprised that they end up with confidence intervals so narrow. What we really learn is that the prevalence is small, and that the data could support a wide range of possible infection fatality rates. I don't know that the media quotes above are supported by the analysis in the paper, once variance is considered carefully.
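To see how much the specificity matters, here is a sketch using the standard Rogan-Gladen correction for an imperfect test. This is my own illustration and not necessarily the authors' method: it skips their demographic reweighting, holds sensitivity at its point estimate, and varies only the specificity across its quoted interval.

```python
def rogan_gladen(raw_prevalence, sensitivity, specificity):
    """Rogan-Gladen corrected prevalence, clamped to the range [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test is no better than chance")
    adjusted = (raw_prevalence + specificity - 1.0) / denom
    return min(max(adjusted, 0.0), 1.0)

RAW_PREVALENCE = 50 / 3330  # crude prevalence from the study
SENSITIVITY = 0.803         # point estimate quoted from the paper

# Specificity at the bottom, center, and top of the quoted 95% interval.
for spec in (0.983, 0.995, 0.999):
    adjusted = rogan_gladen(RAW_PREVALENCE, SENSITIVITY, spec)
    print(f"specificity {spec:.1%}: adjusted prevalence {adjusted:.2%}")
```

Under these assumptions the adjusted prevalence runs from zero at the bottom of the specificity interval to roughly 1.7% at the top, with about 1.3% at the point estimate. That is a wide range before sensitivity or sampling error is even considered.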

Postscript: After writing this, I realized, via statchat, that I am late to the party. The linked articles discuss specificity in a lot more detail, although neither uses the number right out of the paper.

Postscript 2: Never schedule a post for Monday. Andrew Gelman is awesome here. Go read that instead.

Postscript 3: I think this is where Bayesian perspectives are super helpful. For the NY death numbers to be even close, more than half of the city must have been infected. The Diamond Princess only had 17% infected. It also had an IFR of 0.5% (95% CI: 0.2-1.2%). Applied to NYC (about 9000 deaths), that suggests 1.8 million infections (which is about 20%) with a range of 750,000 (less than 10%) to 4.5 million (a bit over 50%). NYC demographics are not the same as Santa Clara's, but the median age in NYC is 36.9 years and the median age in Santa Clara is 37.2; these are not wildly different numbers that would make NYC uniquely vulnerable.
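The Diamond Princess extrapolation is just deaths divided by IFR. Here is a quick sketch; the function name is mine, and the NYC population of about 8.4 million is my own assumption, used only to turn counts into shares.

```python
NYC_DEATHS = 9000           # approximate NYC deaths used above
NYC_POPULATION = 8_400_000  # assumption: rough NYC population, not from the post

def implied_infections(deaths, ifr):
    """Infections needed to produce the observed deaths at a given IFR."""
    return deaths / ifr

# Diamond Princess IFR: upper CI bound, point estimate, lower CI bound.
for label, ifr in (("IFR 1.2%", 0.012), ("IFR 0.5%", 0.005), ("IFR 0.2%", 0.002)):
    infections = implied_infections(NYC_DEATHS, ifr)
    share = infections / NYC_POPULATION
    print(f"{label}: {infections:,.0f} infections ({share:.0%} of the city)")
```

With these inputs the point estimate gives 1.8 million infections, the upper IFR bound gives 750,000, and the lower IFR bound gives 9,000 / 0.002 = 4.5 million.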

  1. Diamond Princess: 712 cases, 14 dead, dozens still presumed sick, older demographic, mostly useful to judge fatality risk of over 60s in above average age-adjusted health.