Coronavirus: The Media Says "Shoot The Messenger!"

I am a new subscriber, but have followed Chris for many years. First, I would like to say thank you, Chris, for your straightforward, informed commentary.
Today, Japan’s number of cases has jumped higher to 45 cases, a 105% increase, as follows:
1/31 20, 2/1 20, 2/2 20, 2/3 20, 2/4 22, 2/5 45
Some of this is probably due to lax reporting by the Japanese. But it’s interesting to note that China had 45 cases reported on January 16, 2020, which increased in three weeks to today’s reported total of 28,254. Let’s hope Japan has remediation steps in place and will not follow China’s exponential increase in cases. Time will tell.
 

First, I am a new subscriber to Peak Prosperity, but have followed Chris for many years. I would like to say thank you, Chris, for your thorough and perceptive commentary.
Second, today Japan posted an increase in cases to 45 from 22 yesterday, a 105% increase. The sudden jump is probably due to a correction of lax reporting:
1/31 20, 2/1 20, 2/2 20, 2/3 20, 2/4 22, 2/5 45
I noticed China reported 45 cases on January 16, 2020, which in three weeks has grown to 28,254. We’ll see if Japan’s authorities have put enough remedial programs in place to avoid following China’s exponential increase in cases.
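For anyone who wants to check the arithmetic: the sketch below recomputes Japan's day-over-day percentage increase and the doubling time implied by China going from 45 to 28,254 cases. It assumes steady exponential growth over exactly 21 days (taking "three weeks" literally); the function names are just illustrative.

```python
import math

def percent_increase(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

def doubling_time(start, end, days):
    """Doubling time in days, assuming steady exponential growth from start to end."""
    doublings = math.log2(end / start)
    return days / doublings

# Japan's cumulative counts quoted above: 22 on 2/4, 45 on 2/5
print(f"Japan: {percent_increase(22, 45):.1f}% increase in one day")

# China: 45 cases -> 28,254 cases, assuming a 21-day interval ("three weeks")
print(f"China: implied doubling time of about {doubling_time(45, 28_254, 21):.1f} days")
```

Since 22 to 45 is slightly more than a doubling, the "105%" above is this figure rounded.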
 

Gilead’s drug is a nucleoside analogue: it is mistaken by the virus for a DNA/RNA base. It’s not a protease inhibitor.

The Daily Telegraph has an article in which the realization of exactly how bad this might get is beginning to dawn on them. Thanks to Chris and Adam for all the ongoing hard work. There are few places you can trust to give you accurate, up-to-date information, and I, for one, treasure this one and the great community it has fostered.
https://www.telegraph.co.uk/business/2020/02/05/chinas-coronavirus-not-remotely-control-world-economy-mounting2/

Among the food and stuff you buy in preparation, make room for a home humidifier. From what I’ve read, a room with high humidity acts to cut down on airborne infectious droplets. Not sure why.

Before surgery, I knew I was going to spend weeks at home recovering, so I stocked up, got needed house maintenance and chores done, etc. How many people are prepared physically, psychologically and emotionally to isolate? Someone just coming home after a trip maybe doesn’t feel too bad, and Holy Cow!! I’m out of ice cream, Doritos, bread, milk, toilet paper, etc. You think they might call a friend or just make a quick dash to the store? Couldn’t hurt, right?
 

KPIX CBS San Francisco Bay Area video: https://youtu.be/YAH9TbeADMI
More people wearing masks, encouraging and taking other precautions; reassurances amid declining business activity due to nCoV concerns

Think of all the things you touch as you go about your day that have been handled by others. I live in a rural, not-too-bright community where scabies seems to be endemic. As in many third-world countries, people here just seem to live with it, and public health people won’t acknowledge it. Pretty sad in 21st-century America, but I have learned to adjust. I carry my own pen for signing anything; there are community pens at the bank, the grocery store and the health club. I wipe down grocery cart handles.
I often think what a great study it would be if the stockers, cashiers and baggers at the grocery store had their hands swabbed and plated out. I can’t pump my own gas in my state, so the attendant handles my credit card.
I have also stopped shaking hands (I got infected too many times), and even fist bumps are becoming questionable. I guess gloves are the next reasonable step. Gloves, plus wipes for the steering wheel, door handles and keys, are going to slow things down.
Just assume everyone around you is unaware, asleep or stupid and protect yourself accordingly.

The numbers get all jumbled up in the media, but overall the situation is clear. Samples have been taken from the 273 people who had the most contact with the first known infected person, who disembarked in Hong Kong. 102 of those samples have been processed so far, and of that processed batch, 20 came back positive.
In my opinion, we can expect at the very least 100 cases to be on the ship at the moment, and most likely increasing…
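As a rough cross-check, the sketch below takes the figures above and assumes the not-yet-processed samples test positive at the same rate as the first 102. This naive extrapolation only covers the 273 sampled contacts; the estimate of at least 100 cases on the whole ship is higher than what it gives.

```python
# Figures quoted above for the quarantined ship
sampled_contacts = 273   # close contacts sampled
processed = 102          # results back so far
positive = 20            # positives among the processed samples

positivity = positive / processed
# Naive assumption: remaining samples are positive at the same rate
expected_positives_among_contacts = positivity * sampled_contacts

print(f"Positivity so far: {positivity:.1%}")
print(f"Naive estimate among all {sampled_contacts} contacts: "
      f"{expected_positives_among_contacts:.0f}")
```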

Just a quick observation.
While I don’t dismiss the impact of ACE2 receptors, I think we need much more time to see whether the claim that Asian people are more susceptible is valid.
As of today, there are something like 250 cases around the world outside China.
Now, let’s look at the Chinese data:
Province   Cases  Deaths  Recovered
Hubei      19665     549        671
Guangdong    970       0         52
Zhejiang     954       0         82
Henan        851       2         59
Hunan        711       0         67
Jiangxi      600       0         37
Anhui        591       0         34

Would we really claim, on the basis of the above data, that people in Guangdong, Zhejiang and Henan are less susceptible?
I guess what matters most (for now) is the quality and accessibility of treatment. This is why single cases in the Western world are OK: they get the best treatment one can get (until hundreds start showing up at hospitals).
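One rough way to look at the province table with the treatment question in mind is the crude ratio of deaths to confirmed cases, plus each province's share of the listed cases. The sketch below computes both from the numbers above. Keep in mind that deaths divided by confirmed cases is not a true fatality rate, because most confirmed cases have not yet resolved.

```python
# Province figures quoted above: (confirmed cases, deaths, recovered)
provinces = {
    "Hubei": (19665, 549, 671),
    "Guangdong": (970, 0, 52),
    "Zhejiang": (954, 0, 82),
    "Henan": (851, 2, 59),
    "Hunan": (711, 0, 67),
    "Jiangxi": (600, 0, 37),
    "Anhui": (591, 0, 34),
}

def crude_death_ratio(cases, deaths):
    """Deaths divided by confirmed cases (unresolved cases make this only a rough indicator)."""
    return deaths / cases

total_cases = sum(cases for cases, _, _ in provinces.values())
for name, (cases, deaths, _) in provinces.items():
    share = cases / total_cases
    print(f"{name:10s} {share:6.1%} of listed cases, "
          f"crude death ratio {crude_death_ratio(cases, deaths):.2%}")
```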

Just wanted to point out an interesting titbit about one of the charts Chris shows in the video.
Disclaimer: I’m a statistician by profession, and although this post supports an argument that the official 2019-nCoV numbers are doctored, the argument in itself is far too weak to support this conclusion. It is only meant as an interesting note that can support other arguments by suggesting a method for the “doctoring”.
The chart I’m referring to is this:
 

This chart is being presented as a formula that can predict the new official number, pointing to the conclusion that the numbers might not be “entirely correct”, but as far as I can see, nobody has mentioned the obvious.
The formula for the function is QUADRATIC: Cases = 155.57d^2 - 2610.7d + 13789
Why is this interesting? Well, the theoretical model for the spread of a virus is not quadratic. It is exponential. Explaining why this matters requires a bit of math. The derivative of a function describes how fast the function increases over time. Think of distance and speed: the total number of cases is like the distance travelled, and the increase in cases per day is like the speed.
The derivative of a quadratic function is linear: Increase in cases = 311.14d - 2610.7
The derivative of an exponential function is itself exponential.
This means that, given enough time, a quadratic function will always fall below a growing exponential function. If you wanted to “doctor” the numbers so they look roughly exponential but consistently undershoot the real number of cases, you could use a quadratic function.
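To make the derivative argument concrete, the sketch below evaluates the quadratic from the chart, checks that its day-to-day changes grow by a constant 2 × 155.57 = 311.14 per day, and finds where an exponential curve overtakes it. The exponential 100 × 1.2^d is a made-up illustrative curve, not fitted to any data; any exponential with a positive growth rate would eventually overtake the quadratic, only the crossover day would differ.

```python
def cases_quadratic(d):
    """Quadratic trendline from the chart: Cases = 155.57 d^2 - 2610.7 d + 13789."""
    return 155.57 * d**2 - 2610.7 * d + 13789

def increase_quadratic(d):
    """Derivative of the quadratic, linear in d: 311.14 d - 2610.7."""
    return 311.14 * d - 2610.7

def cases_exponential(d, scale=100.0, factor=1.2):
    """Hypothetical exponential curve (NOT fitted to any data): scale * factor**d."""
    return scale * factor**d

# Second differences of a quadratic are constant: 2 * 155.57 = 311.14
values = [cases_quadratic(d) for d in range(10, 20)]
first_diff = [b - a for a, b in zip(values, values[1:])]
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]
print("second differences:", [round(x, 2) for x in second_diff])

# The exponential starts far below the quadratic but eventually overtakes it for good
crossover = next(d for d in range(1, 1000) if cases_exponential(d) > cases_quadratic(d))
print("exponential overtakes the quadratic on day", crossover)
```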
Obviously this might just be a coincidence, but I have fitted both models to the coronavirus data, and I noticed that the quadratic fit becomes better and better over time. I have included an Excel file with the data from the Johns Hopkins dashboard: https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6 and the wuflu dashboard: https://wuflu.live/

Both data series have been fitted with a quadratic trendline and an exponential trendline, and from the charts it is visually obvious that the quadratic form models the data best. How well each trendline fits the data can be read from its R-squared value (R²). R² is a number between 0 and 1 that basically tells you what share of the variance in the data points is explained by the trendline. So, looking at the JH data, the exponential trendline explains 96,3% of the variance, but the quadratic trendline explains 99,91%. We see exactly the same pattern in the Wuflu data, where the exponential trendline explains 96,79% of the variance but the quadratic trendline explains 99,95%.
This means that for both datasets, the quadratic form fits the data best, except that viral spread should theoretically be exponential.
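For those who want to reproduce this kind of comparison outside Excel, here is a minimal sketch in plain Python: a least-squares quadratic fit via the normal equations, an exponential fit done as a straight-line fit on ln(cases), and R² computed on the original scale. The two test series are synthetic (an exact quadratic and an exact exponential), not the JH or Wuflu numbers. Spreadsheet exponential trendlines are typically fitted on the logarithm of the counts and may report R² on that log scale, so results from this sketch need not match Excel's exactly.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit y = a x^2 + b x + c via the normal equations."""
    s = [sum(x**k for x in xs) for k in range(5)]                 # s[k] = sum of x^k
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of y * x^k
    a, b, c = solve3([[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]],
                     [t[2], t[1], t[0]])
    return lambda x: a * x**2 + b * x + c

def fit_exponential(xs, ys):
    """Fit y = A exp(k x) by least squares on ln(y) (requires y > 0)."""
    n = len(xs)
    ls = [math.log(y) for y in ys]
    mx, ml = sum(xs) / n, sum(ls) / n
    k = sum((x - mx) * (l - ml) for x, l in zip(xs, ls)) / sum((x - mx) ** 2 for x in xs)
    ln_a = ml - k * mx
    return lambda x: math.exp(ln_a + k * x)

def r_squared(xs, ys, model):
    """1 - SS_res / SS_tot, computed on the original (unlogged) scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

days = list(range(1, 16))
quadratic_like = [100 * d**2 + 50 for d in days]   # synthetic, exactly quadratic
exponential_like = [50 * 1.3**d for d in days]     # synthetic, exactly exponential
for label, ys in [("quadratic-like", quadratic_like), ("exponential-like", exponential_like)]:
    print(label,
          "quadratic R2 =", round(r_squared(days, ys, fit_quadratic(days, ys)), 4),
          "exponential R2 =", round(r_squared(days, ys, fit_exponential(days, ys)), 4))
```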
Obviously, this could just be due to “noise” in the data, but visually this does not seem to be the case. All data points follow the quadratic trendline much more closely, except for the 31/1 data point, which undershoots a tiny bit in the JH data. Just to drive home the point, we can restrict the graphs to the early data points, up to 26/1.
Now for the JH data, the exponential trendline can explain 96,65% of the variance and the quadratic trendline can explain 97,74% of the variance. And for the Wuflu data the exponential trendline can explain 99,51% of the variance but the quadratic trendline can explain 99,96% of the variance.
This clearly shows that it is not noise, that produces the better fit of the quadratic form, it is time and the inclusion of more datapoints. Basically this means, that in the beginning, both a quadratic and an exponential model were plausible models for the data, but as we move forward in time, the quadratic form is becoming more likely as a representation of the data for both datasets.
This points to the obvious conclusion, that since a viral outbreak should be exponential in nature, all else being equal, and the best fit of the trendlines is moving towards a quadratic fit, the likelihood that the data is being “doctored” is increasing.
The counterargument is, of course, that with R² values this high and so few data points, no statistical test can show this with any significance.
But everything else about this case considered, can you see a pattern emerging?

Max S.,
Thanks for this deep analysis :). It may be fun (for people inclined to numbers) to “play” with this model in Excel, trying different trendlines, and to keep an eye on it as more data becomes available.
This is a really good point: a quadratic function should not describe an exponential reality, and the R² looks too good. On the other hand (playing devil’s advocate), could it be that the Chinese government’s measures are impeding the spread, so that it is now closer to quadratic?
Now a bit of “conspiracy theory”. What struck me when I looked at the data again:
Province   Cases  Deaths  Recovered
Hubei      19665     549        671
Guangdong    970       0         52
Zhejiang     954       0         82
Henan        851       2         59
Hunan        711       0         67
Jiangxi      600       0         37
Anhui        591       0         34
Could it be that the CCP decided to “sacrifice” Hubei and shed some light on the reality there, while hiding data from the other provinces? So the whole world focuses on Hubei and just bypasses the rest? After all, three of the provinces have almost 1,000 cases each. When Hubei had 1,000 cases, all hell had already broken loose there…
 

Hi Max S., welcome to the PP tribe! Thank you for delving deeper into the data and helping us to better understand it through your excellent presentation and analysis. I’m sure you’ll find some kindred stats-oriented folks here for some lively and productive conversation. :)

mad_bobul
To answer your question bluntly (perhaps too bluntly):

  1. Lower R² due to noise in the data points to: delayed data, underreporting, problems with testing and diagnosis, etc.
  2. Lower R² due to fitting the wrong model points to: active manipulation of the data, or fictitious data.
Perhaps that is a bit too blunt, but with an R² of 0,9991 to 0,9995 fitted to the wrong model, I’d go for option 2. To elaborate a bit: option 1 means that the real number of cases could be higher. Option 2 makes anything possible, like this for example: https://www.zerohedge.com/health/did-chinas-tencent-accidentally-leak-true-terrifying-coronavirus-statistics

Max - Your analysis is excellent and fits right in here perfectly. Honest, measured, and experienced while being open about the interpretation.
The exponential vs quadratic insight is a new one for me, and completely obvious now that you’ve raised it.
Thank you for the analysis and resulting insight.
As an aside, the viral cruise ships are confirming (for me) the idea that nCoV is indeed highly transmissible. At a minimum, one person infected 20 others. But only 102 results are back from 273 tested. So that could be much higher (and I’d bet that’s the case).
Such a rate of spread is not consistent with a quadratic spread, but an exponential spread.
Great work.
 

Hi Chris – Thank you very much.
Just to add some more familiar terms that you used in the Crash Course series: the R² in a regression model equals the squared correlation between the fitted values and the data.
So, if we want to know the correlation between the fitted trendline and the data, we take the square root of R². The correlation between the data and the two quadratic trendlines is about 0,99955 (JH) and 0,99975 (Wuflu).
How to interpret a correlation coefficient obviously depends on the situation, but if we disregard for a moment that the quadratic model is the wrong model, and add the fact that there should be some “noise” from all kinds of practical problems, we would expect a lower correlation.
In other words, as you point out in the Crash Course series (regarding correlation), the correlation in these numbers is just too good.
And when we consider that this “near” perfect fit is to a wrong model, I’m sure everyone is getting a whiff of what these numbers smell like.
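The relationship used above (for a least-squares fit with an intercept, R² equals the squared correlation between the fitted values and the data) can be checked numerically. The sketch below fits a straight line to a small made-up dataset, confirms the identity, and then converts the quadratic-trendline R² values quoted in the thread into correlations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Made-up illustrative data, not case counts
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 5.0, 8.5, 9.0, 12.5]
slope, intercept = linear_fit(xs, ys)
fitted = [slope * x + intercept for x in xs]
mean_y = sum(ys) / len(ys)
r2 = 1 - sum((y - f) ** 2 for y, f in zip(ys, fitted)) / sum((y - mean_y) ** 2 for y in ys)

# R^2 equals the squared correlation between fitted values and data
assert abs(r2 - pearson(fitted, ys) ** 2) < 1e-12

# Quadratic-trendline R^2 values quoted in the thread, converted to correlations
for label, reported_r2 in [("JH", 0.9991), ("Wuflu", 0.9995)]:
    print(label, round(math.sqrt(reported_r2), 5))
```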

Quadratic: the number of new cases each day increases linearly. For example if there are 100 new cases on day one, 120 on day two, 140 on day three, 160 on day four, etc. I can guess day five will have 180 new cases.
Exponential: The number of new cases is equal to the number of new cases yesterday multiplied by X where X depends on the doubling time. For a doubling time of 1 day, X=2.
The number of new cases obviously increases much quicker for the exponential model. The exponential model also fits the theory that the number of new cases is proportional to the number of infected people who can transmit the disease.
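The two definitions above translate directly into code. In this sketch the quadratic case adds a fixed 20 new cases per day (so the cumulative total is quadratic), and the exponential case multiplies new cases by X every day, where X = 2^(1/T) for a doubling time of T days, which gives X = 2 when T = 1 day. The starting value of 100 and the step of 20 are taken from the example above.

```python
def quadratic_new_cases(day, first=100, step=20):
    """New cases per day growing by a fixed amount (cumulative total is quadratic)."""
    return first + step * (day - 1)

def daily_growth_factor(doubling_time_days):
    """Multiplier X such that X ** doubling_time_days == 2."""
    return 2 ** (1 / doubling_time_days)

def exponential_new_cases(day, first=100, doubling_time_days=1.0):
    """New cases per day multiplied by the same factor X every day."""
    return first * daily_growth_factor(doubling_time_days) ** (day - 1)

for day in range(1, 6):
    print(day, quadratic_new_cases(day), round(exponential_new_cases(day)))
```

By day five the linear rule gives 180 new cases while the 1-day-doubling rule is already at 1,600.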
Theory: a quadratic fit is either someone making things up as we’ve already discussed - or - reflects a gradual increase in the availability of test kits which are the limiting factor on diagnosis. Here is the number of new cases daily from WuFlu.live:

For a perfect quadratic, it would be a line with constant slope. It’s close, but not quite. Noisy enough to be real or too perfect and therefore made up? I can’t say.

We are hearing about results of infection on a cruise ship in Japan. But, that’s only one ship. How many cruise ships are active on any given day? Should we be hearing results of other cruise ships, at least even if those results are negative?

After looking more at the data, I found some more indications that the data doesn’t look the way it should.
Regardless of whether we assume a quadratic or an exponential relationship, in both cases the daily increase keeps growing. From this we would naturally assume that the error residuals, or “noise”, in the data should grow likewise. By error residual or noise I simply mean the difference between a data point and the trendline. The reason is that any practical problems with data collection, like a lack of test kits, missed reports, etc., should also grow as the numbers get bigger. Therefore the absolute values of the error residuals should increase over time. If we chart this for the two quadratic relationships, we see that this is not the case, which is rather curious.

Instead we see residuals that look roughly constant. To make this even clearer, we can divide the absolute residuals by the trendline values. If we do this, we should see a reasonably constant line at the average ratio of noise to value: if the noise is 10%, we should see a line oscillating around 0,1.

Instead, we see the relative noise getting smaller and smaller and approaching 0. This is because the error residual stays constant while the actual values grow at a quadratic rate.
This makes absolutely no sense. If the difficulty in getting test kits was bad at 5000 cases, it should be alarming at 25000. If the job of counting and reporting the cases was bad at 5000 cases, it should be much worse at 25000 cases. If misreporting was at a certain level at 5000 cases, it should be 5 times that at 25000 cases.
So, this again raises the question of whether we can trust the official data, because the error values simply are completely contrary to what we would expect.
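The residual argument can be illustrated with a stylized example. The sketch below uses a made-up quadratic trend (not fitted to the real data) and two kinds of noise that flip sign every day: noise proportional to the count (±10%), which is what growing reporting problems would produce, and noise of a fixed size (±300 cases), which is the pattern described above. Dividing the absolute residuals by the trend gives a flat line at 0,1 in the first case and a line shrinking toward 0 in the second.

```python
def relative_residuals(trend, observed):
    """Absolute residual divided by the trendline value, day by day."""
    return [abs(o - t) / t for t, o in zip(trend, observed)]

days = range(1, 21)
# Hypothetical quadratic trend with made-up coefficients
trend = [150 * d**2 + 500 for d in days]

# Proportional noise: +/-10% of the count, alternating sign each day
proportional = [t * (1 + 0.1 * (-1) ** d) for d, t in zip(days, trend)]
# Constant-size noise: +/-300 cases, alternating sign each day
constant = [t + 300 * (-1) ** d for d, t in zip(days, trend)]

rel_prop = relative_residuals(trend, proportional)
rel_const = relative_residuals(trend, constant)
print("proportional noise:", [round(r, 3) for r in rel_prop[::5]])
print("constant noise:    ", [round(r, 4) for r in rel_const[::5]])
```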

Just heard a great quote!
 
“The only people who get mad at you for speaking the truth are those who are living a lie.”