Did you know that the number of epidemiologists grew exponentially after COVID started? Kidding. If you were a frequent social media user over the past two years, then you likely encountered an overwhelming number of people analyzing and interpreting COVID-related data.

While I admire society's renewed interest in data science, it is unfortunate that knowledge of statistics and probability was often left at the door once these discussions began. One of the most dangerous consequences of the democratization of data is that data gets misused and its limitations get misunderstood.

While the use (and misuse) of data is growing in society, data literacy is spreading much more slowly. Even though I have a strong background in statistics, I have been relatively quiet about COVID-related data on social media.

Why do you think that is? In order to draw conclusions about COVID-related data, you need to consider so many different questions. The study of statistics requires understanding uncertainty and how to draw inferences beyond what the data tells you. Below is the minimum set of questions that would get you close to a statistically sound conclusion regarding COVID data.

Whenever you read your friends', doctors', family members', data scientists', or news stations' conclusions based on COVID-related data, do you think they took into consideration all of the questions listed above? Do you think they considered even two of these questions? Given enough time and the right data set, I could answer some of these questions.

However, I would never be able to answer all of these questions without the assistance of an actual epidemiologist and other statisticians to verify my findings. Regardless of how much analysis you do, your conclusions will never be 100 percent settled.

Uncertainty surrounding data quality, bias, and variance will always be present no matter the methodology you choose. This uncertainty justifies the need to incorporate qualitative thinking coupled with probabilistic thinking for any analysis or viewpoint to be trustworthy.
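
To make the point about variance concrete, here is a minimal sketch in Python. It is not tied to any real COVID data set: the counts are hypothetical numbers invented for illustration, and the helper name `wilson_interval` is my own. It computes an approximate 95 percent Wilson score interval for an observed proportion and shows that the same observed rate is far less certain when it comes from a small sample.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a proportion.

    z = 1.96 corresponds to roughly 95 percent confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical counts (invented for illustration): both samples show a 10% rate.
small = wilson_interval(5, 50)       # 5 events out of 50 observations
large = wilson_interval(500, 5000)   # 500 events out of 5,000 observations

for label, (low, high) in [("n=50", small), ("n=5000", large)]:
    print(f"{label}: observed 10%, plausible range {low:.1%} to {high:.1%}")
```

Note that this interval only captures sampling variance. It says nothing about data quality or bias, which is exactly why the qualitative questions still matter.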

It comes as no surprise that some of the best wisdom I have heard about the virus has come from Nassim Taleb, an expert on uncertainty*. Next time you analyze data, I would caution you against being certain of your conclusion and swinging too far to the quantitative side of the pendulum. This becomes even more critical when the level of data quality is in question.

~ The Data Generalist

*I believe Taleb is saying that if vaccines were more dangerous than the virus, then there would have been millions of issues popping up. Those issues would sit in the left tail of a probability distribution that describes all possible outcomes of taking the vaccine. Because they have not appeared at that scale, the vaccine is less risky than the virus.

Image Sources: Images by Author
