Hi fellows!

In this two-part article, I would like to focus on a common problem in statistics: multiple comparisons.

In the first part, we will dive into the main terminology of this problem and the most common solutions. In the second part, we will explore practical implementation with Python code and interpret the results.

I will use metaphors to aid immersion in the topic and make it more fun.

Let's get started! 😎

The Multiple Comparisons Problem in a Nutshell

Imagine that you come to a party where everyone is wearing a mask, and you are trying to guess whether there is a celebrity behind each one. The more guesses you make, the more likely you are to be wrong at least once (hello, Type I errors!). This is the core difficulty of the multiple comparisons problem in statistics: every additional hypothesis you test raises the chance of at least one false positive.
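To put a number on that intuition, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is true (no celebrities at the party at all), in which case the chance of at least one false positive is 1 - (1 - alpha)^m. The function name and the list of test counts are made up for illustration.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, assuming every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests


# Illustrative test counts, not data from a real study.
for n_tests in (1, 5, 20, 100):
    rate = family_wise_error_rate(0.05, n_tests)
    print(f"{n_tests:>3} tests -> chance of at least one mistake: {rate:.3f}")
```

With a significance level of 0.05, the chance of at least one false alarm is already about 64% after 20 guesses.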

Essential Jargon for the Party

The Benjamini-Hochberg correction is like a more risk-tolerant friend who lets you confidently identify celebrities without being too strict.

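This method adjusts the significance level for each test based on the rank of its p-value, and it controls the false discovery rate (FDR): the expected share of false positives among all the results you declare significant. This approach allows more flexibility than the Bonferroni correction.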
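To see where the extra flexibility comes from, here is a small sketch that prints the cutoffs side by side. It assumes the standard definitions: Bonferroni compares every p-value with alpha / m, while Benjamini-Hochberg compares the p-value at rank i with (i / m) * alpha. The function names and the values of alpha and m are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Bonferroni: one fixed cutoff shared by every test."""
    return alpha / n_tests


def bh_thresholds(alpha: float, n_tests: int) -> list:
    """Benjamini-Hochberg: cutoff for the p-value at each rank (1 = smallest)."""
    return [rank / n_tests * alpha for rank in range(1, n_tests + 1)]


alpha, n_tests = 0.05, 5  # illustrative values
print(f"Bonferroni cutoff for every test: {bonferroni_threshold(alpha, n_tests):.3f}")
for rank, cutoff in enumerate(bh_thresholds(alpha, n_tests), start=1):
    print(f"BH cutoff at rank {rank}: {cutoff:.3f}")
```

The two methods agree only at rank 1. From rank 2 onward the Benjamini-Hochberg cutoff is looser, which is why it tends to flag more findings.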

The Process:

  1. Rank P-values: From the smallest to the largest.

  2. Adjust Significance Levels: Each hypothesis gets its own threshold, based on its rank and the total number of tests. The smallest p-value faces the strictest threshold, and the threshold becomes more lenient as the rank increases (more details can be found in the next part of this article).
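The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: it assumes the standard step-up rule (find the largest rank k whose p-value is at or below (k / m) * alpha, then reject everything ranked up to k), and the function name and p-values are made up for the example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected."""
    m = len(p_values)
    # Step 1: rank p-values from smallest to largest, keeping original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step 2: find the largest rank k whose p-value is at or below (k / m) * alpha.
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_passing_rank = rank
    # Step-up rule: reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_passing_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for five "guests", in arbitrary order.
example_p_values = [0.041, 0.001, 0.042, 0.008, 0.039]
bh_rejected = benjamini_hochberg(example_p_values)
bonferroni_rejected = [p <= 0.05 / len(example_p_values) for p in example_p_values]
print("Benjamini-Hochberg rejections:", sum(bh_rejected))
print("Bonferroni rejections:", sum(bonferroni_rejected))
```

In this toy example Benjamini-Hochberg rejects all five hypotheses, while Bonferroni rejects only two. Note that 0.039 and 0.041 miss their own rank thresholds (0.03 and 0.04) but are still rejected, because the largest p-value, 0.042, passes its threshold of 0.05.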

So, by focusing on controlling the FDR, the Benjamini-Hochberg correction allows you to find more celebrities among all the guests at the party. This approach is particularly useful when you test a variety of hypotheses and are willing to accept a controlled share of mistakes in order not to miss out on important findings.

In summary, the Benjamini-Hochberg correction offers a practical balance between discovering true effects and controlling the rate of false positives.

In conclusion, we discussed the main terminology of the multiple comparisons problem and the most common ways to deal with it. In the next part, I will focus on practical implementation with Python code and on interpreting the results.

See you!