This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: 4HJSabtR-_oQPYulX9WIjXSAwP41FNBsNuwTJwzEx0g

AI Metrics Made Simple: Precision, Recall, F-Score, and ROC-AUC

Written by @alextroitskyai | Published on 2025/12/3

TL;DR
Precision and Recall show how well a model identifies positive cases; F1 combines them into one score; ROC-AUC measures overall ranking quality without relying on a threshold.

In this article, I’ll explain how Precision, Recall, F-score, and ROC-AUC are calculated, with real examples.

We’ve already figured out that many model-quality metrics are derived from the Confusion Matrix, while Accuracy is used rather rarely (even though it’s very popular among clients who don’t really understand machine learning). So, what else can we calculate based on this matrix?

Confusion matrix from the example about the elephant

Precision & Recall

Precision = TP / (TP + FP) 

Precision shows the share of objects the model marked as positive that are actually positive.

Recall = TP / (TP + FN) 

Recall shows what portion of all true positive-class objects the algorithm managed to find.
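Both formulas can be checked in a couple of lines. This is a minimal sketch with made-up confusion-matrix counts (not the article's exact matrix):

```python
# Precision and recall from hypothetical confusion-matrix counts.
tp, fp, fn = 3, 1, 1  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(f"precision = {precision:.2f}")  # 0.75
print(f"recall    = {recall:.2f}")     # 0.75
```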

These metrics reflect model quality more accurately than Accuracy, but they also have some drawbacks. For example, you can change both of them simply by shifting the probability threshold for class assignment.
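The threshold effect is easy to demonstrate. In this sketch the scores and labels are made up purely for illustration; lowering the threshold raises Recall but drags Precision down:

```python
# How shifting the decision threshold trades precision against recall.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # predicted probabilities
labels = [1,   1,   0,   1,   0,   1,   0,   0]     # 1 = actually positive

def precision_recall(scores, labels, threshold):
    """Compute (precision, recall) when predicting positive at >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp / (tp + fp), tp / (tp + fn)

for t in (0.5, 0.25):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold {t}: precision={p:.2f}, recall={r:.2f}")
```

With these numbers, threshold 0.5 gives precision 0.75 and recall 0.75, while threshold 0.25 gives recall 1.0 at the cost of precision dropping to about 0.67.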

F-Score

Sometimes, all these metrics can get confusing, and machine-learning algorithms are usually built so they optimize just one specific quality metric. That’s why people came up with metrics that reflect both Precision and Recall at the same time — these are called F-scores. There are several types of F-scores, but here I’ll explain the most basic one, the F1 score:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

So, F1 reflects both sides at once: what share of all actual positive objects the model manages to find (Recall), and what share of the objects it labeled as positive are actually positive (Precision).
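Because F1 is the harmonic mean, it punishes an imbalance between the two components; a model cannot buy a high F1 with great Precision alone. A quick sketch:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.75, 0.75))  # 0.75: balanced inputs give the same value back
print(f1_score(1.0, 0.5))    # ~0.67: far below the arithmetic mean of 0.75
```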

ROC-AUC

Probably the most popular metric for binary classification is ROC-AUC. It doesn’t depend on choosing a probability threshold and works well on imbalanced datasets. Let’s break down how it’s computed step by step using a simple example about buying an elephant.

Say we have 10 people; 4 of them will buy an elephant, and 6 will not. Our model predicts the probability of buying an elephant for each person.

Predicted probability and actual outcome for each person

We now have each person’s buying probability and the actual outcome.

By counting the number of positive outcomes (there are 4) and negative ones (6), we draw a grid that is 4 cells high and 6 cells wide. Its total area is 24.

The table we get

Next, we sort the probabilities from highest to lowest. Then we move through the people one by one: if the person actually bought the elephant, we take one step up; if not, we take one step right. With 4 buyers and 6 non-buyers, the path uses exactly 4 upward and 6 rightward steps and ends at the top-right corner of the grid. Here's what happens:

How we fill in the ROC-AUC grid

First step: Upward, and the model predicted correctly (a high predicted probability of buying an elephant, and the person actually bought one).

ROC-AUC table after the first step

Second step: Upward again, another correct prediction (again, a high predicted probability, and the person actually bought an elephant).

ROC-AUC table after the second step

Third step: This time the model was wrong, so we move right (a fairly high predicted probability of buying, but the person actually didn't buy an elephant).

ROC-AUC table after the third step

Fourth step: Correct again, so we move up.

ROC-AUC table after the fourth step

Following this logic, we fill the entire grid.

ROC-AUC table after all the steps

After placing all the steps, we mark every cell below the path with a 1.

All cells below the path filled with 1

Then we count the marked cells: there are 19.

Now, we divide 19 by the total number of cells (4 * 6 = 24):

ROC-AUC = 19 / 24 ≈ 0.79 (79%)
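The whole grid walk fits in a few lines of Python. The probabilities below are hypothetical (the article's exact table is in the image); only the sorted order of buy/no-buy outcomes mirrors the worked example, so the count comes out to the same 19 cells:

```python
# ROC-AUC via the grid walk: sort by predicted probability (highest first),
# step up for a buyer, step right for a non-buyer, and count the cells
# under the resulting staircase.
probs    = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.50, 0.45, 0.40, 0.30]
outcomes = [1,    1,    0,    1,    0,    0,    0,    1,    0,    0]  # 1 = bought

pairs = sorted(zip(probs, outcomes), reverse=True)  # highest probability first
height, cells = 0, 0
for _, bought in pairs:
    if bought:
        height += 1      # step up
    else:
        cells += height  # step right adds a column of `height` cells under the path

positives = sum(outcomes)              # 4 buyers: grid height
negatives = len(outcomes) - positives  # 6 non-buyers: grid width
auc = cells / (positives * negatives)
print(f"ROC-AUC = {cells}/{positives * negatives} = {auc:.2f}")  # 19/24 ≈ 0.79
```

Counting cells under the staircase is exactly the same as counting, over all buyer/non-buyer pairs, how often the buyer got the higher probability, which is the standard probabilistic interpretation of AUC.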

Conclusion

In the next article, I’ll continue with AI metrics and move on to regression metrics such as MSE, RMSE, MAE, R-squared, MAPE, and SMAPE.

Follow me — check my profile for links!
