and what the alternatives are.

Photo by Michal Matlon on Unsplash

Creating a machine learning model is an iterative process. It usually takes several iterations to arrive at a robust model. Furthermore, you may need to update a model after it is deployed into production.

A significant part of this process is model evaluation. It is just as important as creating the model.

There are several loss functions and metrics for evaluating model performance. Which one to use depends on both the task and the data. For classification models, the simplest and most obvious choice seems to be classification accuracy. However, it is not always the best choice.

In this article, I will explain when you should avoid using classification accuracy and which alternatives exist.

Classification accuracy might be the most intuitive metric. It shows the ratio of correct predictions to all predictions.

Classification accuracy (image by author)
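As a minimal sketch of the definition above (the function name `accuracy` and the toy labels are illustrative, not from the original), accuracy is simply the number of predictions that match the true labels divided by the total number of predictions:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the prediction equals the true label
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: 4 of the 5 predictions are correct
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
```

scikit-learn provides the same calculation as `sklearn.metrics.accuracy_score`.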

So what is wrong with this fundamental metric?

The potential problems are best explained through examples. Think about spam emails for a second. Thanks to efficient spam detection algorithms, we rarely have to deal with them.

Suppose we are designing a model to detect spam emails. An ordinary email address receives very few spam emails compared to regular ones. Thus, the dataset used for training is likely to be imbalanced.

Let’s say the ratio of spam to regular emails in the dataset is 5 to 95. If a model predicts every email as not spam, it will have an accuracy of 95%, which sounds good. However, such a model does nothing useful: it never catches a single spam email.
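The scenario above can be reproduced in a few lines. This is a sketch on a hypothetical dataset of 100 emails (label `1` for spam, `0` for regular), using a "model" that always predicts "not spam":

```python
# Hypothetical dataset: 5 spam (label 1) and 95 regular emails (label 0)
y_true = [1] * 5 + [0] * 95
# A "model" that predicts "not spam" for every email
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# How many of the actual spam emails did the model catch?
spam_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy: {accuracy:.2f}")                          # accuracy: 0.95
print(f"spam caught: {spam_caught} of {y_true.count(1)}")  # spam caught: 0 of 5
```

The high accuracy comes entirely from the majority class; the model catches none of the emails it was built to detect.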

Furthermore, mistakes on spam and regular emails should not be treated equally. Missing a spam email and letting it into the inbox is a minor nuisance. Marking an important email as spam, however, could have severe consequences.

Classification accuracy does not give us the flexibility to treat these two types of mistakes differently.
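To make this concrete, here is a sketch comparing two hypothetical spam filters on the same 100 emails. Both make four mistakes, so their accuracy is identical, but one misses spam while the other sends regular emails to the spam folder. Precision (the share of emails flagged as spam that really are spam) is one common alternative that separates them; it is used here as an illustration, and returning `0.0` when nothing is flagged is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # spam correctly flagged
        elif p == positive:
            fp += 1  # regular email wrongly flagged as spam
        elif t == positive:
            fn += 1  # spam that slipped into the inbox
        else:
            tn += 1  # regular email correctly left alone
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """Share of emails flagged as spam that really are spam."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


# Hypothetical data: 5 spam (1) and 95 regular (0) emails
y_true = [1] * 5 + [0] * 95
# Model A misses 4 spam emails but never flags a regular one
model_a = [1] + [0] * 4 + [0] * 95
# Model B catches all spam but sends 4 regular emails to the spam folder
model_b = [1] * 5 + [1] * 4 + [0] * 91

for name, y_pred in [("A", model_a), ("B", model_b)]:
    print(name, f"accuracy={accuracy(y_true, y_pred):.2f}",
          f"precision={precision(y_true, y_pred):.2f}")
# A accuracy=0.96 precision=1.00
# B accuracy=0.96 precision=0.56
```

Accuracy cannot tell the two models apart, while precision shows that model B is the one that hides important emails.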

A similar case is a model that classifies tumors as malignant or benign. Classifying a malignant tumor as benign is a potentially life-threatening mistake, so the model should focus on correctly detecting malignant tumors. The evaluation metric should be chosen accordingly.
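One metric that fits this requirement is recall: the share of actual malignant tumors that the model detects. The sketch below uses hypothetical labels (`1` for malignant, `0` for benign) and a model that misses three malignant tumors; the data and function name are illustrative only.

```python
def recall(y_true, y_pred, positive=1):
    """Share of actual positives (malignant tumors) the model detects."""
    actual_pos = sum(1 for t in y_true if t == positive)
    detected = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    return detected / actual_pos if actual_pos else 0.0


# Hypothetical labels: 10 malignant (1) and 90 benign (0) tumors
y_true = [1] * 10 + [0] * 90
# The model labels 3 of the malignant tumors as benign
y_pred = [1] * 7 + [0] * 3 + [0] * 90

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={acc:.2f}, recall={recall(y_true, y_pred):.2f}")
# accuracy=0.97, recall=0.70
```

An accuracy of 97% looks reassuring, yet recall shows that 3 out of 10 malignant tumors go undetected. scikit-learn offers the same metric as `sklearn.metrics.recall_score`.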
