By Sandeep Uttamchandani, Ph.D., Both a Product/Software Builder (VP of Engg) & Leader in operating enterprise-wide Data/AI initiatives (CDO)

Image by Tumisu from Pixabay

ML model training is the most time-consuming and resource-intensive part of the overall model-building journey. Training is iterative by definition, and somewhere along the iterations mistakes seep in. In this article, I share the ten deadly sins of ML model training: the mistakes that are the most common as well as the easiest to overlook.

Ten Deadly Sins of ML Model Training

1. Blindly increasing the number of epochs when the model is not converging

During model training, there are scenarios where the loss-versus-epoch curve keeps bouncing around and does not converge no matter how many epochs you run. There is no silver bullet, because there are multiple root causes to investigate: bad training examples, missing ground-truth labels, changing data distributions, or too high a learning rate. The most common one I have seen is bad training examples, typically a combination of anomalous data and incorrect labels.
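One way to start hunting for bad training examples is to look at the loss per example rather than the averaged loss. The sketch below is an illustration only, not a prescribed method: it assumes your training loop can report one loss value per training example, and the function name `flag_suspect_examples` and the cutoff `k` are hypothetical choices. It flags examples whose loss sits far above the median so they can be reviewed by hand for anomalous inputs or wrong labels.

```python
import statistics

def flag_suspect_examples(per_example_loss, k=5.0):
    """Return indices of examples whose loss is far above the median.

    Uses the median absolute deviation (MAD) as a robust measure of spread,
    so a handful of extreme losses does not distort the cutoff itself.
    `k` is an assumed, tunable multiplier, not a recommended constant.
    """
    losses = list(per_example_loss)
    median = statistics.median(losses)
    mad = statistics.median([abs(x - median) for x in losses])
    # If mad is 0 (most losses identical), anything above the median is flagged.
    threshold = median + k * mad
    return [i for i, loss in enumerate(losses) if loss > threshold]

# Hypothetical per-example losses collected at the end of an epoch.
losses = [0.20, 0.25, 0.22, 0.19, 0.21, 3.50, 0.23, 0.20]
suspects = flag_suspect_examples(losses)
print("Review these training examples by hand:", suspects)
```

A flagged example is only a candidate: a high loss can also mean a rare but valid case, so the output is a review queue rather than a delete list.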

2. Not shuffling the training dataset

Sometimes the model seems to be converging, but then the loss suddenly jumps, i.e., the loss decreases for a number of epochs and then increases significantly. There are multiple reasons for this kind of exploding loss. The most common one I have seen is outliers that are not evenly distributed across the dataset, so they reach the model clustered together. Shuffling is an important step in general, including for cases where the loss shows a repeating step-function pattern.
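As a minimal sketch of what shuffling means in practice, the helper below reshuffles the example order at the start of every epoch before cutting mini-batches. The name `shuffled_batches` and its parameters are hypothetical; most training frameworks offer an equivalent shuffle option on their data loaders, which is usually the better choice when available.

```python
import random

def shuffled_batches(examples, batch_size, epoch, seed=0):
    """Yield mini-batches from `examples` in a fresh random order.

    The order depends on (seed, epoch), so every epoch sees a different
    ordering while a rerun with the same seed stays reproducible.
    """
    rng = random.Random(seed + epoch)
    order = list(range(len(examples)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]

# Hypothetical dataset stored sorted by label: without shuffling, the first
# batches would contain only class "a" and the last only class "b".
dataset = [("x%d" % i, "a") for i in range(5)] + [("x%d" % i, "b") for i in range(5, 10)]
for epoch in range(2):
    for batch in shuffled_batches(dataset, batch_size=4, epoch=epoch):
        labels = [label for _, label in batch]
        print(epoch, labels)
```

Shuffling spreads outliers and sorted runs across batches; it does not remove the outliers themselves, so it complements rather than replaces data cleaning.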

3. In multiclass classification, not prioritizing per-class metrics

For multiclass prediction problems, instead of tracking just the overall classification accuracy, it is often useful to prioritize the accuracy of specific classes and iteratively work on improving the model class by class. For instance, in classifying different forms of fraudulent transactions, focus on increasing the recall of specific classes (such as foreign transactions) based on business needs.
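To make the difference between overall accuracy and per-class metrics concrete, here is a small sketch in plain Python. The class names and predictions are made-up illustration data, and `per_class_metrics` is a hypothetical helper; if you use scikit-learn, `sklearn.metrics.classification_report` produces a similar per-class breakdown.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Compute precision, recall, and support for each class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        metrics[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
            "support": actual,
        }
    return metrics

# Made-up fraud labels: the overall accuracy looks healthy, but one class lags.
y_true = ["legit"] * 6 + ["card_theft"] * 2 + ["foreign"] * 2
y_pred = ["legit"] * 6 + ["card_theft"] * 2 + ["legit", "foreign"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("overall accuracy:", accuracy)
for label, m in per_class_metrics(y_true, y_pred).items():
    print(label, m)
```

In this toy data the overall accuracy is 0.9, yet the recall for the `foreign` class is only 0.5, which is exactly the kind of gap a single aggregate number hides.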

4. Assuming specificity will lead to lower model accuracy

Instead of building a generic model, imagine building a model for a specific geographic region or specific user persona. Specificity will make the data more sparse but can lead to better accuracy for those specific problems. It is important to explore the specificity and sparsity trade-off…
