Why has deep learning been so successful? What is the fundamental reason that deep learning can learn from big data? Why can't traditional ML learn from the large data sets now available for many tasks as efficiently as deep learning can? These questions can be answered by understanding the learnability of deep learning, which is closely tied to model capacity as measured by the Vapnik-Chervonenkis (VC) dimension. A model with a higher VC dimension can represent more complex functions, but it needs more training examples before it generalizes well; once enough data is available, that extra capacity pays off. This relationship is illustrated beautifully by the curve in figure 1 [1], which plots the performance of traditional ML and DL against the amount of data used to train the models.
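One standard way to make the capacity/data trade-off concrete is Vapnik's VC generalization bound for binary classification with 0-1 loss. It is given here as background, not as the source's own derivation. With probability at least $1-\delta$ over a sample of $n$ i.i.d. examples, every hypothesis $h$ from a class of VC dimension $d$ (with $n \ge d$) satisfies:

$$
R(h) \;\le\; \hat{R}_n(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
$$

Here $R(h)$ is the true (test) error and $\hat{R}_n(h)$ is the training error. A larger $d$ lets a model drive $\hat{R}_n(h)$ lower, but it inflates the square-root penalty. That penalty only shrinks as $n$ grows, so high-capacity models need large data sets before their advantage appears.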

When data sets are small, traditional ML performs better than DL. As the data sets move into the big data zone, however, DL's performance keeps increasing and pulls ahead. This is why we are seeing such significant performance gains for deep learning on specific tasks where large labeled data sets are available for training (image classification is the classic example).
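The crossover described above can be sketched with the VC bound (true error is at most training error plus $\sqrt{(d(\ln(2n/d)+1)+\ln(4/\delta))/n}$). This is a minimal illustration under loudly stated assumptions. The two models, their VC dimensions (10 and 1000), and their training errors (0.20 and 0.05) are hypothetical numbers chosen only to show the shape of the trade-off. The helper names `vc_gap` and `risk_bound` are likewise made up for this sketch. For real deep networks, VC bounds are usually far too loose to be informative, so treat this as an intuition pump rather than an explanation of the data in figure 1.

```python
import math


def vc_gap(d: int, n: int, delta: float = 0.05) -> float:
    """Capacity penalty of Vapnik's VC bound for VC dimension d and n samples."""
    if n < d:
        # The bound is only used here in the regime n >= d.
        raise ValueError("expected n >= d")
    return math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)


def risk_bound(emp_risk: float, d: int, n: int, delta: float = 0.05) -> float:
    """Upper bound on true error: training error plus the capacity penalty."""
    return emp_risk + vc_gap(d, n, delta)


# Hypothetical models (made-up numbers for illustration only):
SIMPLE = {"emp_risk": 0.20, "d": 10}    # low capacity, higher training error
RICH = {"emp_risk": 0.05, "d": 1000}    # high capacity, lower training error

if __name__ == "__main__":
    for n in [1_000, 10_000, 100_000, 1_000_000]:
        s = risk_bound(SIMPLE["emp_risk"], SIMPLE["d"], n)
        r = risk_bound(RICH["emp_risk"], RICH["d"], n)
        tighter = "simple" if s < r else "rich"
        print(f"n={n:>9,}  simple={s:.3f}  rich={r:.3f}  tighter bound: {tighter}")
```

With these assumed numbers, the low-capacity model has the tighter bound at small sample sizes. The high-capacity model overtakes it somewhere between 100,000 and 1,000,000 samples, which mirrors the qualitative shape of the curve.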