Whether in computer vision, natural language processing or image generation, deep neural networks yield the state of the art. However, their cost in terms of computational power, memory or energy consumption can be prohibitive, making some of them downright unaffordable on limited hardware. Yet, many domains would benefit from neural networks, hence the need to reduce their cost while maintaining their performance.
That is the whole point of neural network compression. This field comprises multiple families of methods, such as quantization, factorization, distillation or, and this will be the focus of this post, pruning.
Neural network pruning is a method that revolves around the intuitive idea of removing superfluous parts of a network that performs well but consumes a lot of resources. Indeed, even though large neural networks have proven countless times how well they can learn, it turns out that not all of their parts are still useful once training is over. The idea is to eliminate these parts without impacting the network's performance.
Unfortunately, the dozens, if not hundreds, of papers published each year reveal the hidden complexity of a supposedly straightforward idea. Indeed, a quick overview of the literature yields countless ways of identifying said useless parts or removing them before, during or after training; it even turns out that not all kinds of pruning actually allow for accelerating neural networks, which is supposed to be the whole point.
The goal of this post is to provide a solid foundation for tackling the intimidatingly wild literature around neural network pruning. We will successively review three questions that seem to be at the core of the whole domain: "What kind of part should I prune?", "How to tell which parts can be pruned?" and "How to prune parts without harming the network?". To sum it up, we will detail pruning structures, pruning criteria and pruning methods.
When talking about the cost of neural networks, the parameter count is surely one of the most widely used metrics, along with FLOPs (the number of floating-point operations required for inference). It is indeed intimidating to see networks displaying astronomical amounts of weights (up to billions for some), often correlated with stellar performance. Therefore, it is quite intuitive to aim at reducing this count directly by removing parameters themselves. Actually, pruning connections is one of…
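To make the idea of removing individual parameters concrete, here is a minimal sketch of unstructured magnitude pruning: zeroing out the weights with the smallest absolute value. This is only an illustrative toy using NumPy, not the method of any specific paper; real frameworks (e.g. PyTorch's `torch.nn.utils.prune`) apply such masks layer by layer, often followed by fine-tuning.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Illustrative sketch: in practice, pruning is done per layer and the
    resulting mask is kept fixed while the network is fine-tuned.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Toy example: prune 50% of a small weight matrix.
w = np.array([[0.1, -0.8],
              [0.3,  1.2]])
pruned = magnitude_prune(w, 0.5)
# The two smallest-magnitude weights (0.1 and 0.3) are set to zero.
```

Note that zeroed weights stored in a dense matrix do not, by themselves, make the network faster; exploiting this sparsity for actual acceleration is one of the subtleties discussed later.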