## Pruning is an important tool to make neural networks more economical. Read on to find out how it works.

One problem with neural networks is their size. The neural networks you see in online tutorials are small enough to run efficiently on your computer, but many neural networks in industry are huge and unwieldy. They often take days to train, and running them uses a lot of compute power. This raises the question: is it possible to decrease the size of a neural network while maintaining accuracy on the test set? It turns out that it is. A technique called “pruning” does exactly that, and an idea called the “Lottery Ticket Hypothesis (LTH)” gives insight into why pruning works. In this article we’ll first look at the pruning algorithm, and then discuss the LTH.

Pruning is a simple, intuitive algorithm. There are many variants, but the basic idea works on any neural network. Inside a large, trained neural network, some weights have large magnitude and some have small magnitude. As a rule of thumb, the weights with large magnitude contribute more to the network output. Therefore, to decrease the size of the network, we get rid of (prune) the small-magnitude weights. The fraction of weights to prune is set by the user; something like 10% is reasonable. The problem is that once the weights are removed, the network is no longer trained correctly, and its accuracy usually drops. Therefore, after pruning the small-magnitude weights, we need to train the network again. We can repeat this cycle (prune->train) an arbitrary number of times, depending on how small we want to make the network.
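The prune->train cycle described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the weights are a flat list, the names `prune_smallest`, `retrain`, and `iterative_prune` are made up for this example, each round prunes a fraction of the weights that are still alive, and `retrain` is only a placeholder for a real training step.

```python
def prune_smallest(weights, mask, fraction):
    """Return a new mask with the smallest-magnitude `fraction`
    of the surviving weights switched off."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(alive) * fraction)
    # Sort surviving indices by weight magnitude, smallest first.
    alive.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = False
    return new_mask


def retrain(weights, mask):
    """Placeholder (assumption): a real implementation would run
    training here while keeping pruned weights at zero. This stub
    only zeroes out the pruned weights."""
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]


def iterative_prune(weights, fraction=0.1, rounds=3):
    """Repeat the prune->train cycle `rounds` times."""
    mask = [True] * len(weights)
    for _ in range(rounds):
        mask = prune_smallest(weights, mask, fraction)
        weights = retrain(weights, mask)
    return weights, mask


# Toy usage: ten weights, prune 20% of the survivors per round, two rounds.
toy_weights = [0.5, -0.05, 1.2, 0.01, -0.8, 0.3, -0.02, 0.9, 0.1, -1.5]
pruned_weights, final_mask = iterative_prune(toy_weights, fraction=0.2, rounds=2)
print(pruned_weights)
print(sum(final_mask), "weights survive")
```

In practice the mask would be applied per layer to weight tensors, and the retraining step is where almost all of the time goes.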

One advantage of pruning is that it actually works. There is substantial empirical evidence that pruning maintains accuracy while decreasing network size (memory). Are there any downsides? Yes: the pruning process may take a long time to complete. Every iteration of the prune->train loop takes significant time, especially if the network is big. We can cut down on the number of loops by increasing the fraction of weights pruned during each iteration (e.g. from 10% to 20%), but then each iteration of the loop takes longer, since the network has more to recover from during retraining. To summarize, pruning is effective, but can take a while to complete.
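To get a feel for the trade-off between the pruning fraction and the number of loops, here is a small back-of-the-envelope calculation. It assumes each round removes a fixed fraction p of the weights that are still left, so after n rounds a share of (1 - p)^n remains. The function name `rounds_needed` and the 20% target are illustrative choices, not something prescribed by the method.

```python
import math


def rounds_needed(prune_fraction, target_remaining):
    """Smallest number of prune->train rounds n such that
    (1 - prune_fraction) ** n <= target_remaining.
    Assumes each round prunes a fraction of the *surviving* weights."""
    return math.ceil(math.log(target_remaining) / math.log(1.0 - prune_fraction))


# Rounds required to shrink a network to 20% of its original weights.
for p in (0.1, 0.2):
    print(f"prune {p:.0%} per round -> {rounds_needed(p, 0.2)} rounds")
```

Under these assumptions, doubling the per-round fraction from 10% to 20% halves the number of rounds (16 versus 8), but says nothing about how long each retraining step takes.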

There is also the issue that the final pruned subnetwork might have a strange…