Image source: Pixabay
Why profile the memory usage?
Suppose you have written a cool machine learning (ML) app or created a shiny neural network model. Now you want to deploy this model behind a web service or REST API.
Or, you might have developed this model based on data streams coming from industrial sensors in a manufacturing plant and now you have to deploy the model on one of the industrial control PCs to serve decisions based on continuously incoming data.
“Excited to have developed a shiny ML model”. Image source: Pixabay
As a data scientist, one of the most common questions you can expect from the engineering/platform team is “what is the memory footprint of your model/code?” or “what’s the peak memory usage of your code when running on a given data load?”
This is a natural question because hardware resources may be limited, and a single ML module should not hog all the memory of the system. This is particularly true for edge computing scenarios, where the ML app may run, for example, inside a virtualized container on an industrial PC.
Also, your model may be one of hundreds of models running on that piece of hardware, so you must have some idea of its peak memory usage: if multiple models peak in their memory usage at the same time, they can crash the system.
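Before reaching for a full profiler, you can get a quick baseline for peak memory with the standard library’s `tracemalloc`. A minimal sketch (the `run_inference` function is a hypothetical stand-in for a model’s prediction step):

```python
import tracemalloc

def run_inference(n_rows: int) -> float:
    """Stand-in for a model's prediction step (hypothetical workload)."""
    features = [[float(i * j % 7) for j in range(100)] for i in range(n_rows)]
    weights = [0.01] * 100
    return sum(sum(f * w for f, w in zip(row, weights)) for row in features)

tracemalloc.start()
run_inference(5_000)
current, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
tracemalloc.stop()
print(f"peak traced memory: {peak / 1024 / 1024:.2f} MiB")
```

Note that `tracemalloc` only tracks allocations made through Python’s memory allocator and adds noticeable overhead, which is exactly the gap that a dedicated profiler like Scalene fills.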
Now, that got you wondering, didn’t it?
Don’t make this cardinal mistake
Note that we are talking about the runtime memory profile (a dynamic quantity) of your entire code. This has nothing to do with the size or compression of your ML model, which you may have saved to disk as a special object, e.g. a Scikit-learn Joblib dump, a plain Python Pickle dump, a TensorFlow HDF5 file, or the like.
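The distinction is easy to demonstrate. In the sketch below, the “model” is a hypothetical lookup table standing in for a trained estimator: its on-disk Pickle size is one number, while the peak memory allocated while actually using it is a different one.

```python
import os
import pickle
import tempfile
import tracemalloc

# Hypothetical "model": a lookup table standing in for a trained estimator
model = {i: [float(i)] * 50 for i in range(1_000)}

# Size on disk: what a Pickle/Joblib dump measures (a static quantity)
with tempfile.NamedTemporaryFile(delete=False) as f:
    pickle.dump(model, f)
disk_bytes = os.path.getsize(f.name)
os.unlink(f.name)

# Peak runtime allocation while *using* the model: a dynamic quantity
tracemalloc.start()
scores = [sum(model[i]) * 2.0 for i in range(1_000)]  # inference-like pass
_current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"on-disk size: {disk_bytes} bytes; peak runtime allocation: {peak} bytes")
```

The two numbers will generally differ, because runtime memory also includes intermediate results, copies, and library overhead that never touch the serialized file.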
Scalene: A neat little memory/CPU/GPU profiler
Here is an article about some older memory profilers to use with Python.
In this article, we will discuss Scalene — your one-stop shop for answering these questions, posed by your engineering team.
As per its GitHub page, “Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while…
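Getting started is straightforward: Scalene is distributed on PyPI and is invoked as a wrapper around your script (a minimal sketch; `train.py` is a hypothetical script name):

```shell
# Install Scalene from PyPI
pip install scalene

# Profile your script line by line (CPU, GPU, and memory)
scalene train.py
```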
Continue reading: https://www.kdnuggets.com/2021/07/memory-machine-learning-code-consuming.html