Image source: Pixabay (Free image)

Are you looking for “GPU-powered data science”?

Imagine you are a data scientist, a business analyst, or an academic researcher in Physics/Economics/Neuroscience…

You do a lot of data wrangling, cleaning, statistical testing, and visualization on a regular basis. You also tinker with a lot of linear models and occasionally venture into Random Forest. You are also into clustering large datasets. Sound familiar?

However, given the nature of the datasets you work on (mostly tabular and structured), you don’t venture into deep learning that much. You would rather put all the hardware resources you have into the things you actually do on a day-to-day basis than spend them on some fancy deep learning model. Again, familiar?

You hear about the awesome power and blazing-fast computation of GPU systems, like the ones from NVIDIA, for all kinds of industrial and scientific applications.

And you keep thinking: “What’s in it for me? How can I take advantage of these powerful pieces of semiconductor in my specific workflow?”

You are searching for GPU-powered data science.

One of your best (and fastest) options to evaluate this approach is to use the combination of Saturn Cloud + RAPIDS. Let me explain in detail…

GPUs in the AI/ML folklore have primarily been for deep learning

While the use of GPUs and distributed computing is widely discussed in academic and business circles for core AI/ML tasks (e.g. running a 1000-layer deep neural network for image classification or a billion-parameter BERT-style language model), it has received far less coverage when it comes to regular data science and data engineering tasks.

Nonetheless, data-related tasks are the essential precursor to any ML workload in an AI pipeline, and they often take up the majority of the time and intellectual effort spent by a data scientist or even an ML engineer. Recently, the famous AI pioneer Andrew Ng talked about moving from a model-centric to a data-centric approach to AI tools development. This means spending much more time with the raw data and preprocessing it before an actual AI workload executes on your pipeline.

So, the important question is: Can we leverage the power of GPUs and distributed computing for regular data processing jobs?

Image source: Author created collage from free images (Pixabay)
