How GPUs Accelerate Data Science & Data Analytics

Artificial intelligence (AI) is set to transform global productivity, working patterns, and lifestyles, and to create enormous wealth. Research firm Gartner expects the global AI economy to grow from about $1.2 trillion last year to about $3.9 trillion by 2022, while McKinsey sees it delivering global economic activity of around $13 trillion by 2030. At its core, this transformation is fueled by powerful Machine Learning (ML) tools and techniques.

It is now well established that the success of modern AI/ML systems has depended critically on their ability to process massive amounts of raw data in parallel on task-optimized hardware. Specialized hardware such as Graphics Processing Units (GPUs) therefore played a significant role in this early success. Since then, much emphasis has been placed on building highly optimized software tools and customized mathematical processing engines (both hardware and software) that leverage the power and architecture of GPUs and parallel computing.

While the use of GPUs and distributed computing for core AI/ML tasks (e.g. running a 100-layer deep neural network for image classification or a billion-parameter BERT-style language model) is widely discussed in academic and business circles, their utility for regular data science and data engineering tasks gets less coverage. These data-related tasks are the essential precursor to any ML workload in an AI pipeline, and they often account for the majority of the time and intellectual effort spent by a data scientist or even an ML engineer.

In fact, AI pioneer Andrew Ng recently talked about moving from a model-centric to a data-centric approach to AI tool development. This means spending much more time with the raw data and preprocessing it before an actual AI workload executes on your pipeline.

This brings us to an important question…

Can we leverage the power of GPU and distributed computing for regular data processing jobs too?

The answer is not trivial and deserves some careful consideration. In this article, we show some of the tools and platforms that can be used for this purpose.

RAPIDS: Leverage GPU for Data Science

The RAPIDS suite of open source software libraries and…
