Photo by Ben White on Unsplash.

You have probably read an article about the difference between a data scientist and a data engineer. I always thought the distinction was clear. Data engineers make the data ready for use, and then data scientists work on that data.

However, my opinion on this distinction changed dramatically after I started working as a data scientist.

Everything in data science starts with data. Your machine learning model is only as good as the data fed into it. Garbage in, garbage out! A data scientist cannot magically create a valuable product without proper data.

Proper data is not always readily available to data scientists. In most cases, it is the data scientist's responsibility to convert the raw data into a usable format.

Unless you work for a big tech company that has separate teams of data engineers and data scientists, you should possess the ability and skills to handle some data engineering tasks. These tasks cover a broad range of operations, and I will elaborate on this in the remaining part of the article.

What is the difference anyway?

Here is my opinion on the relationship between the roles of data engineer and data scientist.

A data engineer is a data engineer. A data scientist should be both a data scientist and a data engineer.

This may seem like a debatable statement. However, I held a different view before I started working as a data scientist: I used to think of data engineers and data scientists as separate roles.

In the remaining part of the article, I will try to explain why I think a data scientist should be both a data scientist and a data engineer.

For instance, data engineers perform a set of operations known as ETL (extract, transform, load). It covers the procedures for collecting data from one or more sources, applying some transformations, and then loading the result into a destination such as a database or data warehouse.
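To make the three ETL steps concrete, here is a minimal sketch using only the Python standard library. The sample data, the `orders` table, and the function names are all hypothetical and chosen for illustration; a real pipeline would read from actual files, APIs, or databases and would likely use dedicated tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would come from a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize customer names and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three steps in order."""
    load(transform(extract(text)), conn)


# An in-memory SQLite database stands in for the destination system.
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
```

Keeping each step in its own function makes it possible to test the transformation logic separately from the source and the destination.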

I would not be surprised if a data scientist were expected to perform ETL operations. Data science is still evolving, and most companies do not have clearly separated data engineer and data scientist roles. As a result, a data scientist should be able to perform some data engineering tasks.

If you expect to only work on running machine learning algorithms with ready-to-use data, you will face the harsh truth soon after you start working as a data scientist.

You may have to write some stored procedures in SQL to preprocess…

Continue reading: https://www.kdnuggets.com/2021/09/data-scientists-data-engineering-skills.html

Source: www.kdnuggets.com