A brief overview of what Big Data is and the first layer of a data-gathering solution

Photo by Etienne Girardet on Unsplash

Do you use navigation software to get from one place to another? Did you buy a book on Amazon? Did you watch “Stranger Things” on Netflix? Did you look for a funny video on YouTube?

If you answered yes to any of these questions, congratulations! You are a big data producer. In fact, even if you did not answer “yes” to any of my questions, you are probably still contributing to big data. In today’s world, where almost every one of us owns at least one smartphone, laptop, smartwatch, smart car system, or robotic vacuum cleaner, we produce a great deal of data through daily activities that seem trivial to us.

When we say Big Data, we usually mean data that is so large, arrives so quickly, and comes in so many different structures that it is difficult or impossible to analyze with traditional tools.

It is common to define the concept of Big Data with the “three Vs”:

  1. Volume — the size of the data
  2. Velocity — the speed of data gathering
  3. Variety — the different types of data

To solve the complex problems that the world of big data raises, the solution is typically divided into five layers:

  1. Data Sources
  2. Data Ingestion
  3. Data Storage
  4. Data Processing — preparation and training
  5. Serve
big data layers architecture / Image by author

Data Ingestion is the second layer in the Big Data Architecture, sitting right after the data sources themselves. This is the layer responsible for collecting data from various data sources — IoT devices, data lakes, databases, and SaaS applications — into a target data warehouse. It is a critical point in the process, because this is the stage where the size and complexity of the data become clear, and that understanding affects the architecture and every decision we make down the road.
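As a rough illustration of this layer (all names and sources below are hypothetical, not from the article), an ingestion step boils down to pulling records from several heterogeneous sources and writing them into one target store:

```python
import json
import sqlite3

# Hypothetical sources: an ingestion layer pulls from many of these
# (IoT feeds, databases, SaaS APIs) into one target data store.
sensor_readings = [{"device": "thermo-1", "temp_c": 21.5}]   # from an IoT feed
api_events = '[{"device": "cam-7", "temp_c": 19.0}]'          # JSON from a SaaS API

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the data warehouse
conn.execute("CREATE TABLE readings (device TEXT, temp_c REAL)")

# Combine both sources and load them into the same target table.
records = sensor_readings + json.loads(api_events)
conn.executemany("INSERT INTO readings VALUES (:device, :temp_c)", records)

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # 2
```

Real ingestion pipelines add scheduling, retries, and schema checks on top of this basic move-and-load step.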

The advantages of a good data ingestion process:
  1. Availability — the data is available to all users: BI analysts, developers, salespeople, and anyone else in the company can access it.
  2. Uniformity — a quality data ingestion process can turn different types of data into a unified format that is easy to read and to run statistics and manipulations on.
  3. Saves money and time — a data ingestion process spares engineers the time they would otherwise spend collecting the data they need, letting them develop efficiently instead.
The disadvantages:
  1. Complexity — writing data ingestion processes can be complex due to data velocity and variety, so development can be costly in time and resources.
  2. Data security…
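The uniformity point above can be sketched in a few lines (the records and schema here are invented for illustration): the same kind of event arrives as CSV from one source and JSON from another, and the ingestion step maps both into a single schema so simple statistics become trivial.

```python
import csv
import io
import json

# Hypothetical inputs: the same event type arriving in two different formats.
csv_data = "user,amount\nalice,10\nbob,5\n"
json_data = '[{"user": "carol", "amount": 7}]'

def normalize(record):
    """Map a raw record into one unified schema with consistent types."""
    return {"user": record["user"], "amount": float(record["amount"])}

unified = [normalize(r) for r in csv.DictReader(io.StringIO(csv_data))]
unified += [normalize(r) for r in json.loads(json_data)]

# With a unified structure, aggregation is a one-liner.
total = sum(r["amount"] for r in unified)
print(total)  # 22.0
```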

Continue reading: https://towardsdatascience.com/what-is-data-ingestion-5220edf50677

Source: towardsdatascience.com