While browsing Data Science articles and tutorials, both on Medium and elsewhere online, I noticed that most are geared towards intermediate developers and beyond. Even the beginner-friendly tutorials use a series of buzzwords that not everyone may know. So, the goal of this article is to review the terminology and add some clarity to the wide world of Data Science.
This won’t be a comprehensive list; there are simply too many terms to cover. It does include some of the important ones, though. If there are any important vocabulary terms you think should be included, feel free to add them in the comments.
It’s only fitting to define the whole reason we’re here first. Data Science is the analysis of data, usually in large amounts. The goal is to provide meaning to that information to solve problems or make decisions. It encompasses the actual studying of the data, the data itself, the visualization, the prediction, the decision-making process, and so on.
The reason I include Big Data on the list is that “big data” sounds like any large amount of data. So how big is Big Data? Big Data refers to data too large to be stored or processed on a single computer. That usually also makes it too big for tools such as Excel or a traditional single-server SQL database. It takes more effort to extract meaningful information from data at that scale, because a single slow query could take weeks to run.
But it’s not just the size of the data. It’s also how quickly data is generated. The growth of data is often compared to Moore’s Law, the observation that the number of transistors on a chip, and loosely computing power, doubles roughly every two years.
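To get a feel for what “doubling every two years” implies, here is a minimal sketch of the arithmetic. The function name `growth_factor` and the example time spans are made up for illustration; the only assumption carried over from the text is the two-year doubling period.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return how many times larger a quantity becomes after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Hypothetical time spans, chosen only to show how quickly doubling compounds.
for years in (2, 4, 10):
    print(f"After {years} years: {growth_factor(years):.0f}x")
```

Under that assumption, ten years means five doublings, so the quantity ends up 32 times larger than where it started.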
With data so large, there must be some way of making meaningful information from it. That’s where Data Mining comes in. Data Mining refers to determining the relationships between variables and their outcomes, given a set of data. Typically, this is done by machines at a large scale. It also refers to the cleaning and organizing of that data. Ideally, it aids decision-making with data that would otherwise be too large to sort and describe. Data Mining tasks include regression and classification, which we’ll get to later.
While finding meaning in data, Data Mining may search for frequent patterns, correlations, clusters, associations, or any predictive analysis that can be made.
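As a small illustration of one of those searches, the sketch below measures the correlation between two variables using the Pearson correlation coefficient. The `ad_spend` and `sales` numbers are invented for the example, and a real Data Mining job would run over far more data, usually with a dedicated library rather than hand-written code.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists.

    The result ranges from -1 (perfect negative relationship) through 0
    (no linear relationship) to 1 (perfect positive relationship).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # How the two variables move together, and how much each varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)

    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")

    return covariance / sqrt(variance_x * variance_y)


# Invented sample data: advertising spend versus units sold.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

print(f"Correlation: {pearson_correlation(ad_spend, sales):.2f}")
```

A value close to 1 or -1 suggests a strong linear relationship worth a closer look, while a value near 0 suggests there is little linear relationship between the two variables.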
A Data Warehouse is a repository…
Continue reading: https://towardsdatascience.com/commonly-used-words-in-data-science-ea06a8f17577?source=rss----7f60cf5620c9---4