Gradient Boosting algorithms tackle one of the biggest problems in Machine Learning: bias.
The Decision Tree is a simple and flexible algorithm, so simple that it can easily underfit the data.
An underfit Decision Tree has low depth, meaning it splits the dataset only a few times in an attempt to separate the data. Because it doesn't keep partitioning the dataset into more and more distinct regions, it can't capture the true patterns in it.
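To make this concrete, here is a minimal sketch of depth-driven underfitting. The toy sine dataset and the specific depths are illustrative assumptions, not from the article; it assumes scikit-learn is available.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: a noisy sine wave (an illustrative choice).
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# A depth-1 tree splits the data only once, so it predicts just two
# constant values and misses the sine pattern entirely: underfitting.
shallow = DecisionTreeRegressor(max_depth=1).fit(X, y)

# A deeper tree partitions the data into many regions and fits far better.
deeper = DecisionTreeRegressor(max_depth=5).fit(X, y)

print("depth 1 R^2:", shallow.score(X, y))  # low: the tree underfits
print("depth 5 R^2:", deeper.score(X, y))   # much higher
```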
When it comes to tree-based algorithms, Random Forests were revolutionary, because they used Bagging to reduce the overall variance of the model with an ensemble of randomized trees.
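Here is a minimal sketch of the Bagging idea. The function names are hypothetical, and this simplification leaves out one piece of a full Random Forest, which additionally randomizes the features considered at each split.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, n_trees=100, seed=0):
    """Train many deep (high-variance) trees on bootstrap samples."""
    rng = np.random.RandomState(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: sample rows with replacement.
        idx = rng.randint(0, len(X), len(X))
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    # Averaging many high-variance trees lowers the ensemble's variance.
    return np.mean([t.predict(X) for t in trees], axis=0)
```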
In Gradient Boosted algorithms, the technique used to control bias is called Boosting.
But they were not alone. The fields of Computer Science and Theoretical Machine Learning were riding this wave of enthusiasm and groundbreaking algorithms, and more scientists started developing Boosting approaches.
Friedman explored how Boosting could be framed as an optimization method for a suitable loss function, while Schapire and Freund took inspiration from a question posed by Michael Kearns: can a set of weak learners create a single strong learner?
In this context, a weak learner is any model that performs only slightly better than random guessing and can never efficiently drive the training error to zero. A strong learner is the opposite: a model that can be improved until it efficiently brings the training error down to zero.
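The classic weak learner is a decision stump, a tree of depth one. The XOR-style toy dataset below is an illustrative assumption: on it, a single split barely beats a coin flip, while a deeper tree drives the training error to nearly zero.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# XOR-style labels: no single axis-aligned split separates the classes.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# A weak learner: one split, barely better than random guessing here.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print("stump accuracy:", stump.score(X, y))      # only slightly above 0.5

# A strong learner on this data: enough depth to fit the training set.
deep = DecisionTreeClassifier(max_depth=10).fit(X, y)
print("deep tree accuracy:", deep.score(X, y))   # close to 1.0
```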
Around the same time, Jerome Friedman was also experimenting with Boosting. He ended up developing several new algorithms, the most popular being Gradient Boosted Decision Trees.
The ingenious efforts of these and other scientists contributed a vibrant new chapter to Machine Learning. It generated robust and powerful algorithms, many of which are still relevant today.
The motto of Boosting is that the result is greater than the sum of its parts.
Boosting is based on the assumption that it's much easier to find several simple rules for making a prediction than to find the one rule that applies to all the data and yields the best possible prediction.
The intuition behind Boosting is that you train the same weak learner, a model with simple rules, several times, then combine its weak predictions into a single, more accurate prediction.
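Here is a minimal sketch of that idea for regression with squared loss; the function names and the learning rate are illustrative assumptions, not the article's exact code. Each round fits a fresh stump to the residuals, the part of the target the ensemble still gets wrong, and adds a dampened version of its prediction to the running total.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=100, lr=0.1):
    """Sequentially fit depth-1 trees (weak learners) to the residuals."""
    base = y.mean()
    pred = np.full(len(y), base)  # start from a constant prediction
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                     # what the ensemble still gets wrong
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
        pred += lr * stump.predict(X)           # nudge predictions toward y
        stumps.append(stump)
    return base, stumps

def boost_predict(base, stumps, X, lr=0.1):
    # The final prediction is the sum of many weak predictions.
    return base + lr * sum(s.predict(X) for s in stumps)
```

Each individual stump is nearly useless on its own, but the sum of a hundred of them can track the target closely: the result is greater than the sum of its parts.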