Decision Forests, as the name suggests, are built from a collection of simple Decision Trees that are used as “weak” classifiers. Different methods are defined by the way the trees are combined.
Random Forests (RFs), which are probably the most intuitive, take a majority vote in classification or an average in regression. In this case, each tree outputs e.g. a category index (0 or 1), and the most common answer is taken as the final result. To avoid overfitting, each weak classifier is trained on a sub-sample of the data drawn with replacement and makes use of only a subset of the input features. This procedure is called bootstrap aggregating, or bagging for short. Interestingly, Random Forests can also be used to create sparse vector embeddings of the input features. The basic idea is to encode the sequence of nodes visited by a sample as a vector of binary elements. The size of the vector is either
n_estimators * max_nodes or
n_estimators * max_depth, depending on the stopping criterion. Such vectors can then be analyzed e.g. by very efficient linear classifiers (see for example this tutorial on the scikit-learn website), as in the sketch below.
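The following is a minimal sketch of both ideas with scikit-learn: a bagged forest used as a majority-vote classifier, and its leaf indices one-hot encoded into a sparse binary embedding fed to a linear model. The dataset, model sizes, and parameter values are illustrative choices, not taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging happens internally: each tree sees a bootstrap sample of the rows
# and considers a random subset of the features at every split.
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
forest.fit(X_train, y_train)
print("majority-vote accuracy:", forest.score(X_test, y_test))

# Sparse embedding: apply() returns, for each sample, the index of the leaf
# it falls into in every tree; one-hot encoding those indices gives a binary
# vector whose length is bounded by n_estimators times the number of leaves.
encoder = OneHotEncoder(handle_unknown="ignore")
leaves_train = encoder.fit_transform(forest.apply(X_train))
leaves_test = encoder.transform(forest.apply(X_test))

# A simple linear classifier on top of the binary embedding.
linear = LogisticRegression(max_iter=1000)
linear.fit(leaves_train, y_train)
print("linear model on leaf embedding:", linear.score(leaves_test, y_test))
```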
On the other hand, Gradient Boosted Decision Trees (GBDTs) assign a different non-negative weight w (i.e. a relative importance) to each weak classifier h(x), and the final decision is the sum of the w_i * h_i(x). In this approach, the objective function is minimized stage-wise: at each step a new tree is added to correct the errors of the ensemble built so far. To do so, the i-th tree is trained on the residuals left by the first (i-1) trees. The sequence of weights (w_1, w_2, …, w_N) turns out to be decreasing, i.e. each tree acts as a sort of higher-order correction in a Taylor series expansion.
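A minimal hand-rolled boosting sketch for regression with squared loss is shown below: each new tree is fit to the residuals of the current ensemble, and a constant shrinkage factor plays the role of the weights w_i. The names and parameter values here are illustrative assumptions, not from the article.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

n_trees, learning_rate = 50, 0.1
prediction = np.full_like(y, y.mean(), dtype=float)  # start from the mean
trees = []

for _ in range(n_trees):
    residuals = y - prediction                 # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residuals)                     # the i-th tree learns the residuals
    prediction += learning_rate * tree.predict(X)  # weighted (shrunk) correction
    trees.append(tree)

print("training MSE:", np.mean((y - prediction) ** 2))
```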