**Decision Forests**, as the name suggests, are built from a collection of simple Decision Trees that are used as “weak” classifiers. Different methods are defined by the way the trees are combined.

**Random Forests (RF’s)**, which are probably the most intuitive, take a majority vote in classification or an average in regression. In this case, each tree outputs *e.g.* a category index (0 or 1), and the most common answer is taken as the final result. To avoid overfitting, each weak classifier is trained on a sub-sample (with replacement) of the data and makes use of only a subset of the input features. This procedure is called *bootstrap aggregating*, or *bagging* for short. Interestingly, Random Forests are also used to create sparse vector embeddings of the input features. The basic idea is to record which node each tree assigns to a sample and save this sequence in a vector of binary elements. The size of the vector is either `n_estimators * max_nodes` or `n_estimators * max_depth`, depending on the stopping criterion. Such vectors can be analyzed *e.g.* by very efficient linear classifiers (see for example this tutorial on the scikit-learn website).
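A minimal sketch of both ideas with scikit-learn (the dataset and hyperparameters here are illustrative choices, not from the article): bagging and feature sub-sampling happen inside `RandomForestClassifier`, and `apply()` returns the leaf index reached in each tree, which can be one-hot encoded into the sparse binary embedding described above.

```python
# Sketch: random-forest majority vote plus leaf-index embeddings.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Bagging happens internally: each tree sees a bootstrap sample
# (bootstrap=True) and a random subset of features (max_features).
rf = RandomForestClassifier(n_estimators=10, max_depth=3,
                            bootstrap=True, max_features="sqrt",
                            random_state=0).fit(X, y)

# Final prediction = majority vote over the 10 trees.
pred = rf.predict(X[:5])

# Sparse embedding: apply() gives, per sample, the id of the leaf
# reached in each tree; one-hot encoding turns that into a binary
# vector of size at most n_estimators * max_nodes.
leaves = rf.apply(X)                # shape (n_samples, n_estimators)
embedding = OneHotEncoder().fit_transform(leaves)
```

The resulting `embedding` is a sparse matrix that can be passed directly to a linear model such as `LogisticRegression`.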

On the other hand, **Gradient Boosted Trees (GBDT’s)** assign a different non-negative weight *w* (*i.e.* a relative importance) to each weak classifier *h(x)*, and the final decision is the sum of *w_i · h_i(x)*. In this approach, the objective function is minimized iteratively: at each step a new tree is added to correct the errors of the current ensemble. Concretely, the *i*-th tree is trained on the residuals left by the first *(i-1)* trees. The sequence of weights (*w_1*, *w_2*, …, *w_N*) turns out to be decreasing, *i.e.* each tree is a sort of higher-order correction in a Taylor series expansion.
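The residual-fitting loop can be sketched by hand in a few lines. This is a toy illustration of the boosting idea, not a production implementation: the constant learning rate plays the role of the weights *w_i*, and the synthetic data and depth are arbitrary choices.

```python
# Minimal sketch of boosting: each new tree fits the residuals
# of the current ensemble, and its prediction is added with a
# shrinkage factor (the learning rate).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=300)

learning_rate = 0.3
prediction = np.zeros_like(y)      # start from a trivial model
trees = []

for _ in range(20):
    residuals = y - prediction                    # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X) # weighted correction
    trees.append(tree)

# Training error shrinks as successive corrections are added.
mse = np.mean((y - prediction) ** 2)
```

Each iteration's correction is smaller than the last, mirroring the decreasing weight sequence described above; libraries such as `GradientBoostingRegressor` in scikit-learn implement the same scheme with a proper loss-gradient formulation.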

Continue reading: https://towardsdatascience.com/a-quick-guide-to-decision-trees-bbd2f22f7f18

Source: towardsdatascience.com
