5.1. Core steps for building and evaluating models

In a nutshell, the core workflow for using learning algorithms in scikit-learn consists of the following 5 steps (numbered 0 to 4):

from sklearn.modulename import EstimatorName  # 0. Import
model = EstimatorName()                       # 1. Instantiate
model.fit(X_train, y_train)                   # 2. Fit
model.predict(X_test)                         # 3. Predict
model.score(X_test, y_test)                   # 4. Score

Translating the above pseudo-code into an actual model (e.g. a classification model), using the random forest algorithm as an example, yields the following code block:

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(max_features=5, n_estimators=100)
rf.fit(X_train, y_train)
rf.score(X_test, y_test)
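The block above assumes that X_train, X_test, y_train and y_test already exist. To make it runnable end to end, here is a minimal sketch that fills in those variables. The synthetic dataset from make_classification, the 80/20 split and the random_state values are my own assumptions for illustration and are not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; any feature matrix X and label
# vector y with at least 5 features would work with max_features=5).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)

# Hold out 20% of the rows as the unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rf = RandomForestClassifier(max_features=5, n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# For classifiers, score() returns the mean accuracy on the given data.
accuracy = rf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing random_state is optional. It only makes the split and the forest reproducible from run to run.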

A cartoon illustration summarizing these core steps for using estimators (i.e. the learning algorithm classes) in scikit-learn is shown below.

Cartoon illustration summarizing the creation, training and application of estimators for model building. Drawn by the Author.

Step 0. Import the estimator from a scikit-learn module. An estimator is the learning algorithm, such as RandomForestClassifier, that is used to estimate the output y values given the input X values.

Simply put, this can be summarized by the equation y = f(X), where y is estimated from known values of X.

Step 1. Instantiate the estimator or model. This is done by calling the estimator class and assigning the resulting object to a variable. We can name this variable model, clf or rf (i.e. an abbreviation of the learning algorithm used, random forest).

The instantiated model can be thought of as an empty box with no knowledge of the data, as no training has yet occurred.
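The "empty box" idea can be checked directly. The sketch below is my own illustration (the three-value input row is made up): a freshly instantiated estimator already holds its hyperparameters, but asking it to predict before any training raises scikit-learn's NotFittedError.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

rf = RandomForestClassifier(n_estimators=100)

# The hyperparameters passed at instantiation are already stored...
print(rf.get_params()["n_estimators"])

# ...but nothing has been learned yet, so predict() refuses to run.
try:
    rf.predict([[0.0, 1.0, 2.0]])  # hypothetical single input row
    raised = False
except NotFittedError:
    raised = True

print("NotFittedError raised:", raised)
```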

Step 2. The instantiated model will now be allowed to learn from a training dataset in a process known as model building or model training.

The training is initiated via the fit() method, where the training data is passed as input arguments, as in rf.fit(X_train, y_train). This literally translates to allowing the instantiated rf estimator to learn the relationship between the X_train features and the y_train labels. Upon completion of the calculation, the model is trained on the training set.
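One way to see what fit() has done is to inspect the attributes it creates. In scikit-learn, learned attributes end with a trailing underscore. The sketch below uses synthetic data from make_classification, which is my own assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data (an assumption, used only to have something to fit).
X_train, y_train = make_classification(
    n_samples=200, n_features=8, random_state=0
)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
fitted = rf.fit(X_train, y_train)  # fit() returns the estimator itself

print(fitted is rf)         # same object, now trained
print(rf.classes_)          # class labels seen during training
print(rf.n_features_in_)    # number of input features seen during training
print(len(rf.estimators_))  # the individual fitted decision trees
```

Because fit() returns the estimator, instantiation and training can also be chained on one line, e.g. RandomForestClassifier().fit(X_train, y_train).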

Step 3. The trained model will now be applied to make predictions on new, unseen data (e.g. X_test) via the predict() method.
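As a rough sketch of this step, the code below trains a model and then predicts on a held-out split. The synthetic data and the 75/25 split are my own assumptions. It also shows how prediction connects to the score() call from the pseudo-code: for classifiers, score() is the mean accuracy, i.e. the fraction of predictions that match the true labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and split (assumptions, for illustration only).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

rf = RandomForestClassifier(n_estimators=50, random_state=1)
rf.fit(X_train, y_train)

# One predicted class label per row of X_test.
y_pred = rf.predict(X_test)
print(y_pred.shape)

# Fraction of correct predictions, computed by hand...
manual_accuracy = np.mean(y_pred == y_test)

# ...matches what score() reports on the same data.
print(np.isclose(manual_accuracy, rf.score(X_test, y_test)))
```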



Source: towardsdatascience.com