5.1. Core steps for building and evaluating models
In a nutshell, if I were to summarize the core essence of using learning algorithms in scikit-learn, it would consist of the following 5 steps:
from sklearn.modulename import EstimatorName # 0. Import
model = EstimatorName() # 1. Instantiate
model.fit(X_train, y_train) # 2. Fit
model.predict(X_test) # 3. Predict
model.score(X_test, y_test) # 4. Score
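Every scikit-learn estimator exposes the same fit(), predict() and score() methods, so the pseudo-code above can be run unchanged for different learning algorithms. The following is a minimal sketch that assumes the iris dataset bundled with scikit-learn. KNeighborsClassifier and DecisionTreeClassifier are illustrative choices that stand in for EstimatorName and are not taken from the original text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier   # 0. Import (illustrative choice)
from sklearn.tree import DecisionTreeClassifier      # 0. Import (illustrative choice)

# Illustrative data: the iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores, predictions = {}, {}
for EstimatorName in (KNeighborsClassifier, DecisionTreeClassifier):
    model = EstimatorName()                      # 1. Instantiate
    model.fit(X_train, y_train)                  # 2. Fit
    name = EstimatorName.__name__
    predictions[name] = model.predict(X_test)    # 3. Predict
    scores[name] = model.score(X_test, y_test)   # 4. Score (mean accuracy for classifiers)

print(scores)
```

The loop body does not change between algorithms. This shared interface is what makes the 5-step recipe work across scikit-learn.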
Translating the above pseudo-code into an actual model (e.g. a classification model), using the random forest algorithm as an example, yields the following code block:
from sklearn.ensemble import RandomForestClassifier # 0. Import
rf = RandomForestClassifier(max_features=5, n_estimators=100) # 1. Instantiate
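For reference, here is one way to carry the random forest through all 5 steps. This is a minimal sketch that assumes synthetic data from make_classification, with 10 features so that max_features=5 is valid, and an 80/20 train/test split. The dataset, the split and random_state are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier   # 0. Import
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data; max_features=5 needs at least 5 input features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rf = RandomForestClassifier(max_features=5, n_estimators=100, random_state=0)  # 1. Instantiate
rf.fit(X_train, y_train)                 # 2. Fit
y_pred = rf.predict(X_test)              # 3. Predict
accuracy = rf.score(X_test, y_test)      # 4. Score (mean accuracy for classifiers)
print(f"Test accuracy: {accuracy:.3f}")
```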
A cartoon illustration summarizing these core steps for using estimators (i.e. the learning algorithm classes) in scikit-learn is shown below.
Step 0. Importing the estimator (a Python class) from a module of scikit-learn. The term estimator refers to the learning algorithm, such as RandomForestClassifier, that is used to estimate the output y values given the input X values. Simply put, this is best summarized by the equation y = f(X), where y can be estimated given known values of X.
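To make the y = f(X) idea concrete, the sketch below generates toy data from a known function, f(X) = 2X + 1, and lets an estimator recover it. LinearRegression is swapped in here because its learned coefficients are easy to inspect. The data and the choice of estimator are illustrative assumptions and do not come from the original article:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from a known function f(X) = 2X + 1
X = np.arange(10, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2-D X: (n_samples, n_features)
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)        # the estimator learns an approximation of f
print(model.coef_, model.intercept_)        # approximately [2.] and 1.0

y_new = model.predict(np.array([[20.0]]))   # estimate y for an X value not seen during training
print(y_new)                                # approximately [41.]
```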
Step 1. Instantiating the estimator or model. This is done by calling the estimator class and assigning the result to a variable. For example, we can name this variable rf (an abbreviation of the learning algorithm used, random forest).
The instantiated model can be thought of as an empty box with no knowledge learned from the data, as no training has occurred yet.
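The "empty box" can be checked directly. An instantiated estimator stores its hyperparameters but has no learned attributes yet, and asking it for predictions raises NotFittedError. This is a minimal sketch, and the five-feature dummy input row is an arbitrary placeholder:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError

rf = RandomForestClassifier(max_features=5, n_estimators=100)  # 1. Instantiate

# The hyperparameters are stored on the object...
print(rf.get_params()["n_estimators"])  # 100
# ...but no learned attributes exist yet (fitted attributes end with "_")
print(hasattr(rf, "estimators_"))       # False

# Asking the empty box for predictions fails until fit() has been called
try:
    rf.predict([[0.0, 0.0, 0.0, 0.0, 0.0]])  # placeholder row with 5 features
    raised = False
except NotFittedError:
    raised = True
print(raised)  # True
```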
Step 2. The instantiated model is now allowed to learn from a training dataset in a process known as model building or model training.
The training is initiated via the fit() function, where the training data is passed as the input arguments, as in rf.fit(X_train, y_train). This literally translates to allowing the instantiated rf estimator to learn from the X_train data and its corresponding y_train labels. Once the computation is complete, the model is trained on the training set.
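After fit() finishes, the box is no longer empty. The estimator returns itself and now carries learned attributes, whose names end in an underscore. This minimal sketch assumes synthetic binary-classification data with 10 features, and the dataset is purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic training data with 10 features and 2 classes
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=1)

rf = RandomForestClassifier(max_features=5, n_estimators=100, random_state=1)
returned = rf.fit(X_train, y_train)   # supervised fit needs both X_train and y_train

# fit() returns the estimator itself, now holding learned attributes
print(returned is rf)            # True
print(len(rf.estimators_))       # 100 fitted decision trees
print(rf.n_features_in_)         # 10
print(rf.classes_)               # [0 1]
```

Because fit() returns the estimator, steps 1 and 2 can also be chained, e.g. RandomForestClassifier().fit(X_train, y_train).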
Step 3. The trained model is now applied to make predictions on new, unseen data (e.g. X_test) via the predict() function, as in rf.predict(X_test).
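For classifiers, predict() returns one class label per row of the unseen data. predict_proba() returns the per-class probabilities behind that choice. This is a minimal sketch that assumes synthetic data and a 75/25 split, both chosen only for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data, split into training and unseen test sets
X, y = make_classification(n_samples=300, n_features=10, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

rf = RandomForestClassifier(max_features=5, n_estimators=100, random_state=2)
rf.fit(X_train, y_train)

# 3. Predict: one class label per row of X_test
y_pred = rf.predict(X_test)
print(y_pred.shape)  # (75,)

# Per-class probabilities; predict() picks the most probable class
proba = rf.predict_proba(X_test)
print(proba.shape)   # (75, 2)
print(np.array_equal(rf.classes_[np.argmax(proba, axis=1)], y_pred))  # True
```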
Continue reading: https://towardsdatascience.com/how-to-master-scikit-learn-for-data-science-c29214ec25b0