Try more than 20 machine learning models with only a few lines of code using LazyPredict

We have all been in the situation where we did not know which model would be best for an ML project, so we tried and evaluated many models just to see how they behaved on our data. This is not a simple task, and it takes time and effort.

Fortunately, we can do this with only a few lines of code using LazyPredict. It runs more than 20 different ML models and returns their performance statistics.

pip install lazypredict

Let’s see an example using the Titanic dataset from Kaggle.

import pandas as pd
import numpy as np
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv('train.csv')

data.head()

Here, we will try to predict whether a passenger survived the Titanic, so we have a classification problem.

LazyPredict can also do basic data preprocessing, such as filling NA values and creating dummy variables. That means we can test the models immediately after reading the data without getting errors. However, we can also pass in our own preprocessed data, so that the model testing is more representative of our final models.

For this example, we will not do any preprocessing and will let LazyPredict do all the work.
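For comparison, the kind of preprocessing described above can be sketched by hand with scikit-learn. This is only an assumption about what a roughly equivalent manual step might look like, not LazyPredict's internal code. The small data frame is made up for illustration and is not the Kaggle data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows with Titanic-like columns (illustration only)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Fare": [7.25, 71.28, 8.05, np.nan],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", np.nan, "S"],
})

numeric_cols = ["Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

# Numeric columns: fill missing values with the mean, then standardize
numeric_steps = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then create dummy variables
categorical_steps = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(transformers=[
    ("numeric", numeric_steps, numeric_cols),
    ("categorical", categorical_steps, categorical_cols),
])

prepared = preprocessor.fit_transform(toy)
# The result may be sparse or dense depending on its density; normalize to a dense array
prepared = prepared.toarray() if hasattr(prepared, "toarray") else np.asarray(prepared)
print(prepared.shape)
```

Doing this step yourself is mostly useful when the final model will rely on custom features or a different imputation strategy, since the quick comparison then reflects the data the final model will actually see.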

#we are selecting the following columns as features for our models
X=data[['Pclass', 'Sex', 'Age', 'SibSp',
'Parch', 'Fare', 'Embarked']]

y=data['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# Fit LazyClassifier
reg = LazyClassifier(ignore_warnings=True, random_state=7, verbose=False)

# we have to pass the train and test datasets so it can evaluate the models
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
models
Image by Author

As you can see, it returns a data frame that contains the models and their statistics. The tree-based models perform better than the others here, so we can focus on tree-based models in our approach.
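One way to follow up on such a shortlist is to cross-validate a single candidate more carefully than a single train/test split allows. The sketch below uses scikit-learn's RandomForestClassifier on synthetic data from make_classification. Both the choice of model and the data are assumptions made for illustration, not results from the Titanic run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with 7 features (not the Titanic dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=7, n_informative=4, random_state=7
)

# A tree-based candidate, chosen here purely as an example
forest = RandomForestClassifier(n_estimators=100, random_state=7)

# 5-fold cross-validated accuracy gives a steadier estimate than one split
scores = cross_val_score(forest, X_demo, y_demo, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```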

You can get the complete pipeline and the model parameters used by LazyPredict as follows.

#we will get the pipeline of LGBMClassifier
reg.models['LGBMClassifier']

Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('numeric',
Pipeline(steps=[('imputer',
SimpleImputer()),
('scaler',
StandardScaler())]),
Index(['Pclass', 'Age', 'SibSp', 'Parch', 'Fare'],...
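Since the object printed above is a scikit-learn Pipeline, the usual Pipeline accessors should apply to it. The sketch below builds a small stand-in pipeline that reuses the step names visible in the output ('preprocessor', 'numeric', 'imputer', 'scaler'). The final step name 'classifier', the LogisticRegression estimator (used to avoid a LightGBM dependency), and the toy rows are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numeric_cols = ["Pclass", "Age", "SibSp", "Parch", "Fare"]

# Stand-in pipeline mirroring the visible part of the printed structure
stand_in = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("numeric", Pipeline(steps=[
            ("imputer", SimpleImputer()),
            ("scaler", StandardScaler()),
        ]), numeric_cols),
    ])),
    ("classifier", LogisticRegression()),  # assumed step name and estimator
])

# Made-up rows (illustration only, not the Kaggle data)
toy_X = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Age": [22.0, 38.0, np.nan, 35.0, 27.0, 4.0],
    "SibSp": [1, 1, 0, 1, 0, 1],
    "Parch": [0, 0, 0, 0, 2, 1],
    "Fare": [7.25, 71.28, 7.92, 53.10, 11.13, 16.70],
})
toy_y = [0, 1, 1, 1, 0, 1]

stand_in.fit(toy_X, toy_y)

# Inspect the steps and the final estimator's parameters
step_names = list(stand_in.named_steps)
final_params = stand_in.named_steps["classifier"].get_params()

# A fitted pipeline applies its preprocessing before predicting
toy_predictions = stand_in.predict(toy_X)
print(step_names)
print(final_params["C"])
```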

Continue reading: https://towardsdatascience.com/automated-machine-learning-model-testing-d0f49a36a6ac?source=rss----7f60cf5620c9---4

Source: towardsdatascience.com