For demonstration, we’ll be using the built-in breast cancer dataset from Scikit-learn to train a Support Vector Classifier (SVC). We can load the data with the load_breast_cancer function:

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

Next, let’s create df_X and df_y for features and target label as follows:

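import pandas as pd
# Features: one column per measurement, named after cancer['feature_names']
df_X = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
# Target label: 0 = malignant, 1 = benign
df_y = pd.DataFrame(cancer['target'], columns=['Cancer'])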

P.S. If you want to know more about the dataset, you can run print(cancer['DESCR']) to print a summary of the dataset and a description of each feature.
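Before training anything, it can help to check what df_X and df_y actually hold. The short sketch below rebuilds the two DataFrames exactly as above and prints their shapes, the meaning of each label value (taken from cancer['target_names']), and how many samples fall into each class.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df_X = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df_y = pd.DataFrame(cancer['target'], columns=['Cancer'])

# Rows are samples, columns are the numeric features
print(df_X.shape)
# Map each label value to its name, e.g. 0 -> malignant
print(list(enumerate(cancer['target_names'])))
# Number of samples per class (the classes are not perfectly balanced)
print(df_y['Cancer'].value_counts().sort_index())
```

The dataset has 569 samples and 30 features, with 212 malignant and 357 benign cases, so accuracy alone should be read with the class imbalance in mind.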

After that, let’s split the dataset into a training set (70%) and a test set (30%) using train_test_split():

# Train test split
from sklearn.model_selection import train_test_split
import numpy as np
# np.ravel flattens the one-column label DataFrame into the 1-D array scikit-learn expects for y
X_train, X_test, y_train, y_test = train_test_split(df_X, np.ravel(df_y), test_size=0.3)
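The split above is reshuffled on every run, so scores will vary slightly between runs. The sketch below repeats it with two optional train_test_split arguments that are our own additions, not part of the original: random_state fixes the shuffle for reproducibility, and stratify keeps the malignant/benign ratio the same in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
df_X = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df_y = pd.DataFrame(cancer['target'], columns=['Cancer'])
y = np.ravel(df_y)

# Same 70/30 split, made reproducible (random_state) and class-balanced (stratify)
X_train, X_test, y_train, y_test = train_test_split(
    df_X, y, test_size=0.3, random_state=42, stratify=y
)

# The test size is rounded up: ceil(0.3 * 569) = 171 rows, leaving 398 for training
print(X_train.shape, X_test.shape)
# Fraction of benign samples (label 1) in each set should be nearly identical
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```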

We will be training a Support Vector Classifier (SVC) model. The regularization parameter C and the kernel coefficient gamma are two of the most important hyperparameters in SVC:

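  • The regularization parameter C controls the strength of the regularization. In Scikit-learn the strength is inversely proportional to C: a small C regularizes strongly and tolerates some misclassified training points in exchange for a smoother decision boundary, while a large C penalizes training errors heavily and can overfit.
  • The kernel coefficient gamma controls the width of the kernel. SVC uses the radial basis function (RBF) kernel by default (also known as the Gaussian kernel). A larger gamma makes the kernel narrower, so each training point only influences points close to it and the decision boundary becomes more flexible.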
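To make the role of gamma concrete, here is a small sketch that evaluates the RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2) for two made-up points one unit apart, once by hand and once with rbf_kernel from sklearn.metrics.pairwise. The points are hypothetical and only serve to show the trend.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two hypothetical points one unit apart (not taken from the dataset)
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])

sims = {}
for gamma in [0.1, 1, 10]:
    # RBF kernel by hand: K(x, x') = exp(-gamma * ||x - x'||^2)
    manual = np.exp(-gamma * np.sum((a - b) ** 2))
    # The same similarity from scikit-learn (a 1 x 1 matrix for one pair of points)
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]
    sims[gamma] = library
    print(f"gamma={gamma}: by hand {manual:.4f}, rbf_kernel {library:.4f}")
```

The similarity falls from about 0.90 at gamma = 0.1 to about 0.37 at gamma = 1 and almost 0 at gamma = 10: with a large gamma, the same two points are treated as nearly unrelated.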

We will tune these two parameters in the rest of this tutorial.

It’s tricky to find the optimal values for C and gamma. The simplest solution is to try a bunch of combinations and see what works best. This idea of creating a “grid” of parameters and trying out every possible combination is called a Grid Search.

Grid Search — trying out all the possible combinations (Image by Author)
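The idea fits in a few lines of plain Python. The sketch below is our own illustration of what a grid search does, not the author's code: it loops over every (C, gamma) pair from the candidate values used later in this tutorial, fits an SVC on part of the training data, and scores it on a held-out validation set. The 75/25 validation split and random_state=42 are arbitrary choices.

```python
import itertools
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Carve a validation set out of the training data so the test set stays untouched
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

C_values = [0.1, 1, 10, 100, 1000]
gamma_values = [1, 0.1, 0.01, 0.001, 0.0001]

scores = {}
# Each (C, gamma) pair is one cell of the grid: 5 x 5 = 25 models in total
for C, gamma in itertools.product(C_values, gamma_values):
    model = SVC(C=C, gamma=gamma).fit(X_fit, y_fit)
    scores[(C, gamma)] = model.score(X_val, y_val)  # accuracy on the validation set

best = max(scores, key=scores.get)
print(f"Tried {len(scores)} combinations; best (C, gamma) = {best}, accuracy = {scores[best]:.3f}")
```

Scoring on a single validation split is noisy, which is the weakness that cross-validation addresses.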

This method is common enough that Scikit-learn has it built in as GridSearchCV. The CV stands for Cross-Validation, a technique for estimating how well a model generalizes by training and scoring it on several different splits of the training data instead of a single one.
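For a single (C, gamma) pair, cross-validation can be sketched with cross_val_score from sklearn.model_selection. The values C=1 and gamma=0.001, the 5 folds, and random_state=42 are assumptions made only for illustration. For a classifier, an integer cv uses stratified folds, so each fold keeps roughly the same class balance.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 5-fold CV: the training set is cut into 5 parts; each part serves once as the
# validation fold while the model is fit on the remaining 4
fold_scores = cross_val_score(SVC(C=1, gamma=0.001), X_train, y_train, cv=5)

print(fold_scores)         # one accuracy per fold
print(fold_scores.mean())  # the averaged score for this (C, gamma) pair
```

GridSearchCV repeats this for every candidate and, by default, ranks the candidates by the mean of their fold scores.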

GridSearchCV takes a model to train and a dictionary that describes the parameters to try. In this dictionary, the keys are the model’s parameter names (here, SVC’s C and gamma) and the values are lists of settings to test. Let’s first define our candidate values for C and gamma as follows:

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001]
}
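To see how this dictionary turns into candidate models, scikit-learn’s ParameterGrid (in sklearn.model_selection) expands it into every combination. With 5 values of C and 5 values of gamma that gives 25 candidates; assuming 5-fold cross-validation, GridSearchCV would therefore fit 25 × 5 = 125 models, plus one final refit on the whole training set when refit=True.

```python
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
}

# Expand the dictionary into one dict of keyword arguments per combination
candidates = list(ParameterGrid(param_grid))
print(len(candidates))  # 5 values of C x 5 values of gamma
print(candidates[0])

# Each candidate can be passed straight to the model's constructor
model = SVC(**candidates[0])
print(model)
```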

Next, let’s create a GridSearchCV object and fit it to the training data.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
grid = GridSearchCV(SVC(), param_grid, refit=True,...
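The call above is cut off in this excerpt, so here is a complete sketch of the whole workflow under stated assumptions. It keeps the arguments shown (SVC(), param_grid, refit=True) and adds cv=5 and random_state=42 as our own choices; it is not necessarily the author’s exact code. After fitting, best_params_ and best_score_ report the winning combination and its mean cross-validation accuracy, and because refit=True the grid object can predict directly with the best model.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
df_X = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df_y = pd.DataFrame(cancer['target'], columns=['Cancer'])
# random_state is an assumption added for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    df_X, np.ravel(df_y), test_size=0.3, random_state=42
)

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
}

# cv=5 is an assumption; the original call is truncated after refit=True
grid = GridSearchCV(SVC(), param_grid, refit=True, cv=5)
grid.fit(X_train, y_train)  # 25 candidates x 5 folds, then one refit on all of X_train

print(grid.best_params_)  # the (C, gamma) pair with the highest mean CV accuracy
print(grid.best_score_)   # that mean CV accuracy

# With refit=True, grid predicts with the best SVC retrained on the full training set
y_pred = grid.predict(X_test)
print(classification_report(y_test, y_pred, target_names=cancer['target_names']))
```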
