Understand best practices to optimize your model’s architecture and hyperparameters with KerasTuner and TensorFlow

Figure 0. Cover illustration | Image by author

Building machine learning models is an iterative process that involves optimizing the model’s performance and compute resources. The settings that you adjust during each iteration are called hyperparameters. They govern the training process and are held constant during training.

The process of searching for optimal hyperparameters is called hyperparameter tuning or hypertuning, and is essential in any machine learning project. Hypertuning helps boost performance and reduces model complexity by removing unnecessary parameters (e.g., number of units in a dense layer).

There are two types of hyperparameters:

  1. Model hyperparameters that influence model architecture (e.g., number and width of hidden layers in a DNN)
  2. Algorithm hyperparameters that influence the speed and quality of training (e.g., learning rate and activation function).

The number of hyperparameter combinations, even in a shallow DNN, can grow extremely large, making a manual search for an optimal set neither feasible nor scalable.

This post will introduce you to KerasTuner, a library that automates the hyperparameter search. We’ll build and compare the results of three deep learning models trained on the Fashion MNIST dataset:

  • A baseline model with pre-selected hyperparameters
  • A model with hyperparameters optimized by the Hyperband algorithm
  • A ResNet architecture tuned with Bayesian optimization

You can view the Jupyter notebook here.

Let us first import the required modules and print their versions in case you want to reproduce the notebook. We are using TensorFlow version 2.5.0 and KerasTuner version 1.0.1.

import tensorflow as tf
import kerastuner as kt
from tensorflow import keras

print(f"TensorFlow Version: {tf.__version__}")
print(f"KerasTuner Version: {kt.__version__}")
>>> TensorFlow Version: 2.5.0
>>> KerasTuner Version: 1.0.1

We’ll begin by loading in the Fashion MNIST dataset. The goal is to train a machine learning model to classify different images of clothing.

Since the images are grayscale, each pixel is a single intensity value between 0 and 255. Dividing each pixel by 255 normalizes the values to the range 0 to 1, which helps training converge faster.
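A minimal sketch of the loading and normalization steps described above (variable names are my own):

```python
from tensorflow import keras

# Fashion MNIST: 60,000 training and 10,000 test images of 28x28 pixels
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Pixel intensities are integers in [0, 255]; rescale to floats in [0, 1]
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```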

As mentioned, we will first train a shallow, densely connected neural network with pre-selected hyperparameters to establish a baseline performance. We’ll see later how simple models, like…
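A baseline with fixed, hand-picked hyperparameters might look like the sketch below. The layer sizes and learning rate here are illustrative choices, not necessarily those used in the article:

```python
from tensorflow import keras

# Shallow dense baseline: every hyperparameter is pre-selected, nothing is tuned
baseline_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),      # 28x28 image -> 784 vector
    keras.layers.Dense(128, activation="relu"),      # hand-picked width
    keras.layers.Dense(10, activation="softmax"),    # 10 clothing classes
])
baseline_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # hand-picked rate
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Whatever accuracy this model reaches becomes the bar that the Hyperband- and Bayesian-tuned models must clear.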

Continue reading: https://towardsdatascience.com/hyperparameter-tuning-with-kerastuner-and-tensorflow-c4a4d690b31a?source=rss—-7f60cf5620c9—4

Source: towardsdatascience.com