Prevent Your Model from Overfitting

Data augmentation is a technique for increasing the diversity of a dataset without collecting any more real data. It helps improve model accuracy and prevents the model from overfitting. In this post, you will learn to implement the most popular and efficient data augmentation procedures for the object detection task using Python and OpenCV.

This post introduces the following data augmentation methods:

  1. Random Crop
  2. Cutout
  3. ColorJitter
  4. Adding Noise
  5. Filtering

First, let’s import several libraries and prepare some necessary subroutines.

The image below is used as the sample image throughout this post.

Image: tr03–14–18–1-FRONT.jpg in

Random Crop randomly selects a region and crops it out to make a new data sample. The cropped region should have the same width/height ratio as the original image to preserve the shapes of objects.
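A minimal sketch of how such a region could be chosen is shown here. The function name `sample_crop_window` and the `min_scale` parameter are hypothetical, introduced only for illustration: a single scale factor is drawn for both sides so the crop keeps the original width/height ratio (up to pixel rounding).

```python
import random


def sample_crop_window(img_w, img_h, min_scale=0.5, rng=random):
    """Pick a random crop window with the same width/height ratio as the image.

    min_scale is an assumed lower bound on the crop size, expressed as a
    fraction of the original side length.
    Returns (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    # One scale for both sides keeps the aspect ratio unchanged.
    scale = rng.uniform(min_scale, 1.0)
    crop_w = max(1, int(round(img_w * scale)))
    crop_h = max(1, int(round(img_h * scale)))

    # Choose the top-left corner so the window stays inside the image.
    x_min = rng.randint(0, img_w - crop_w)
    y_min = rng.randint(0, img_h - crop_h)
    return x_min, y_min, x_min + crop_w, y_min + crop_h
```

Calling `sample_crop_window(640, 480)` returns a window that lies inside a 640x480 image and is roughly 4:3, like the original.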

Image by Author

In the figure above, the left image shows the original image with its ground-truth bounding boxes (in red), and the right image is a new sample created by cropping the region inside the orange box. In the new sample’s annotation, all objects that do not overlap the orange box are removed, and the coordinates of objects that cross the orange box boundary are adjusted to fit the new image sample. The outputs of random crop are a new cropped image and its annotation.
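The annotation update described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact routine used for the figure: boxes are assumed to be `[x_min, y_min, x_max, y_max]` in pixels, the helper name `crop_with_boxes` is hypothetical, and "adjusting" a box is interpreted as clipping it to the crop window and shifting it into the new image's coordinates.

```python
import numpy as np


def crop_with_boxes(image, boxes, window):
    """Crop `image` to `window` and update the bounding boxes to match.

    image  : H x W x C array
    boxes  : list of [x_min, y_min, x_max, y_max] (assumed format)
    window : (x_min, y_min, x_max, y_max) of the crop region
    """
    wx1, wy1, wx2, wy2 = window
    cropped = image[wy1:wy2, wx1:wx2].copy()

    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the crop window.
        cx1, cy1 = max(x1, wx1), max(y1, wy1)
        cx2, cy2 = min(x2, wx2), min(y2, wy2)
        if cx1 >= cx2 or cy1 >= cy2:
            continue  # no overlap with the window: drop the object
        # Shift into the coordinate system of the cropped image.
        new_boxes.append([cx1 - wx1, cy1 - wy1, cx2 - wx1, cy2 - wy1])
    return cropped, new_boxes
```

In practice you may also want to drop boxes that keep only a tiny fraction of their original area after clipping, since such fragments rarely show a recognizable object.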

Cutout, introduced in 2017 by Terrance DeVries and Graham W. Taylor in their paper, is a simple regularization technique that randomly masks out square regions of the input during training to improve the robustness and overall performance of convolutional neural networks. It is not only extremely easy to implement but can also be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance.

In the paper, cutout was applied to improve image recognition (classification) accuracy. If we apply the same scheme to an object detection dataset, it may cause objects to be lost, especially small ones. In the figure below, a considerable number of small objects inside the cutout area (black region) are removed, which goes against the spirit of data augmentation.

Image by Author
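To make the problem concrete, here is a minimal sketch of the plain, classification-style cutout together with a check for objects that end up entirely hidden. The function names `cutout` and `fully_covered`, the single zero-filled square, and the `[x_min, y_min, x_max, y_max]` box format are assumptions made for this illustration.

```python
import numpy as np


def cutout(image, size, rng=None):
    """Mask one random square region of `image` with zeros (black).

    Returns the masked copy and the masked region as
    (x_min, y_min, x_max, y_max). The square is centered on a random
    pixel and clipped at the image border.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    cx = int(rng.integers(0, w))
    cy = int(rng.integers(0, h))

    half = size // 2
    x1, y1 = max(0, cx - half), max(0, cy - half)
    x2, y2 = min(w, cx + half), min(h, cy + half)

    out = image.copy()
    out[y1:y2, x1:x2] = 0
    return out, (x1, y1, x2, y2)


def fully_covered(boxes, region):
    """Return the boxes that lie entirely inside the masked region."""
    rx1, ry1, rx2, ry2 = region
    return [b for b in boxes
            if b[0] >= rx1 and b[1] >= ry1 and b[2] <= rx2 and b[3] <= ry2]
```

Any box returned by `fully_covered` is still listed in the annotation but no longer visible in the image, which is exactly the situation shown in the figure.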

To make this method suitable for object detection, we can make a…
