Computer Vision (CV) is one of the most exciting subfields within the Artificial Intelligence (AI) and Machine Learning (ML) domain. It is a major component for many modern AI/ML pipelines, and it’s transforming almost every industry, enabling organizations to revolutionize the way machines and business systems work.

Academically, CV has been a well-established area of computer science for many decades, and over the years, a lot of research has gone into this field to make it better. However, the use of deep neural networks has recently revolutionized the field and given it new fuel for accelerated growth.

There is a diverse array of application areas for computer vision, such as:

  • Autonomous driving
  • Medical imaging analysis and diagnostics
  • Scene detection and understanding
  • Automatic image caption generation
  • Photo/face tagging on social media
  • Home security
  • Defect identification in manufacturing industries and quality control

In this article, we discuss some of the most popular and effective datasets used in the domain of Deep Learning (DL) to train state-of-the-art ML systems for CV tasks.

Choose the Right Open-source Datasets Carefully

Training machines on image and video files is a serious data-intensive operation. A singular image file is a multi-dimensional, multi-megabytes digital entity containing only a tiny fraction of ‘insight’ in the context of the whole ‘intelligent image analysis’ task.

In contrast, a similar-sized retail sales data table can lend much more insight into the ML algorithm with the same expenditure on computational hardware. This fact is worth remembering while talking about the scale of data and computing required for modern CV pipelines.

Consequently, in almost all cases, hundreds (or even thousands) of images are not enough to train a high-quality ML model for CV tasks. Almost all modern CV systems use complex DL model architectures, and they will remain under-fitted if not supplied with a sufficient number of carefully selected training examples, i.e., labeled images. Therefore, it is becoming a highly common trend that robust, generalizable, production-quality DL systems often require millions of carefully chosen images to train on.

Also, for video analytics, the task of choosing and compiling a training dataset can be more complicated given the dynamic nature of the video files or frames obtained from a multitude of video streams.

Here, we list some of the most popular ones (consisting of both…

Continue reading: