In information technology, companies use data to improve the customer experience and to provide better services. Collecting that data, however, can be tedious and costly.
In this article, we will discuss Generative Adversarial Networks (GANs), especially the Conditional GAN, a method we used for synthetic data generation at Y-Data, and how it can be used to generate synthetic datasets.
The GAN was proposed by Ian Goodfellow et al.¹ in 2014. The GAN architecture consists of two components, a generator and a discriminator. In simple terms, the generator's role is to produce new data (numbers, images, etc.) that is as similar as possible to the training dataset, while the discriminator's role is to distinguish between generated data and real data.
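For reference, the original GAN paper frames this setup as a two-player minimax game over a value function V(D, G), where p_data is the distribution of the real data and p_z is the distribution of the random noise fed to the generator. The discriminator D tries to maximize V (assign high probability to real data and low probability to generated data), while the generator G tries to minimize it:

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
$$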
Let’s walk through how a GAN works, step by step:
o The generator takes a vector of random numbers as input and returns a generated image.
o The generated images, along with a sample of real images, are passed as input to the discriminator.
o The discriminator receives both kinds of samples: images from the real dataset and generated images. For each image it returns a probability between 0 and 1, where a value closer to 1 means the image is more likely to come from the real dataset, and a value closer to 0 means it is more likely to be a generated image.
o The discriminator's misclassifications are penalized through the discriminator loss. This loss is backpropagated to update the discriminator's weights, which improves its predictions.
o The generator then computes the generator loss from the discriminator's classification of the generated images and backpropagates it through both the discriminator and the generator to compute the gradients. It then uses those gradients to update only the generator's weights.
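The steps above can be sketched as a toy training loop. This is a minimal, hypothetical illustration, not the method used at Y-Data: scalars stand in for images, the generator is an affine map a*z + b, the discriminator is a single logistic unit sigmoid(w*x + c), and the "real" data is assumed to be Gaussian with mean 4. Gradients are derived by hand so that only the Python standard library is needed, and the generator uses the common non-saturating loss -log D(G(z)) rather than minimizing log(1 - D(G(z))) directly.

```python
import math
import random


def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def softplus(u):
    # log(1 + exp(u)), computed without overflow.
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def generator(g, z):
    # Hypothetical generator: maps a noise value z to a fake "data point".
    return g["a"] * z + g["b"]


def discriminator(d, x):
    # Hypothetical discriminator: probability that x comes from the real data.
    return sigmoid(d["w"] * x + d["c"])


def d_loss(d, real, fake):
    # Binary cross-entropy that penalizes misclassifications:
    # -log D(x_real) - log(1 - D(x_fake)), averaged over each batch.
    loss_real = sum(softplus(-(d["w"] * x + d["c"])) for x in real) / len(real)
    loss_fake = sum(softplus(d["w"] * x + d["c"]) for x in fake) / len(fake)
    return loss_real + loss_fake


def d_step(d, real, fake, lr):
    # One gradient-descent step on d_loss; only discriminator weights change.
    gw = gc = 0.0
    for x in real:
        p = discriminator(d, x)
        gw -= (1.0 - p) * x / len(real)
        gc -= (1.0 - p) / len(real)
    for x in fake:
        p = discriminator(d, x)
        gw += p * x / len(fake)
        gc += p / len(fake)
    return {"w": d["w"] - lr * gw, "c": d["c"] - lr * gc}


def g_step(g, d, zs, lr):
    # Generator loss -log D(G(z)). The gradient is backpropagated through the
    # discriminator (d/dx = -(1 - D) * w) into the generator, but only the
    # generator's weights a and b are updated; d is left unchanged.
    ga = gb = 0.0
    for z in zs:
        p = discriminator(d, generator(g, z))
        dloss_dx = -(1.0 - p) * d["w"]
        ga += dloss_dx * z / len(zs)
        gb += dloss_dx / len(zs)
    return {"a": g["a"] - lr * ga, "b": g["b"] - lr * gb}


def train(steps=2000, batch=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    g = {"a": 1.0, "b": 0.0}
    d = {"w": 0.1, "c": 0.0}
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # assumed real data
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]     # random input vectors
        fake = [generator(g, z) for z in zs]
        d = d_step(d, real, fake, lr)  # update the discriminator first
        g = g_step(g, d, zs, lr)       # then the generator
    return g, d
```

Calling `train()` returns the learned generator and discriminator parameters. A single logistic unit can only separate real from fake samples with a threshold, so this toy setup mainly pushes the generator's samples toward the real data, and the parameters may oscillate rather than settle, a known difficulty of GAN training.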
Although a GAN can generate good examples of data points, it cannot generate a data point with a specified target label, and the datasets it generates can lack diversity.
The Conditional GAN was proposed by M. Mirza and S. Osindero² in late 2014. They modified the architecture by adding the label y to the generator's input, so that the generator tries to produce a data point corresponding to that label. The label is also added to the discriminator's input, which helps it distinguish real data from generated data.
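A minimal sketch of the conditioning idea follows, assuming the common approach of concatenating a one-hot encoding of the label y to the generator's noise vector and to the discriminator's input; concatenation is one simple way to feed y to both networks, not necessarily the exact architecture used at Y-Data. The helper names are illustrative, and `toy_generator` (a single linear layer) is a hypothetical stand-in for a trained generator. The key point is `sample_labeled_dataset`: because the label is an input, you can request a chosen number of samples per class, which an unconditional GAN cannot do.

```python
import random


def one_hot(label, num_classes):
    # Encode an integer class label as a one-hot vector.
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def conditional_input(features, label, num_classes):
    # CGAN conditioning: append the one-hot label y to the input vector.
    # Used for the generator (noise z + y) and for the discriminator (data x + y).
    return list(features) + one_hot(label, num_classes)


def toy_generator(inp, weights):
    # Hypothetical stand-in for a trained generator: one linear layer,
    # where each row of weights produces one output feature.
    return [sum(w * v for w, v in zip(row, inp)) for row in weights]


def sample_labeled_dataset(weights, counts, noise_dim, seed=0):
    # counts[k] is how many synthetic rows to generate for class k.
    rng = random.Random(seed)
    num_classes = len(counts)
    rows = []
    for label, n in enumerate(counts):
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
            x = toy_generator(conditional_input(z, label, num_classes), weights)
            rows.append((x, label))
    return rows
```

For example, `sample_labeled_dataset(weights, counts=[500, 500], noise_dim=2)` would request 500 synthetic rows for each of two classes, given a weight matrix with `noise_dim + 2` columns.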
Continue reading: https://towardsdatascience.com/synthetic-data-generation-using-conditional-gan-45f91542ec6b?source=rss—-7f60cf5620c9—4