As you might already know, classical autoencoders are widely used for representation learning via image reconstruction, but there are many other types of autoencoders suited to a variety of tasks. The subject of this article is the Variational Autoencoder (VAE). As seen in the figure below, a VAE also tries to reconstruct its input image; unlike a conventional autoencoder, however, the encoder now produces two vectors, from which the decoder reconstructs the image. Given the resulting distribution, we can then sample random noise and produce realistic images.
The goal of a VAE is to generate a realistic image from a random vector drawn from a pre-defined distribution. This was not possible with the simple autoencoders I covered last time, as we did not specify the distribution of the data that generates an image. Thus, the strategy is as follows:
- The encoder takes an image and outputs two vectors: one representing the mean and the other the standard deviation.
- We multiply the standard deviation vector element-wise by a small random noise vector, add it to the mean vector, and get a modified vector of the same size.
- The decoder takes the modified vector and tries to reconstruct the image.
- The loss value we try to optimize is a combination of the L2 distance and the KL divergence, which measures how far the distributions of the mean and the standard deviation vectors deviate from 0 and 1, respectively.
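The sampling step above is known as the reparameterization trick, and it can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's actual model code; the function and variable names are my own:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable with
    respect to mu and log_var, which is what makes training work.
    """
    sigma = np.exp(0.5 * log_var)        # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)  # small random noise
    return mu + sigma * eps

# Example: a 3-dimensional latent vector.
rng = np.random.default_rng(0)
z = reparameterize(np.zeros(3), np.zeros(3), rng)
```

With `mu = 0` and `log_var = 0` (i.e. a standard deviation of 1), `z` is simply standard normal noise, which is exactly the distribution we later sample from at generation time.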
Thus, we encourage our mean vector to have a distribution centred around 0, whereas the standard deviation vector should be centred around 1 (a Gaussian distribution). Finally, our decoder will be able to generate realistic images out of random noise (vectors) sampled with a mean of 0 and a standard deviation of 1.
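Generation then needs no encoder and no input image at all: we sample a latent vector from N(0, I) and pass it through the trained decoder. The sketch below uses a hypothetical toy `decode` function (a fixed linear map) as a stand-in for the real decoder network:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for a trained decoder: a fixed linear map from
# a 16-d latent space to a flattened 28x28 "image" (784 values).
W = rng.standard_normal((784, 16))

def decode(z):
    return W @ z

# Generation: sample z from N(0, I), then decode it.
z = rng.standard_normal(16)
image = decode(z)
```

In a real VAE, `decode` would be a neural network, but the sampling step is exactly this simple.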
We use the KL divergence to calculate how different our feature vectors are from the desired distribution of values having a mean of 0 and a standard deviation of 1. The loss is calculated as follows:
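Assuming the encoder parameterizes a diagonal Gaussian, this KL term takes the standard closed form (summing over the latent dimensions):

```latex
\mathcal{L}_{KL} = -\frac{1}{2}\sum_{i}\left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right)
```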
where sigma and mu denote the standard deviation and the mean, respectively. As seen, the goal is to make the mean (mu) as close to 0 as possible (by squaring the value), while the rest of the equation ensures the standard deviation (sigma) stays close to 1. Note that we use the logarithm to make sure that the standard deviation is never negative.
The model I am going to use looks as follows:
As seen, our encoder outputs a log of…