Learning to generate images with Generative Adversarial Networks


Generative models try to model the distribution of the data itself, enabling us to easily sample new data points from the model. This is in contrast to discriminative models, which try to infer the output from the input. A classic deep generative model is the Variational Autoencoder (VAE). Here we will discuss another generative model that has risen to prominence in recent years: the Generative Adversarial Network (GAN).

As the maths of Generative Adversarial Networks is somewhat tedious, a story is often told of a forger and a policeman to illustrate the idea.

Imagine a forger who makes fake bills, and a policeman who tries to catch these forgeries. If the forger were a VAE, his goal would be to take some real bills and replicate them as precisely as possible. In a GAN, he has a different idea in mind: rather than trying to replicate the real bills, it suffices to make fake bills that people believe are real.

Now let's start. In the beginning, the policeman knows nothing about how to distinguish real bills from fake ones. The forger knows nothing either and only produces white paper.

In the first round, the policeman gets the fake bills and learns that the forgeries are white while the real bills are green. The forger then finds out that white paper can no longer fool the policeman and starts to produce green paper.

In the second round, the policeman learns that real bills have denominations printed on them while the forgeries do not. The forger then finds out that plain green paper can no longer fool the policeman and starts to print numbers on it.

In the third round, the policeman learns that real bills have watermarks on them while the forgeries do not. The forger then has to reproduce the watermarks on his fake bills.

Finally, the policeman is able to spot the tiniest difference between real and fake bills and the forger has to make perfect replicas of real bills to fool the policeman.

Now in a GAN, the forger becomes the generator and the policeman becomes the discriminator. The discriminator is a binary classifier with the two classes being “taken from the real data” (“real”) and “generated by the generator” (“fake”). Its objective is to minimize the classification loss. The generator’s objective is to generate samples so that the discriminator misclassifies them as real.
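To make these roles concrete, below is a minimal sketch of the two networks in PyTorch. Everything here is an illustrative assumption rather than something taken from the discussion above: a 64-dimensional noise vector, samples flattened to 784 values (e.g. 28×28 grayscale images), and small fully connected layers.

```python
import torch
import torch.nn as nn

latent_dim = 64   # size of the random input vector z (assumed)
data_dim = 784    # size of a flattened sample x (assumed, e.g. 28x28 images)

# Generator G_phi(z): maps a noise vector to a fake sample.
G = nn.Sequential(
    nn.Linear(latent_dim, 256),
    nn.ReLU(),
    nn.Linear(256, data_dim),
    nn.Tanh(),        # outputs in [-1, 1]; assumes data is rescaled to that range
)

# Discriminator D_theta(x): maps a sample to the probability that it is real.
D = nn.Sequential(
    nn.Linear(data_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),     # probability of the "real" class
)
```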

Here we have a complication: the goal is not to find one perfect fake sample. Such a sample alone will not fool the discriminator: if the forger makes hundreds of copies of the exact same fake bill, they will all have the same serial number and the policeman will soon find out that they are fake. Instead, we want the generator to produce a variety of fake samples such that, when presented as a distribution alongside the distribution of real samples, the two are indistinguishable to the discriminator.

So how do we generate different samples with a deterministic generator? We provide it with random numbers as input.
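Concretely, the generator itself stays deterministic; the variety comes entirely from drawing a fresh batch of noise vectors on every call (a short sketch reusing the illustrative `G` and `latent_dim` from above):

```python
batch_size = 128
z = torch.randn(batch_size, latent_dim)  # z ~ N(0, I), a new draw every call
fake = G(z)                              # different z -> different fake samples
```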

Typically, for the discriminator we use the binary cross entropy loss, with label 1 meaning real and 0 meaning fake. For the generator, the input is a random vector drawn from a standard normal distribution. Denote the generator by @@G_{\phi}(z)@@, the discriminator by @@D_{\theta}(x)@@, the distribution of the real samples by @@p(x)@@, and the input distribution to the generator by @@q(z)@@. Recall that the binary cross entropy loss with classifier output @@y@@ and label @@\hat{y}@@ is

@@\ell(y, \hat{y}) = -\hat{y} \log y - (1 - \hat{y}) \log (1 - y).@@
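In code this is the standard built-in binary cross entropy; the tiny sketch below (with purely illustrative values) just checks that the formula above matches PyTorch's `binary_cross_entropy`:

```python
import torch
import torch.nn.functional as F

y = torch.tensor(0.8)      # classifier output D(x)
y_hat = torch.tensor(1.0)  # label: 1 = real, 0 = fake

manual = -(y_hat * torch.log(y) + (1 - y_hat) * torch.log(1 - y))
builtin = F.binary_cross_entropy(y, y_hat)
assert torch.allclose(manual, builtin)
```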

For the discriminator, the objective is to minimize this loss on real samples (label 1) and generated samples (label 0):

@@\min_{\theta} \; \mathbb{E}_{x \sim p(x)}\left[-\log D_{\theta}(x)\right] + \mathbb{E}_{z \sim q(z)}\left[-\log\left(1 - D_{\theta}(G_{\phi}(z))\right)\right].@@

For the generator, the objective is

@@\max_{\phi} \; \mathbb{E}_{z \sim q(z)}\left[-\log\left(1 - D_{\theta}(G_{\phi}(z))\right)\right].@@

The generator’s objective corresponds to maximizing the classification loss of the discriminator on the generated samples. Alternatively, we can minimize the classification loss of the discriminator on the generated samples when they are labelled as real:

@@\min_{\phi} \; \mathbb{E}_{z \sim q(z)}\left[-\log D_{\theta}(G_{\phi}(z))\right].@@

And this is what we will use in our implementation. The strengths of the two networks should stay balanced, so we train them in alternation, updating the parameters of both networks once in each iteration.
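Putting the pieces together, here is a minimal sketch of that alternating loop in PyTorch. It reuses the illustrative `G`, `D`, and `latent_dim` from the earlier sketch and assumes a hypothetical `real_loader` that yields batches of flattened real samples; it is meant as a sketch of the training scheme, not a tuned implementation.

```python
import torch
import torch.nn.functional as F

opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)

for real in real_loader:                      # real: (batch, data_dim), assumed
    batch = real.size(0)
    ones = torch.ones(batch, 1)               # label 1 = real
    zeros = torch.zeros(batch, 1)             # label 0 = fake

    # Discriminator step: minimize BCE on real (label 1) and fake (label 0).
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()                      # do not backpropagate into G here
    loss_D = F.binary_cross_entropy(D(real), ones) + \
             F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator step: minimize BCE on generated samples labelled as real.
    z = torch.randn(batch, latent_dim)
    loss_G = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```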
