MLOps Blog

Generative Adversarial Networks and Some of Their Applications: Everything You Need to Know

22nd August, 2023

Generative modeling is a type of unsupervised learning.

In supervised learning, the deep learning model learns to map the input to the output. In each iteration, the loss is calculated and the model is optimised using backpropagation.

In unsupervised learning, we don’t feed the target variables to the deep learning model like we would in supervised learning. Why?

Well, supervised learning algorithms are built to recognise an object in the case of image classification, or to predict a continuous value in the case of regression. 

Unsupervised learning algorithms are used to learn the underlying pattern of the data, or the representation of the data. 

Unsupervised learning is used in tasks like: 

  • principal component analysis
  • clustering
  • anomaly detection

In essence, generative models, or deep generative models, are a class of deep learning models that learn the underlying data distribution from the sample. These models can be used to reduce data into its fundamental properties, or to generate new samples of data with new and varied properties.

Generative models have two types:

  1. Explicit likelihood models
  2. Implicit likelihood models

Explicit likelihood models: An explicit model learns the data distribution from the samples and can then generate new data. These models define a probability distribution that can be evaluated (exactly or approximately), and they’re trained using maximum likelihood. In maximum likelihood, an assumed model is trained to maximise the probability (the likelihood) of the observed data under the model.

Explicit likelihood models:

  • Maximum likelihood
    • PPCA, Factor Analysis, Mixture models
    • PixelCNN/PixelRNN
    • Wavenet
    • Autoregressive language models
  • Approximate maximum likelihood
    • Boltzmann machines
    • Variational autoencoders
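To make “explicit likelihood” concrete, here is a minimal sketch of the simplest possible explicit model: a single Gaussian fitted by maximum likelihood with NumPy. The toy data and the helper names `fit_gaussian_mle` and `log_likelihood` are illustrative assumptions; the point is only that such a model defines a density we can evaluate, and that generating new data means sampling from it.

```python
import numpy as np

def fit_gaussian_mle(x):
    """Closed-form maximum-likelihood estimates for a 1-D Gaussian."""
    mu = x.mean()
    sigma = x.std()  # the MLE uses the 1/n variance, which is NumPy's default
    return mu, sigma

def log_likelihood(x, mu, sigma):
    """Average log-density of the data under N(mu, sigma^2)."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)  # toy "training set"

mu, sigma = fit_gaussian_mle(data)
print("mu=%.2f sigma=%.2f avg log-lik=%.3f" % (mu, sigma, log_likelihood(data, mu, sigma)))

# Because the density is explicit, generating new data is just sampling from it.
new_samples = rng.normal(mu, sigma, size=5)
print(new_samples)
```

Any other choice of mu or sigma gives a lower average log-likelihood on this data, which is exactly what “trained using maximum likelihood” means.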

Implicit likelihood models: Implicit models don’t define an explicit probability distribution over the data. Instead, they learn a procedure for producing samples that share the statistical properties of the data, so they can generalise and generate new samples without ever evaluating a probability density.

What are generative adversarial networks (GANs)?

Generative adversarial networks are implicit likelihood models that generate data samples from the statistical distribution of the data. They’re used to reproduce the variations within a dataset. They use a combination of two networks: a generator and a discriminator.


A generator network takes a random noise vector z, sampled from a normal distribution, and outputs a generated sample that should look as if it came from the original data distribution.


A discriminator takes a sample, either a real one or one produced by the generator, and outputs a value between 0 and 1: its estimate of the probability that the sample is real. A value close to 0 means the sample is judged fake, and a value close to 1 means it is judged real.

In short, the discriminator’s job is to tell real samples from generated ones, having learned what the original samples look like. The generator’s job is to fool the discriminator by generating samples that are close to the original ones.
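The division of labour between the two networks can be sketched with two tiny, untrained NumPy networks. The layer sizes, the leaky-ReLU/tanh/sigmoid choices and the function names are assumptions for illustration; nothing here is trained, the sketch only shows what goes in and what comes out of each network.

```python
import numpy as np

rng = np.random.default_rng(42)
NOISE_DIM, HIDDEN, DATA_DIM = 100, 128, 784  # e.g. flattened 28x28 images

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out)

G_params = [init_layer(NOISE_DIM, HIDDEN), init_layer(HIDDEN, DATA_DIM)]
D_params = [init_layer(DATA_DIM, HIDDEN), init_layer(HIDDEN, 1)]

def leaky_relu(a, slope=0.2):
    return np.where(a > 0, a, slope * a)

def generator(z):
    """Map noise vectors z to fake samples in [-1, 1]."""
    (W1, b1), (W2, b2) = G_params
    return np.tanh(leaky_relu(z @ W1 + b1) @ W2 + b2)

def discriminator(x):
    """Map samples to the estimated probability that each one is real."""
    (W1, b1), (W2, b2) = D_params
    logits = leaky_relu(x @ W1 + b1) @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))

z = rng.normal(0.0, 1.0, size=(16, NOISE_DIM))  # a batch of noise vectors
fake = generator(z)
p_real = discriminator(fake)
print(fake.shape, p_real.shape)
```

The generator turns each 100-dimensional noise vector into a 784-dimensional sample, and the discriminator reduces each sample to a single probability.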

How do generative adversarial networks work?

So, GANs have two networks. They are trained in alternation, each with its own objective: while one network is updated, the other is held fixed. The GAN framework is most straightforward when both models are multilayer perceptrons. Let’s see how GANs work.

Noise sampled from a normal distribution is fed into the generator. At the start of training the generator’s weights are random, so its outputs are essentially random too; it has no reference point yet. 

Meanwhile, actual samples, or ground truth, are fed into the discriminator, which learns what the real data distribution looks like. When a sample from the generator is fed into the discriminator, the discriminator evaluates it. If the generated sample looks like it comes from the same distribution as the original samples, the discriminator outputs a value close to ‘1’ (real). If the distributions don’t match, or aren’t even close, it outputs a value close to ‘0’ (fake).

So how can the generator evolve to generate samples resembling the actual data?

In order to understand the evolution of the generator, we need to understand how the discriminator evaluates whether the generated sample is real or fake. 

The answer lies in the loss function, or value function, which reflects the distance between the distribution of the generated data and the distribution of the real data. Both the generator and the discriminator have their own loss functions. The generator tries to minimize the value function while the discriminator tries to maximize it.

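$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$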
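The value function is an expectation, so in practice it is estimated with minibatch averages. A minimal sketch, assuming the standard form from Goodfellow et al. (2014), V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], and taking the discriminator’s outputs as arrays of probabilities; `value_function` is a hypothetical helper name. It also checks a known property: if the discriminator can do no better than guessing (D = 1/2 everywhere), the value is -log 4.

```python
import numpy as np

def value_function(d_real, d_fake, eps=1e-12):
    """Minibatch estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples
    d_fake: discriminator outputs D(G(z)) on generated samples
    """
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

# A confident, correct discriminator pushes V toward its maximum of 0 ...
print(value_function(np.full(64, 0.99), np.full(64, 0.01)))
# ... while a discriminator that can only guess gives V = -log 4.
print(value_function(np.full(64, 0.5), np.full(64, 0.5)), -np.log(4))
```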

The generator is not connected to the loss directly, only through the discriminator. The discriminator outputs whether it considers a sample fake or real. If the output is close to 0 (fake), the generator loss penalizes the generator for producing a sample that the discriminator classified as fake. 

Once the loss has been calculated, the gradients are backpropagated through the discriminator network into the generator, and the generator’s weights are updated. This is important because the effect of the generator’s parameters on the loss depends on the discriminator’s parameters; the discriminator’s gradients are the feedback the generator uses to produce images that are more ‘real’. 
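The chain of gradients from the loss, through the discriminator, into the generator can be checked by hand on a one-dimensional toy. This sketch rests on simplifying assumptions: a linear generator G(z) = a*z + b, a logistic discriminator D(x) = sigmoid(w*x + c) whose parameters stay fixed, and the generator term of the value function, mean(log(1 - D(G(z)))). The backpropagated (analytic) gradient is compared with a finite-difference estimate.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def generator_loss(a, b, z, w, c):
    """Generator term of the value function: mean(log(1 - D(G(z))))."""
    x_fake = a * z + b                      # G(z)
    return np.mean(np.log(1.0 - sigmoid(w * x_fake + c)))

def generator_grad(a, b, z, w, c):
    """Backpropagate through D into G with the chain rule.

    d/ds log(1 - sigmoid(s)) = -sigmoid(s), and s = w * (a*z + b) + c,
    so ds/da = w * z and ds/db = w.
    """
    d_logit = -sigmoid(w * (a * z + b) + c)
    return np.mean(d_logit * w * z), np.mean(d_logit * w)

rng = np.random.default_rng(1)
z = rng.normal(size=256)
a, b, w, c = 1.0, 0.0, 0.8, -1.0            # D's parameters w, c stay fixed

grad_a, grad_b = generator_grad(a, b, z, w, c)
h = 1e-6
num_a = (generator_loss(a + h, b, z, w, c) - generator_loss(a - h, b, z, w, c)) / (2 * h)
num_b = (generator_loss(a, b + h, z, w, c) - generator_loss(a, b - h, z, w, c)) / (2 * h)
print(grad_a, num_a)
print(grad_b, num_b)
```

Because w > 0 in this toy, grad_b is negative, so a gradient-descent step raises b and moves the generator’s samples toward the region the fixed discriminator scores as real.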


For each training step, we start with the discriminator loop, which we repeat k times before moving on to the generator loop. 

Discriminator loop:

  1. Set a loop of k steps, where k ≥ 1 is a hyperparameter (Goodfellow et al. used k = 1, the least expensive option, in their experiments). Repeating the discriminator update helps keep the discriminator a good estimator of the original data distribution p_data. 
  2. Sample m noise samples {z1, z2, z3, …, zm} from a normal distribution and transform them through the generator.
  3. Sample m real examples {x1, x2, x3, …, xm} from the training data, i.e. from the data distribution p_data.
  4. It’s important to remember that fake samples are labeled zero and real samples are labeled one.
  5. We then use the loss function to calculate the loss from the discriminator’s outputs and these labels. 
  6. We take the gradient of the loss function with respect to the discriminator parameters, and update the discriminator’s weights. Because we want to maximize the objective, the update uses gradient ascent:
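$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[\log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)\right]$$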

This completes the discriminator loop.
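The discriminator loop can be sketched end to end on a one-dimensional toy. Everything here is an illustrative assumption: real data drawn from N(4, 1), a fixed linear generator, a logistic discriminator D(x) = sigmoid(w*x + c), k = 3, m = 64 and a learning rate of 0.05. The gradient is the one from step 6, and the update is gradient ascent.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def generator(z):
    """A fixed, hypothetical 1-D generator; it is not updated in this loop."""
    return 0.5 * z

def d_objective(w, c, x_real, x_fake):
    """(1/m) sum[log D(x_i) + log(1 - D(G(z_i)))] for D(x) = sigmoid(w*x + c).

    Real samples implicitly carry label 1 (the log D term) and fake samples
    label 0 (the log(1 - D) term), as in steps 4 and 5.
    """
    return (np.mean(np.log(sigmoid(w * x_real + c)))
            + np.mean(np.log(1 - sigmoid(w * x_fake + c))))

def d_gradient(w, c, x_real, x_fake):
    """Gradient of d_objective with respect to the discriminator parameters (w, c)."""
    r = 1 - sigmoid(w * x_real + c)     # d/ds log(sigmoid(s)) = 1 - sigmoid(s)
    f = -sigmoid(w * x_fake + c)        # d/ds log(1 - sigmoid(s)) = -sigmoid(s)
    return np.mean(r * x_real) + np.mean(f * x_fake), np.mean(r) + np.mean(f)

rng = np.random.default_rng(2)
w, c = 0.0, 0.0                         # discriminator parameters
k, m, lr = 3, 64, 0.05                  # illustrative hyperparameters

history = []
for _ in range(k):                                  # step 1: k discriminator updates
    x_fake = generator(rng.normal(size=m))          # step 2: m noise samples through G
    x_real = rng.normal(4.0, 1.0, size=m)           # step 3: m real samples (toy data)
    before = d_objective(w, c, x_real, x_fake)
    gw, gc = d_gradient(w, c, x_real, x_fake)
    w, c = w + lr * gw, c + lr * gc                 # step 6: gradient ascent
    history.append((before, d_objective(w, c, x_real, x_fake)))

for before, after in history:
    print("objective %.4f -> %.4f" % (before, after))
```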

Generator loop:

The generator loop is similar:

  1. Sample m noise samples {z1, z2, z3, …, zm} from a normal distribution and transform them through the generator to get our fake samples.
  2. Since we only want to update the generator, we take the gradient of the loss function with respect to the generator’s parameters. The term log D(x) does not depend on the generator, so its derivative is zero.
  3. In the generator loop we don’t work with the real samples at all, so the cost function reduces to:
$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$

Using this gradient, we can now update the weights of the generator using gradient descent. 

It is quite intriguing that the generator evolves while the discriminator is held constant. The discriminator acts as a guide to help the generator learn and evolve!
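On a one-dimensional toy, the generator loop is a gradient-descent step on mean(log(1 - D(G(z)))) while the discriminator’s parameters stay frozen. The linear generator G(z) = a*z + b, the frozen values w and c of a logistic discriminator, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def g_cost(a, b, z, w, c):
    """Generator cost with D frozen: (1/m) sum log(1 - D(G(z_i))), G(z) = a*z + b."""
    return np.mean(np.log(1 - sigmoid(w * (a * z + b) + c)))

def g_gradient(a, b, z, w, c):
    """Gradient w.r.t. the generator parameters (a, b); w and c are treated as constants."""
    d_logit = -sigmoid(w * (a * z + b) + c)   # d/ds log(1 - sigmoid(s))
    return np.mean(d_logit * w * z), np.mean(d_logit * w)

rng = np.random.default_rng(3)
w, c = 0.9, -2.0            # frozen discriminator: larger x looks "more real"
a, b, lr = 0.5, 0.0, 0.1    # generator parameters and step size

history = []
for step in range(5):
    z = rng.normal(size=64)                 # fresh noise; no real samples needed
    cost_before = g_cost(a, b, z, w, c)
    ga, gb = g_gradient(a, b, z, w, c)
    a, b = a - lr * ga, b - lr * gb         # gradient descent on the generator only
    history.append((cost_before, g_cost(a, b, z, w, c)))

print("b after 5 steps: %.4f" % b)
```

Each step lowers the generator’s cost on its minibatch and shifts the generated samples toward larger x, the region this frozen discriminator considers real.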

Loss functions

Two loss functions dominate in GANs:

  1. Minimax loss
  2. Wasserstein loss 

Minimax loss

In minimax, one player tries to maximize an objective while the other tries to minimize it. The minimax loss was first described in a 2014 paper by Ian Goodfellow et al., titled “Generative Adversarial Networks”. 

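$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$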

Minimax loss comes from game theory, which studies players competing against each other. The idea is that to win, a player must maximize their own chance of winning while anticipating the best moves the opponent can make. 

Here, the discriminator is the player who wants to maximize its chance of winning by correctly classifying the fake images the generator produces. It wants to output D(x) = 1 for real images and D(G(z)) = 0 for fake images.

So the discriminator wants the term (1 – D(G(z))) to increase. A larger value indicates that the discriminator is performing well; it’s able to tell real and fake images apart. 

On the other hand, the generator tries to minimize the discriminator’s chance of winning by minimizing (1 – D(G(z))).

All the generator wants is to produce samples that, when passed through the discriminator, give D(G(z)) close to 1. Then, when the loss term (1 – D(G(z))) is calculated, it will be close to zero. 

This back-and-forth continues, with each player pushing the other to improve, until training reaches an equilibrium or is stopped. 
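In code, the discriminator’s side of the minimax loss is usually implemented as binary cross-entropy, with label 1 for real samples and 0 for fake ones; that is what `binary_crossentropy` does in the Keras implementation later in this article. The NumPy sketch below (hypothetical helper names, random stand-in discriminator outputs) checks that, for equal-sized real and fake batches, the mean cross-entropy equals minus half the minimax value, so minimizing one is the same as maximizing the other.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy, as used to train a real/fake classifier."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def minimax_value(d_real, d_fake):
    """V = mean(log D(x)) + mean(log(1 - D(G(z))))."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))

rng = np.random.default_rng(4)
d_real = rng.uniform(0.05, 0.95, size=32)   # stand-in for D(x) on real samples
d_fake = rng.uniform(0.05, 0.95, size=32)   # stand-in for D(G(z)) on fake samples

labels = np.concatenate([np.ones(32), np.zeros(32)])   # real = 1, fake = 0
preds = np.concatenate([d_real, d_fake])
bce = binary_cross_entropy(labels, preds)
print(bce, -minimax_value(d_real, d_fake) / 2)
```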



Wasserstein loss 

The Wasserstein loss function was developed for a newer type of GAN called the WGAN. Here the discriminator does not classify each sample as real or fake; instead, for each sample it outputs a score that is not restricted to the range between 0 and 1. The idea remains the same, though: real samples should receive larger scores and fake samples smaller ones. 

Because it scores samples rather than classifying them, the WGAN discriminator is called a “critic”. 

Critic Loss: C(x) – C(G(z))

The critic tries to maximize this function. This plays the same role as the discriminator’s side of the minimax loss we saw previously: it tries to maximize the difference between its scores for real instances and for fake instances.

Generator Loss: C(G(z))

The generator tries to maximize this function. In other words, it tries to maximize the critic’s output for its fake instances.
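A minimal sketch of the two Wasserstein losses on 1-D data, assuming a linear critic C(x) = w*x + c. The original WGAN paper (Arjovsky et al., 2017) keeps the critic Lipschitz by clipping its weights after each update; the clipping bound of 1.0 used here is an arbitrary choice for the toy, and all helper names are hypothetical. Because a linear function with |w| <= 1 is 1-Lipschitz, the critic objective can never exceed the empirical Wasserstein-1 distance, which for equal-sized 1-D samples is the mean absolute difference of the sorted samples.

```python
import numpy as np

def critic(x, w, c):
    """Unbounded score (no sigmoid): larger should mean 'more real'."""
    return w * x + c

def critic_objective(x_real, x_fake, w, c):
    """The critic maximizes mean C(x) - mean C(G(z))."""
    return np.mean(critic(x_real, w, c)) - np.mean(critic(x_fake, w, c))

def generator_objective(x_fake, w, c):
    """The generator maximizes mean C(G(z))."""
    return np.mean(critic(x_fake, w, c))

def empirical_w1(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(5)
x_real = rng.normal(3.0, 1.0, size=512)
x_fake = rng.normal(0.0, 1.0, size=512)   # samples from a hypothetical, poor generator

w, c, lr, clip = 0.0, 0.0, 0.1, 1.0
for _ in range(50):
    grad_w = np.mean(x_real) - np.mean(x_fake)   # d/dw of the critic objective (c cancels out)
    w = np.clip(w + lr * grad_w, -clip, clip)     # gradient ascent, then weight clipping

print(critic_objective(x_real, x_fake, w, c), empirical_w1(x_real, x_fake))
print(generator_objective(x_fake, w, c))
```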


Issues with GANs

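Vanishing gradients

Vanishing gradients occur when, in each training iteration, the derivative of the loss function with respect to the current weights is so small that the update to the weights is almost negligible. In GANs this typically happens when the discriminator becomes too good: it confidently rejects the generator’s samples, and the generator’s loss saturates. 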

In order to overcome this issue, WGANs are recommended. 
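The saturation behind this problem is easy to see numerically. Assume the discriminator ends in a sigmoid, D = sigmoid(s), where s is its logit. The minimax generator term log(1 - D) has derivative -sigmoid(s) with respect to s, so when the discriminator confidently rejects a fake (s very negative, D close to 0), almost no gradient flows back to the generator. A WGAN critic has no sigmoid, so the generator’s objective C(G(z)) has derivative 1 with respect to the critic’s output, however confident the critic is. The logit values below are arbitrary examples.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

logits = np.array([-10.0, -5.0, 0.0, 5.0])  # discriminator logits on fake samples
d_fake = sigmoid(logits)

# Minimax generator term log(1 - D(G(z))): derivative w.r.t. the logit is -sigmoid(s).
minimax_grad = -sigmoid(logits)

# WGAN generator term C(G(z)): derivative w.r.t. the critic's output is always 1.
wgan_grad = np.ones_like(logits)

for s, d, g_mm, g_w in zip(logits, d_fake, minimax_grad, wgan_grad):
    print("logit=%6.1f  D=%.5f  minimax grad=%.2e  WGAN grad=%.1f" % (s, d, g_mm, g_w))
```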

Mode collapse

Mode collapse happens when the generator manages to fool the discriminator with only a small variety of samples, instead of covering the full variety of the real data. 

For instance, if the input data contains 10 different handwritten digits and the generator fools the discriminator by generating only 4 of the 10 digits, then the GAN is suffering from mode collapse. 
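One simple way to spot this failure is to count how many classes appear among the generated samples. The sketch below assumes you already have class predictions for a batch of generated digits, for example from a separately trained MNIST classifier (hypothetical here); `mode_coverage` is an illustrative helper, not a standard library metric.

```python
import numpy as np

def mode_coverage(predicted_labels, num_classes=10):
    """Fraction of classes that appear at least once among generated samples."""
    return len(np.unique(predicted_labels)) / num_classes

rng = np.random.default_rng(6)
healthy = rng.integers(0, 10, size=1000)          # predictions spread over all 10 digits
collapsed = rng.choice([1, 3, 7, 9], size=1000)   # only 4 of the 10 digits ever appear

print(mode_coverage(healthy), mode_coverage(collapsed))
```

Coverage only catches collapse onto a subset of classes; a generator can cover every class and still lack variety within each one.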


GAN variants

Deep Convolutional GAN 

Deep Convolutional GANs (DCGANs) improve on the original GANs by using convolutional neural networks (CNNs) in both the generator and the discriminator. CNNs are good at extracting important features and representations from image data, which makes DCGANs more stable to train and enables them to generate higher-quality images. 
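DCGAN-style generators typically upsample a small feature map with strided transposed convolutions, and the discriminator mirrors that path with strided convolutions. The sketch below only does the shape arithmetic, assuming kernel size 4, stride 2 and padding 1 (a common configuration) and a 4x4 starting map; it uses the standard output-size formulas for transposed and regular convolutions with no dilation or output padding.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a regular strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# Generator: project z to a 4x4 map, then upsample four times.
up = [4]
for _ in range(4):
    up.append(conv_transpose_out(up[-1]))
print("generator:", up)            # each layer doubles the height and width

# Discriminator: go back down with strided convolutions.
down = [up[-1]]
for _ in range(4):
    down.append(conv_out(down[-1]))
print("discriminator:", down)      # each layer halves the height and width
```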

Conditional GANs

GANs can be made better by adding some extra information, like label y. 

From the above image you can see that both the generator and discriminator are conditioned with label y. That could be any kind of additional information, such as class labels or data.

In the generator, the prior input noise p(z), and label (y) are combined. In the discriminator, input (x) and label (y) are presented as inputs to a discriminative function.

cGANs learn to produce better images by exploiting additional information fed into the model.
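One common way to combine the noise and the label, and the one sketched here, is to one-hot encode y and concatenate it with z for the generator and with x for the discriminator. The sizes (100-dimensional noise, 10 classes, flattened 28x28 images) and the helper names are assumptions for illustration; other conditioning schemes, such as learned label embeddings, are also used.

```python
import numpy as np

NOISE_DIM, NUM_CLASSES, IMAGE_DIM = 100, 10, 784

def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[labels]

def generator_input(z, y):
    """Condition the generator: concatenate noise z with the one-hot label y."""
    return np.concatenate([z, one_hot(y)], axis=1)

def discriminator_input(x, y):
    """Condition the discriminator: concatenate the (real or fake) image x with y."""
    return np.concatenate([x, one_hot(y)], axis=1)

rng = np.random.default_rng(7)
batch = 8
z = rng.normal(size=(batch, NOISE_DIM))
y = np.full(batch, 3)                             # ask for eight images of the digit "3"
x = rng.uniform(-1, 1, size=(batch, IMAGE_DIM))   # stand-in for generated images

print(generator_input(z, y).shape, discriminator_input(x, y).shape)
```

Fixing y at inference time, as in this example, is what lets you choose which class the generator produces.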

Why cGAN?

By providing additional information:

  1. GANs tend to converge faster, since even the otherwise random input now carries some structure (the label).
  2. You can control the output of the generator at inference time by giving it the label of the image that you want it to generate.


Some applications of conditional GANs:

  1. Image-to-image translation
  2. Text-to-image synthesis
  3. Video generation

Image-to-image translation

Image-to-image translation is an application where an image B is transformed so that it takes on the properties of A (for example, a daytime photo rendered as a night scene). 

Pix2Pix GAN

Earlier we saw how a random sample from a normal distribution is fed into the generator and a new, previously unseen sample is generated. Pix2Pix GAN instead uses a conditional GAN to translate one type of image into another type of image. 

Pix2Pix GAN uses pairs of images x and y, and the two images in each pair must be related. The input x is fed to the generator, which is a U-Net. The discriminator is also conditioned on x: it sees either the real pair (x, y) or the generated pair (x, G(x)) and must tell them apart, so the target image y plays the role of the label. The intuition is that one image can be completely transformed into another image. 

Previously we saw that the generator learns to transform a random distribution to an image. In pix2pix, we see that one image is transformed or translated into a different type of image. 
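Besides the adversarial term, the pix2pix paper (Isola et al.) adds an L1 loss that keeps the generator’s output G(x) close to the target y, with the L1 term weighted by λ = 100. The NumPy sketch below computes that combined generator objective from the discriminator’s output on the fake pair (x, G(x)); the adversarial term is written in the common non-saturating form, and the function name and toy arrays are assumptions. A real implementation would compute this inside a deep-learning framework so it can be backpropagated.

```python
import numpy as np

def pix2pix_generator_loss(d_fake_pair, fake, target, lam=100.0, eps=1e-7):
    """Adversarial term plus lambda-weighted L1 reconstruction term.

    d_fake_pair: discriminator outputs D(x, G(x)) in (0, 1)
    fake:        generator output G(x)
    target:      the paired ground-truth image y
    """
    d_fake_pair = np.clip(d_fake_pair, eps, 1 - eps)
    adversarial = -np.mean(np.log(d_fake_pair))   # push D(x, G(x)) toward 1
    l1 = np.mean(np.abs(target - fake))           # keep G(x) close to y pixel by pixel
    return adversarial + lam * l1

rng = np.random.default_rng(8)
target = rng.uniform(-1, 1, size=(2, 64, 64, 3))   # toy ground-truth images y
good_fake = target.copy()                          # a perfect translation
bad_fake = target + 0.1                            # every pixel off by 0.1

good = pix2pix_generator_loss(np.full(2, 0.999), good_fake, target)
bad = pix2pix_generator_loss(np.full(2, 0.999), bad_fake, target)
print(good, bad)
```

With λ = 100, even a small per-pixel error dominates the adversarial term, which is why pix2pix outputs stay close to the paired target.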

Pix2Pix can be used for:

  1. Day-to-night or night-to-day translation
  2. Low resolution to high resolution
  3. Sketch-to-drawing


CycleGAN builds on Pix2Pix GAN, but uses unpaired image translation instead of paired image translation. This gives you a lot of flexibility: you can take images from two domains, without needing matching pairs, and transfer the properties of each domain to the other. 

As you can see from the image above, these techniques can be very useful. Artists can use it to translate a photo into a painting. 

CycleGAN uses two generators and two discriminators. Image A is fed into the first generator G, which produces G(A) in the other domain. G(A) is then fed into a second generator F to reconstruct the original image, F(G(A)). The name CycleGAN comes from this cycle: in addition to the usual adversarial losses, CycleGAN computes a cycle-consistency loss between the original image and its reconstruction.
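The cycle-consistency term is a reconstruction loss in both directions: A to B and back, and B to A and back. A minimal NumPy sketch, assuming an L1 distance and a weight of λ = 10 (the values used in the CycleGAN paper), with toy stand-in “generators” that are simple pixel-wise functions; all names here are hypothetical.

```python
import numpy as np

def cycle_consistency_loss(real_a, real_b, G, F, lam=10.0):
    """lam * (mean|F(G(A)) - A| + mean|G(F(B)) - B|), with G: A->B and F: B->A."""
    forward = np.mean(np.abs(F(G(real_a)) - real_a))   # A -> B -> A
    backward = np.mean(np.abs(G(F(real_b)) - real_b))  # B -> A -> B
    return lam * (forward + backward)

def G(img):
    """Hypothetical A->B mapping: invert the colours."""
    return 1.0 - img

def F_good(img):
    """The exact inverse of G, so every cycle returns the original image."""
    return 1.0 - img

def F_bad(img):
    """A mapping that loses information, so cycles do not close."""
    return 0.5 * img

rng = np.random.default_rng(9)
real_a = rng.uniform(0, 1, size=(4, 32, 32, 3))   # toy images from domain A
real_b = rng.uniform(0, 1, size=(4, 32, 32, 3))   # toy images from domain B (unpaired)

print(cycle_consistency_loss(real_a, real_b, G, F_good))
print(cycle_consistency_loss(real_a, real_b, G, F_bad))
```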

Text-to-image synthesis


Stacked Generative Adversarial Networks (StackGAN) can generate images conditioned on text descriptions. 

The architecture comprises a stacked series of text and image GAN models. It’s again a class of conditional GANs. It has two GANs, also known as stage-I GAN and stage-II GAN. StackGAN uses a sketch-refinement process where the first level generator, Stage-I GAN, is conditioned on text and generates a low-resolution image, i.e. the primitive shape and colors of the descriptive text. 

The second level generator, Stage-II GAN, is conditioned both on the text and on the low-resolution image, which takes the stage-I results and adds compelling detail. 

“Low-resolution images are first generated by our Stage-I GAN. On the top of our Stage-I GAN, we stack Stage-II GAN to generate realistic high-resolution (e.g., 256×256) images conditioned on Stage-I results and text descriptions. By conditioning on the Stage-I result and the text again, Stage-II GAN learns to capture the text information that is omitted by Stage-I GAN and draws more details for the object…” Excerpt from StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (Han Zhang et al.)


Recently, OpenAI created a deep learning network called DALL-E, which also does text-to-image synthesis. 

However, its architecture doesn’t use GANs; it is based on a version of GPT-3. 



Face inpainting

Facial inpainting, also known as face completion, is the task of generating plausible facial features for missing pixels in a face image. 

This technique aims to produce a more appropriate and realistic face image from an image with a masked region, or one with missing content.
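A detail common in GAN-based inpainting pipelines, though not specific to any one paper, is that the generator’s output is only used inside the masked region, while known pixels are copied from the input. A NumPy sketch, assuming a binary mask where 1 marks missing pixels; `composite_inpainting` is a hypothetical helper name and the arrays are random stand-ins.

```python
import numpy as np

def composite_inpainting(image, generated, mask):
    """Keep known pixels from `image`; fill pixels where mask == 1 from `generated`."""
    return mask * generated + (1 - mask) * image

rng = np.random.default_rng(10)
face = rng.uniform(0, 1, size=(64, 64, 3))        # stand-in for a face image
generated = rng.uniform(0, 1, size=(64, 64, 3))   # stand-in for the generator's output

mask = np.zeros((64, 64, 1))                      # broadcasts over the colour channels
mask[20:44, 16:48] = 1.0                          # a rectangular hole, e.g. over the eyes

result = composite_inpainting(face, generated, mask)
print(result.shape)
```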

Implementation of GAN

Vanilla GAN

import numpy as np
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
from tensorflow.keras.layers import Input, Dense, Dropout, LeakyReLU
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.datasets import mnist
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers

# load the data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# normalize our inputs to be in the range [-1, 1]
X_train = (X_train.astype(np.float32) - 127.5)/127.5
# convert x_train with a shape of (60000, 28, 28) to (60000, 784) so we have 784 columns per row
X_train = X_train.reshape(60000, 784)

# plot the first 25 training images in a 5x5 grid
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), interpolation='nearest', cmap='gray_r')
    plt.axis('off')
plt.show()


Vanilla GAN output

def get_optimizer():
    # Adam with beta_1 = 0.5; newer Keras versions expect `learning_rate` instead of `lr`
    return Adam(learning_rate=0.0002, beta_1=0.5)

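def get_generator(optimizer, random_dim):
    generator = Sequential()
    generator.add(Dense(256, input_dim=random_dim, kernel_initializer = 'uniform', bias_initializer = 'zeros'))
    # non-linearity after the hidden layer (without it the two Dense layers collapse into one linear map)
    generator.add(LeakyReLU(0.2))
    # tanh keeps generated pixels in [-1, 1], the same range as the normalized data
    generator.add(Dense(784, activation='tanh'))
    generator.compile(loss='binary_crossentropy', optimizer=optimizer)
    return generator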

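def get_discriminator(optimizer):
    discriminator = Sequential()
    discriminator.add(Dense(1024, input_dim=784, kernel_initializer = 'uniform', bias_initializer = 'zeros'))
    # non-linearity, plus dropout to regularize the discriminator
    discriminator.add(LeakyReLU(0.2))
    discriminator.add(Dropout(0.3))
    # sigmoid output: the probability that the input image is real
    discriminator.add(Dense(1, activation='sigmoid'))
    discriminator.compile(loss='binary_crossentropy', optimizer=optimizer)

    return discriminator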

def get_gan_network(discriminator, random_dim, generator, optimizer):

    # We initially set trainable to False since we only want to train either the
    # generator or discriminator at a time
    discriminator.trainable = False

    # gan input (noise) will be 100-dimensional vectors
    gan_input = Input(shape=(random_dim,))

    # the output of the generator (an image)
    x = generator(gan_input)

    # get the output of the discriminator (probability of the image being real or not)
    gan_output = discriminator(x)

    gan = Model(inputs=gan_input, outputs=gan_output) # inputs and outputs under keras version 2.2.2
    gan.compile(loss='binary_crossentropy', optimizer=optimizer)

    return gan

# Create a wall of generated MNIST images
def plot_generated_images(epoch, generator, random_dim, examples=100, dim=(10, 10), figsize=(10, 10)):
    noise = np.random.normal(0, 1, size=[examples, random_dim])
    generated_images = generator.predict(noise)
    generated_images = generated_images.reshape(examples, 28, 28)

    plt.figure(figsize=figsize)
    for i in range(generated_images.shape[0]):
        plt.subplot(dim[0], dim[1], i+1)
        plt.imshow(generated_images[i], interpolation='nearest', cmap='gray_r')
        plt.axis('off')
    plt.tight_layout()
    plt.savefig('gan_generated_image_epoch_%d.png' % epoch)
    # close the figure so figures don't pile up over many epochs
    plt.close()

def train(X_train, y_train, x_test, y_test, epochs=100, minibatch_size=128, random_dim = 100):

    # Build our GAN network. Each compiled model gets its own optimizer instance,
    # because sharing one instance across models can fail in recent Keras versions.
    G = get_generator(get_optimizer(), random_dim)
    D = get_discriminator(get_optimizer())
    gan = get_gan_network(D, random_dim, G, get_optimizer())

    # for plotting at the end
    D_loss = []
    G_loss = []

    for e in range(1, epochs+1):
        print('-'*15, 'Epoch %d' % e, '-'*15)

        # Randomize the order of the data points every epoch
        X_train, y_train = shuffle(X_train, y_train)
        # Split the training data into batches of size 128
        for i in range(0, X_train.shape[0], minibatch_size):

            # Get pair of (X, y) of the current minibatch
            X_train_mini = X_train[i:i + minibatch_size]
            y_train_mini = y_train[i:i + minibatch_size]

            ##### Train discriminator #####
            # Get a set of legit images from MNIST data
            legit_images = X_train_mini[np.random.randint(0, X_train_mini.shape[0], size=int(minibatch_size/2))]

            # Get a set of fake images generated from noise
            noise = np.random.normal(0, 1, size=[int(minibatch_size/2), random_dim]) # random_dim = 100 here
            synthetic_images = G.predict(noise, verbose=0)

            # create 1 dataset with both legit (1) and generated (0) images
            x_combined_batch = np.concatenate((legit_images, synthetic_images))
            y_combined_batch = np.concatenate((np.ones((int(minibatch_size/2), 1)), np.zeros((int(minibatch_size/2), 1))))
            # one-sided label smoothing: real images get the label 0.9 instead of 1
            y_combined_batch[:int(minibatch_size/2)] = 0.9

            # Train discriminator
            D.trainable = True
            d_loss = D.train_on_batch(x_combined_batch, y_combined_batch)

            ###### Train generator #####
            noise = np.random.normal(0, 1, size=[minibatch_size, random_dim])
            y_gen = np.ones(minibatch_size)
            D.trainable = False
            g_loss = gan.train_on_batch(noise, y_gen)

        # keep the last minibatch losses of each epoch for plotting at the end
        D_loss.append(d_loss)
        G_loss.append(g_loss)
        print ("Cost of D after epoch %i: %f" % (e, d_loss))
        print ("Cost of G after epoch %i: %f" % (e, g_loss))

        if e == 1 or e % 20 == 0:
            plot_generated_images(e, G, random_dim)

    return [D_loss, G_loss]

if __name__ == '__main__':
    [D_loss, G_loss] = train(X_train, y_train, X_test, y_test, epochs = 100, minibatch_size=128, random_dim = 100)

Applications of GANs

GANs have a lot of real life applications, some of which are:

  • Generate Examples for Image Datasets
    • Generating examples is very handy in medicine or material science, where there’s very little data to work with. 
  • Generate Photographs of Human Faces
    • Video game designers can use this to generate realistic human faces. 
  • Generate Realistic Photographs
    • Very useful for photographers and videographers. 
  • Generate Cartoon Characters
    • Artists can use this to create a new character design, or scenes in a cartoon, or even in a video game. 
  • Image-to-Image Translation
    • Photographers can use these algorithms to convert day into night, summer into winter, etc.
  • GANs can be used to simulate a worst-case scenario to optimize risk management in a business. 

Other use cases of GANs include:

  • Text-to-Image Translation
  • Face Frontal View Generation
  • Generate New Human Poses
  • Photos to Emojis
  • Face Aging
  • Super Resolution
  • Photo Inpainting
  • Clothing Translation
  • Video Prediction
  • 3D Object Generation


In this article we’ve learned about:

  1. Generative modeling and generative models
    1. Explicit likelihood models
    2. Implicit likelihood models
  2. Generative Adversarial Networks
    1. It is an implicit likelihood model.
    2. When we design GANs, we do not model the probability distribution of the real data explicitly; instead, we try to generate data with the same distribution and variation as the real data.
    3. It has two networks, a generator and a discriminator, that compete against each other while helping each other learn better representations and distributions.
  3. Loss functions and their workings.
  4. Issues with GANs. 
  5. Different variants of GANs
  6. Lastly, we implemented a vanilla GAN using Keras. 

It’s a fascinating topic, and if you made it to the end – thank you for reading!
