DCGAN - Deep Convolutional GAN
INTRODUCTION:
In recent research, we have seen many CNN-based classification, detection, and segmentation architectures such as the Inception family, the YOLO family, FCN, U-Net, and KSAC.
All high-accuracy models require a large amount of labelled data. In the real world, data is usually unlabelled, so gathering and labelling it is time-consuming and error-prone.
DCGAN can help learn intermediate feature representations from unlabelled data, and those representations can then be leveraged for all kinds of supervised tasks.
As one might know, a GAN consists of two networks:
- Discriminator
- Generator
- The discriminator can be used for image classification tasks.
- The generator can be used to manipulate the semantic qualities of generated images.
DETAILS OF DCGAN:
Approach:
- Use LAPGAN:
- It helps to upscale low-resolution generated images.
- Replace max pooling with strided convolutions:
- Allows the network to learn its own spatial downsampling.
- Replace fully connected layers with global average pooling:
- Helps to increase stability but reduces the convergence speed.
- Use batch normalization:
- Stabilizes learning.
- Helps with poor weight initialization and improves gradient flow in deeper models (a discriminator built with these choices is sketched after this list).
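To make these choices concrete, here is a minimal PyTorch sketch of a DCGAN-style discriminator for 64x64 RGB images. The layer widths and kernel sizes are illustrative assumptions, not the exact configuration from the paper.

```python
import torch.nn as nn

# Minimal DCGAN-style discriminator sketch (assumed 64x64 RGB input).
# Strided convolutions replace max pooling, batch norm stabilizes training,
# and there are no fully connected layers at the end.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # 16x16 -> 8x8
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1), # 8x8 -> 4x4
    nn.BatchNorm2d(512),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0),   # 4x4 -> 1x1 real/fake score
    nn.Sigmoid(),
)
```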
Architecture:
Block Diagram:
Discriminator:
Generator:
The generator uses the Laplacian GAN (LAPGAN) architecture to generate high-quality samples of natural images.
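As a rough illustration, a DCGAN-style generator can be sketched in PyTorch as below; it maps a 100-dimensional noise vector to a 64x64 RGB image using fractionally-strided (transposed) convolutions. The channel sizes are assumptions for illustration.

```python
import torch.nn as nn

# Minimal DCGAN-style generator sketch: 100-d noise -> 64x64 RGB image.
# Transposed (fractionally-strided) convolutions learn the upsampling,
# batch norm is applied everywhere except the output layer, and Tanh
# maps the output to [-1, 1].
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)
```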
Training:
- mini-batch stochastic gradient descent.
- batch size = 128
- weight initialization: zero-centred normal distribution with a standard deviation of 0.02
- slope of Leaky ReLU = 0.2
- Adam optimizer
- learning rate = 0.0002
- β1 = 0.5 (these settings are collected in the sketch below)
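The hyperparameters above can be wired together roughly as follows. This is a sketch that assumes the `generator` and `discriminator` modules from the earlier snippets and uses Adam's default second-moment decay of 0.999.

```python
import torch
import torch.nn as nn

# Training hyperparameters listed above
batch_size = 128
lr = 0.0002
beta1 = 0.5
leaky_slope = 0.2  # used inside the discriminator's LeakyReLU layers

def init_weights(module):
    # Zero-centred normal initialization with standard deviation 0.02
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)

generator.apply(init_weights)
discriminator.apply(init_weights)

# Adam optimizers with beta1 = 0.5
opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(beta1, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, 0.999))
```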
Adversarial Loss:
Every GAN model needs an adversarial loss. The adversarial loss pushes the distribution of generated images towards the distribution of the dataset.
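The objective referred to here is the standard GAN minimax game:

$$\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$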
Here, D tries to maximize the probability of classifying real images correctly via log D(x), while G tries to fool the discriminator by minimizing log(1 − D(G(z))).
Adversarial loss is therefore used in every GAN network, and it is applied to the output of the discriminator.
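In practice this objective is usually implemented with binary cross-entropy on the discriminator's output. The sketch below assumes the `generator`, `discriminator`, `opt_g`, and `opt_d` defined in the earlier snippets, plus a batch of real images.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # binary cross-entropy on the discriminator's sigmoid output

def train_step(real_images):
    batch = real_images.size(0)
    real_targets = torch.ones(batch, 1)
    fake_targets = torch.zeros(batch, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    opt_d.zero_grad()
    noise = torch.randn(batch, 100, 1, 1)
    fake_images = generator(noise)
    d_loss = (criterion(discriminator(real_images).view(batch, 1), real_targets)
              + criterion(discriminator(fake_images.detach()).view(batch, 1), fake_targets))
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator, i.e. push D(G(z)) towards 1
    opt_g.zero_grad()
    g_loss = criterion(discriminator(fake_images).view(batch, 1), real_targets)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```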
Label Smoothing:
Label smoothing is a regularization technique that introduces noise into the labels, which helps stabilize DCGAN training.
This accounts for the fact that datasets may contain mistakes, so directly maximizing the likelihood log P(y|x) can be harmful.
Assume that, for a small constant ϵ, a training-set label y is correct with probability 1−ϵ and incorrect otherwise. Label smoothing regularizes a binary classifier by replacing the hard 0 and 1 classification targets with targets of ϵ and 1−ϵ respectively.
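A minimal way to apply this in the training step above is to soften the hard 0/1 targets. The value ϵ = 0.1 here is an illustrative choice, not one prescribed by the source.

```python
import torch

eps = 0.1  # illustrative smoothing constant

# Hard targets 1 and 0 become 1 - eps and eps respectively
real_targets = torch.full((batch_size, 1), 1.0 - eps)  # e.g. 0.9 for real images
fake_targets = torch.full((batch_size, 1), eps)        # e.g. 0.1 for fake images
```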
CONCLUSION:
This blog gave a basic understanding of DCGAN and the architectures of its discriminator and generator. The main focus is a stable network that can be trained for image generation on the specified classes. It also covered the architectural changes and training techniques involved, such as label smoothing and the adversarial loss function, and listed basic training hyperparameters to start with.

