Adversarial Latent Autoencoders

You may have noticed that the autoencoder architecture has been enjoying a resurgence recently as an alternative to the GAN architecture for building generative models. Take a look at yesterday's post for one example.

Today's post continues the trend by taking a look at the Adversarial Latent Autoencoder (basic architecture shown below).

The ALAE architecture modifies the original GAN by decomposing the generator $\mathbf{G}$ and the discriminator $\mathbf{D}$ into two networks each, such that $\mathbf{G} = G \circ F$ and $\mathbf{D} = D \circ E$. The architecture is shown in Figure 3. The latent space between the two halves of each decomposed network is assumed to be the same and is denoted $\mathcal{W}$.
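To make the decomposition concrete, here is a minimal NumPy sketch of how the four pieces compose. It is only an illustration under loud assumptions: the dimensions `Z_DIM`, `W_DIM`, `X_DIM` and the helper `make_linear` are hypothetical, and random affine maps stand in for the trained networks, so this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
Z_DIM, W_DIM, X_DIM = 8, 4, 16


def make_linear(in_dim, out_dim):
    """Return a random affine map standing in for a trained network (assumption)."""
    weight = rng.normal(scale=0.1, size=(in_dim, out_dim))
    bias = np.zeros(out_dim)
    return lambda v: v @ weight + bias


F = make_linear(Z_DIM, W_DIM)  # maps prior noise z to a latent code w in W
G = make_linear(W_DIM, X_DIM)  # maps a latent code w to data space
E = make_linear(X_DIM, W_DIM)  # maps data back to a latent code in W
D = make_linear(W_DIM, 1)      # scores a latent code


def full_generator(z):
    """Composite generator: G applied after F."""
    return G(F(z))


def full_discriminator(x):
    """Composite discriminator: D applied after E."""
    return D(E(x))


z = rng.normal(size=(3, Z_DIM))     # a batch of three noise vectors
w = F(z)                            # latent codes produced on the generator side
x_fake = G(w)                       # generated samples
w_rec = E(x_fake)                   # latent codes recovered on the encoder side
score = full_discriminator(x_fake)  # one score per sample
```

Note that `w` and `w_rec` have the same shape: both sit in the shared latent space $\mathcal{W}$, which is what lets the output of `E` be compared directly with the output of `F`.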

Below is a modification of ALAE that yields StyleALAE.

There are two components of StyleALAE:

  1. The generator of ALAE is replaced with the generator of StyleGAN, as shown on the right side of Figure 4.
  2. The left side of Figure 4 is a symmetric encoder, which extracts the style information that drives the StyleGAN generator.

The style information is extracted from the $i^{th}$ layer by introducing an Instance Normalization (IN) layer there. This layer outputs the channel-wise averages ($\mu$) and standard deviations ($\sigma$), which represent the style content of that layer, and it also normalizes the layer's input. The style content of each such encoder layer is used as input by the Adaptive Instance Normalization (AdaIN) layer of the symmetric generator, which is linearly related to the latent space $\omega$. Thus, the style content of the encoder is mapped to the latent space via a multilinear map.
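The following NumPy sketch illustrates the two normalization steps described above for a single layer. It is a rough illustration, not the official code: the function names `instance_norm_stats` and `adain`, the matrices `style_to_latent` and `latent_to_style`, and all sizes are hypothetical, and random matrices stand in for the learned linear maps between style statistics and the latent code.

```python
import numpy as np


def instance_norm_stats(feat, eps=1e-5):
    """Instance-normalize a (channels, height, width) feature map.

    Returns the normalized features plus the channel-wise mean and
    standard deviation, which play the role of the layer's style content.
    """
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normalized = (feat - mu) / (sigma + eps)
    return normalized, mu.reshape(-1), sigma.reshape(-1)


def adain(feat, scale, shift, eps=1e-5):
    """Adaptive IN: normalize, then apply a per-channel scale and shift."""
    normalized, _, _ = instance_norm_stats(feat, eps)
    return scale[:, None, None] * normalized + shift[:, None, None]


rng = np.random.default_rng(0)
n_channels, size, w_dim = 3, 8, 5  # hypothetical sizes

# Encoder side: pull the style (mu, sigma) out of one layer's features.
enc_feat = rng.normal(loc=2.0, scale=3.0, size=(n_channels, size, size))
enc_normalized, mu, sigma = instance_norm_stats(enc_feat)
style = np.concatenate([mu, sigma])

# Assumed linear map from this layer's style to the latent code.
style_to_latent = rng.normal(size=(w_dim, 2 * n_channels))
omega = style_to_latent @ style

# Generator side: assumed linear map from the latent code to AdaIN parameters.
latent_to_style = rng.normal(size=(2 * n_channels, w_dim))
gen_style = latent_to_style @ omega
scale, shift = gen_style[:n_channels], gen_style[n_channels:]

gen_feat = rng.normal(size=(n_channels, size, size))
styled = adain(gen_feat, scale, shift)
```

After `adain`, each channel of `styled` has mean `shift` and standard deviation `|scale|` (up to the small `eps` term), so the statistics of the generator's features are dictated entirely by the latent code.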

Here's a short video someone put together that showcases this work, contrasting it to StyleGAN.

Here's a link to the paper.

Here's a link to the official code repository for the paper.

There are a number of different implementations of this algorithm available here. I'll pick a good PyTorch one to showcase at a later time (there are multiple ones at the link).

Here's a link to a blog article that discusses the architecture in detail (including the quotes I grabbed above for the descriptions of the two figures).

