Semantic Image Synthesis with Spatially Adaptive Normalization
A great deal of work has gone into deep learning networks that take an image as input and output a set of tags for the objects present in it. Less studied, but perhaps more interesting, is the reverse: a network that takes a textual description of an image or scene and then outputs an image that matches the description it was given.
So let's take a deep dive into a recent paper that takes a textual description of some imaginary image and generates an artificial image that matches that description. This particular approach adds an additional constraint, a segmentation-map constraint to be precise. And being precise is kind of the whole point here.
The paper describing this, 'Semantic Image Synthesis with Spatially-Adaptive Normalization', can be found here.
A project page called 'Semantic Image Synthesis with SPADE' can be found here.
Here's a short video presentation of 'GauGAN: Semantic Image Synthesis with Spatially Adaptive Normalization'.
Divyansh Jha has put together a really great blog post on implementing SPADE using fastai. Perfect for HTC fastai-api-oriented deep learning folks. Check it out.
And of course the world moves on, and people are already trying to improve on SPADE's performance. One recent attempt is the Semantic Region-Adaptive Normalization (SEAN) algorithm. SEAN is conditioned on segmentation masks that describe the semantic regions in the desired output image. Using SEAN normalization, a network architecture can control the style of each semantic region individually.
The SEAN generator network is built on top of SPADE and contains three convolutional layers whose biases and scales are modulated separately by individual SEAN blocks. Each SEAN block takes two inputs: a set of style codes for specific regions, and a semantic mask that defines the regions where each style code is applied.
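To make the core idea of spatially-adaptive normalization concrete, here is a minimal NumPy sketch of what a SPADE-style block computes. It is a toy illustration, not the paper's implementation: the shapes, the random "learned" weight matrices (`W_gamma`, `W_beta`), and the use of a 1x1-conv-style matrix multiply in place of the paper's small conv network are all assumptions made for readability. The key point it demonstrates is that the scale (gamma) and shift (beta) applied after normalization vary per pixel, because they are computed from the segmentation map rather than learned as per-channel scalars.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: C feature channels, K semantic classes, an HxW spatial grid.
C, K, H, W = 4, 3, 8, 8

x = rng.standard_normal((C, H, W))              # activations entering the block
seg = rng.integers(0, K, size=(H, W))           # semantic mask: class id per pixel
seg_onehot = np.eye(K)[seg].transpose(2, 0, 1)  # one-hot mask, shape (K, H, W)

# Hypothetical "learned" weights: a 1x1-conv-like map from the one-hot mask
# to a per-pixel scale (gamma) and shift (beta) for each feature channel.
# In SPADE these come from a small convolutional network over the mask.
W_gamma = rng.standard_normal((C, K)) * 0.1
W_beta = rng.standard_normal((C, K)) * 0.1

# Step 1: parameter-free normalization (per-channel, batch-norm style).
mean = x.mean(axis=(1, 2), keepdims=True)
std = x.std(axis=(1, 2), keepdims=True)
x_norm = (x - mean) / (std + 1e-5)

# Step 2: spatially-adaptive modulation. gamma and beta differ at every
# pixel according to the semantic class painted there.
gamma = np.einsum('ck,khw->chw', W_gamma, seg_onehot)
beta = np.einsum('ck,khw->chw', W_beta, seg_onehot)

out = x_norm * (1 + gamma) + beta
print(out.shape)  # same shape as the input activations
```

SEAN extends this picture by letting each semantic region carry its own style code, so the modulation parameters depend on both the mask and a per-region style vector rather than on the mask alone.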