StyleGAN2 for Artistic Image Manipulation
One thing that blew me away in Jensen Huang's recent Nvidia keynote presentation in September was a demo of using GAN technology to manipulate artistic facial images. The video below gives a good demonstration of what you can do with the particular GAN system used in the demo.
Note that when the user manipulates the imagery, what is actually happening is that the artist is manipulating latent vectors inside the GAN model. That's a mouthful, but we'll dive into what it all means after taking a look at the demo.
So how does a system like this even work?
GAN stands for Generative Adversarial Network. It's a particular kind of deep learning neural net architecture. Internally, there are 2 different neural nets in competition with each other: a Generator and a Discriminator. The Generator takes random noise as input and tries to turn it into an image. The Discriminator looks at an image (either a real one from the training data or one produced by the Generator) and decides whether it is a real image or a fake image.
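The competition can be written down as a pair of loss functions. Below is a minimal sketch of the classic GAN losses (using the "non-saturating" Generator variant from the original GAN paper), computed from the Discriminator's output probabilities for a single real image and a single generated image. The function names and example probabilities are illustrative assumptions; real implementations average these losses over batches and backpropagate them with a framework such as PyTorch or TensorFlow.

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy loss for the Discriminator.

    d_real: Discriminator's probability that a real training image is real.
    d_fake: Discriminator's probability that a generated image is real.
    The Discriminator wants d_real -> 1 and d_fake -> 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake: float) -> float:
    """Non-saturating Generator loss: the Generator wants d_fake -> 1."""
    return -math.log(d_fake)

# A Discriminator that easily spots the fake: low loss for D, high loss for G.
print(discriminator_loss(0.9, 0.1))  # about 0.2107
print(generator_loss(0.1))           # about 2.3026
```

Training alternates between lowering the Discriminator's loss and lowering the Generator's loss, which is what pushes the Generator toward images the Discriminator can no longer tell apart from the real ones.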
A database of real images is used to help train the Discriminator (and by association the complete GAN system itself). What is in this database is going to determine what the GAN system learns to generate.
So if you have a database of photos of flowers or furniture, then the GAN system learns to generate fake images that look like the database it was trained on (flowers or furniture in this arbitrary example). For the example in the video above, artistically rendered facial images were used for the training database. So the system learns to generate fake images that look like artistically rendered facial images.
That in itself is pretty amazing. The GAN system is learning some kind of higher order representation of what the particular class of images it was trained on looks like. But the StyleGAN research goes even further. It delves into the notion of how an artist or user of a GAN system could start to control what is going on inside of it. And the way they do this is by manipulating latent vectors associated with the GAN system.
So what is a latent vector? Well, the first set of latent vectors is the random noise used as the input to the GAN system. By manipulating that noise, you can change the output of the system. The issue is that perceptually meaningful features of the GAN output (smile vs. frown on the face, sex of the face, type of hair, age of the face, etc.) are entangled inside the input latent vectors, so nudging the input in one direction tends to change several of those features at once.
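In StyleGAN and StyleGAN2 the input latent vector z is a 512-dimensional vector drawn from a standard normal distribution. Here is a small sketch of sampling one using only the Python standard library; `sample_z` is a hypothetical helper, and real code would use NumPy or PyTorch. Seeding the random generator makes a given latent vector (and therefore a given generated image) reproducible.

```python
import random

Z_DIM = 512  # latent vector size used by StyleGAN/StyleGAN2

def sample_z(seed: int, dim: int = Z_DIM) -> list[float]:
    """Draw a latent vector z ~ N(0, I) reproducibly from an integer seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

z_a = sample_z(seed=1)
z_b = sample_z(seed=2)
print(len(z_a))              # 512
print(z_a == sample_z(1))    # True: same seed, same latent vector
```

Changing any single one of these 512 numbers typically changes the generated face in several ways at once, which is the entanglement problem described above.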
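So the StyleGAN architecture (and StyleGAN2 in particular) uses another internal neural network, a mapping network, that transforms the input latent vector into an intermediate latent space where those perceptually meaningful features are more disentangled. They also inject random noise inside the image-generating (synthesis) network to produce slight random variations of the generated image (such as the exact placement of individual hairs).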
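Here is a rough sketch of those two ideas, with toy sizes and random, untrained weights (all names are hypothetical). In StyleGAN the mapping network normalizes z and passes it through 8 fully connected layers with leaky ReLU activations to produce the intermediate latent w, and the synthesis network adds per-pixel Gaussian noise, scaled by learned per-channel factors, to its feature maps. This sketch omits biases, learned noise scales, and everything in the synthesis network except the noise addition.

```python
import math
import random

def pixel_norm(z, eps=1e-8):
    """Rescale z to unit root-mean-square, as StyleGAN does before mapping."""
    rms = math.sqrt(sum(v * v for v in z) / len(z) + eps)
    return [v / rms for v in z]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def make_mapping_network(dim, n_layers, seed=0):
    """Hypothetical untrained stand-in: random weight matrices for dense layers."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
            for _ in range(n_layers)]

def mapping_network(z, layers):
    """Map input latent z to intermediate latent w (dense + leaky ReLU per layer)."""
    h = pixel_norm(z)
    for weights in layers:
        h = [leaky_relu(sum(w_ij * h_j for w_ij, h_j in zip(row, h)))
             for row in weights]
    return h

def add_noise(feature_map, strength, rng):
    """Add per-pixel Gaussian noise scaled by a (normally learned) strength.

    This is the source of small stochastic details such as hair placement.
    """
    return [[x + strength * rng.gauss(0.0, 1.0) for x in row] for row in feature_map]

DIM, N_LAYERS = 8, 8  # toy width; StyleGAN uses 512-wide layers, 8 of them
layers = make_mapping_network(DIM, N_LAYERS)
rng_z = random.Random(1)
z = [rng_z.gauss(0.0, 1.0) for _ in range(DIM)]
w = mapping_network(z, layers)
print(len(w))  # 8

# A single 4x4 feature map gets a fresh noise pattern on every call.
feature_map = [[0.0] * 4 for _ in range(4)]
noisy = add_noise(feature_map, strength=0.1, rng=random.Random(2))
```

The user-facing controls then operate on w rather than on z, because directions in w are more likely to line up with a single perceptual feature.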
The notion of adding user sliders to deep learning systems is crucial if we want to turn these systems into something digital artists can use in their daily work.
HTC has a wide variety of posts on GAN technology. This field of research has accelerated tremendously over the last 5 years, so it can be very difficult to sort out everything that is going on.
GANs are an example of a generative system. We also have a wide variety of posts on generative models. Keep in mind that GAN architectures are only one kind of generative model (there are others based on deep learning neural nets that work differently internally).
The StyleGAN2 paper is available here. Note that StyleGAN2 is a continuation of the original StyleGAN work.
There are several GitHub repositories associated with the StyleGAN research available here. The original work was done using TensorFlow. There are also GitHub repositories with PyTorch implementations of the underlying algorithms (here's one).
Observations
1: The notion that the random noise used as the input to the system is a latent vector is somewhat confusing at first (at least it was to me). I had to sit in Colab, interpolate between 2 different random vectors, and feed that interpolation into the StyleGAN2 system to really grok it. And lo and behold, the images morph in between, in the sense that every intermediate vector produces a new random image that still looks really good.
Once you do that exercise, the idea is cemented in your brain much better.
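The interpolation exercise can be sketched as below, using linear interpolation (lerp) and spherical interpolation (slerp) between two 512-dimensional latent vectors. This is a minimal sketch with hypothetical helper names; the `generator` call is left as a commented-out placeholder for whatever StyleGAN2 implementation you load, and in Colab you would typically do this with NumPy or PyTorch tensors rather than plain lists.

```python
import math
import random

def lerp(a, b, t):
    """Linear interpolation: t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def slerp(a, b, t):
    """Spherical interpolation along the arc between a and b."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-6:  # nearly parallel vectors: fall back to lerp
        return lerp(a, b, t)
    s = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

rng = random.Random(0)
z_a = [rng.gauss(0.0, 1.0) for _ in range(512)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(512)]

# Ten evenly spaced latent vectors from z_a to z_b, one per animation frame.
frames = [slerp(z_a, z_b, i / 9) for i in range(10)]
# images = [generator(z) for z in frames]  # placeholder for a real StyleGAN2 call
print(norm(z_a), norm(lerp(z_a, z_b, 0.5)), norm(slerp(z_a, z_b, 0.5)))
```

Slerp is a common choice for Gaussian latent vectors: the linear midpoint of two independent samples is noticeably shorter (by roughly a factor of √2) than a typical sample, while slerp keeps the intermediate vectors at a more typical length.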
2: Note that if you train the GAN system to generate realistic facial images, it then outputs realistic fake images. But if you train it to generate artistically styled images, then the system learns to generate artistically styled fake images.
And when I say train, you do so just by pointing it at an appropriate database of images to learn from. So you can think of this as another clever way to generate 'style transfer'.
Of course, there is another way to think about these systems (the Berkeley CycleGAN work), where you build a double GAN that goes in both directions. One direction learns to transform faces into artistically rendered faces, and the other direction learns to transform artistically rendered faces back into realistic faces.
The result is a transformational GAN system: it still generates 'fake' output in some style, but it does so by transforming a specific input image into the desired 'artistic style' output.
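For reference, CycleGAN ties the two directions together with a cycle-consistency loss. In the CycleGAN paper's notation, G maps images from domain X (say, realistic faces) to domain Y (artistic faces), F maps Y back to X, and D_X and D_Y are the two Discriminators. Translating an image there and back should reproduce the original:

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
+ \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
+ \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
+ \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```

Here λ weights the cycle term against the two ordinary adversarial losses (the CycleGAN paper uses λ = 10). The cycle term is what forces the output to stay a transformation of the specific input image rather than an arbitrary image in the target style.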
3: There is a recent paper that tries to manipulate the parameters of the net directly, rather than manipulating the latent space vectors, to control its output.
4: Here's another recent paper: 'GANSpace: Discovering Interpretable GAN Controls'.
The notion of how you add 'user sliders' to these systems is crucial for creating systems that actual artists can use in creative ways.
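GANSpace finds such sliders by running PCA on many sampled intermediate latent vectors and then moving a latent vector along the principal directions. Below is a toy sketch of that idea, assuming synthetic 4-dimensional "latents" in place of real StyleGAN w vectors, power iteration in place of a full PCA library, and only the top principal component; all names are hypothetical.

```python
import math
import random

def mean_vector(samples):
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def top_principal_direction(samples, iters=100, seed=0):
    """Power iteration on the sample covariance; returns a unit vector."""
    mu = mean_vector(samples)
    centered = [[x - m for x, m in zip(s, mu)] for s in samples]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in mu]
    for _ in range(iters):
        # Covariance-vector product C v = (1/n) * sum_i c_i (c_i . v), without forming C.
        proj = [sum(c_j * v_j for c_j, v_j in zip(c, v)) for c in centered]
        v = [sum(p * c[j] for p, c in zip(proj, centered)) / len(centered)
             for j in range(len(mu))]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    return v

def apply_slider(w, direction, amount):
    """Move latent w along an edit direction; 'amount' is the slider value."""
    return [x + amount * d for x, d in zip(w, direction)]

# Hypothetical stand-in for sampled latents: variance dominated by the first axis.
rng = random.Random(42)
samples = [
    [3.0 * rng.gauss(0.0, 1.0)] + [0.5 * rng.gauss(0.0, 1.0) for _ in range(3)]
    for _ in range(500)
]
direction = top_principal_direction(samples)
print(abs(direction[0]))  # close to 1.0: the dominant axis was recovered
edited = apply_slider(samples[0], direction, 2.0)
```

In a real system each slider would correspond to one principal direction, and the edited latent would be fed back through the generator to render the modified face.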