Latent Space Exploration with StyleGAN2

A fastai student put together a really great blog post that dives deep into exploring the latent space of the StyleGAN2 deep learning model.  We're going to run through some of the things he so elegantly described in detail there.

He also provides Jupyter notebooks for all of the associated code he used to build the examples shown in the post.  These can be run on either Colab or Gradient (with the TensorFlow 1.14 Container).

The tutorial code relies heavily on the official StyleGAN2 repo, which is written against a deprecated version of TensorFlow.

There is now an official PyTorch version available that fastai-oriented folks might want to take a look at.
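
To give a flavor of what this kind of exploration looks like in code, here's a minimal sketch of sampling and interpolating latents with that PyTorch version.  It assumes the repo's code is on your Python path (the network pickles reference its classes) and that you've downloaded a pretrained pickle; 'ffhq.pkl' is just a placeholder name.

```python
import pickle
import torch

# Load a pretrained generator (placeholder filename; use whichever
# network pickle you downloaded).
with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # exponential moving average of G

# Sample two random points in the input latent space Z.
z1 = torch.randn([1, G.z_dim]).cuda()
z2 = torch.randn([1, G.z_dim]).cuda()

# Explore the latent space: linearly interpolate between the two latents
# and generate an image at each step along the path.
for t in torch.linspace(0, 1, steps=8):
    z = torch.lerp(z1, z2, t.item())
    img = G(z, None)  # second argument is the class label (None here)
    # img is NCHW float32, roughly in [-1, 1]; rescale before saving.
```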


?? Need some additional background ??

Our recent Dive into Generative Adversarial Networks is a good place to get some more background on GANs and StyleGAN in particular.

If you are still confused about the concept of a latent space, here's a good blog post on Understanding Latent Space in Machine Learning.

Looking for a good explanation of the StyleGAN2 architecture?  Check out this post.

This blog post runs through the historical development of basic GANs to StyleGAN and then to StyleGAN2.

Here's a quick demo reel running through some results from the StyleGAN2 paper.


Looking for a more in-depth explanation of the Progressive Growing of GANs technique that StyleGAN built on to generate higher resolution output images from GAN models?  

Why of course you are.  Let's dive in.

Here's a talk by the lead author himself (Tero Karras) on his paper Progressive Growing of GANs for Improved Quality, Stability, and Variation.


 

Here's another talk, by Andrew Martin from the Toronto Deep Learning Series in 2018, that reviews the paper Progressive Growing of GANs for Improved Quality, Stability, and Variation.  He also provides an overview of how GANs work in general.
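
The mechanical core of progressive growing is easy to see in isolation: when a new higher-resolution block is added, its output is blended with an upsampled copy of the old low-resolution output, and the blend weight alpha ramps from 0 to 1 over the transition phase.  Here's a toy sketch of that fade-in (random tensors stand in for real generator outputs):

```python
import torch
import torch.nn.functional as F

def fade_in(old_rgb, new_rgb, alpha):
    """Blend the previous resolution's RGB output (upsampled 2x) with the
    newly added block's RGB output; alpha ramps 0 -> 1 during training."""
    old_up = F.interpolate(old_rgb, scale_factor=2, mode='nearest')
    return (1 - alpha) * old_up + alpha * new_rgb

# Toy stand-ins for the generator's 32x32 output and the new 64x64 block.
old_rgb = torch.randn(1, 3, 32, 32)
new_rgb = torch.randn(1, 3, 64, 64)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, fade_in(old_rgb, new_rgb, alpha).shape)  # [1, 3, 64, 64]
```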



Observations

1:  It's interesting to look at the various approaches used in the material presented here to prevent things like mode collapse and slow training, and then contrast them with some of the things Jeremy has been discussing in the fastai lectures on implementing GAN neural nets.  

The same goes for the fastai techniques for boosting the output resolution of GAN neural nets (and image generation nets in general), which are based on U-Net and combined with Jeremy's innovative use of randomized data augmentation to push what the neural net learns toward the needs of the task you are trying to solve (sketched below).
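
As a rough sketch of that setup (the folder layout and crappification step here are hypothetical, and the details vary across the lessons): pair degraded inputs with the original images, augment aggressively, and train a U-Net generator on the pairs.

```python
from fastai.vision.all import *

path = Path('data')  # hypothetical layout: 'crappy' low-res inputs,
                     # 'images' the matching full-quality targets
dblock = DataBlock(
    blocks=(ImageBlock, ImageBlock),
    get_items=get_image_files,
    get_y=lambda x: path/'images'/x.name,
    splitter=RandomSplitter(valid_pct=0.1),
    item_tfms=Resize(256),
    # randomized augmentation: flips, rotation, zoom, lighting changes
    batch_tfms=aug_transforms(max_zoom=2.0, max_lighting=0.3),
)
dls = dblock.dataloaders(path/'crappy', bs=8)

# U-Net-based generator, trained first with a simple pixel loss
learn = unet_learner(dls, resnet34, n_out=3, loss_func=MSELossFlat())
learn.fit_one_cycle(1)
```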


2:  I feel we really need to pull together a 2020 fastai GAN tutorial that gathers all of the latest fastai GAN thinking along with associated up-to-date fastai v2 code examples in one location.


3:  Do we need to come up with some new GAN-focused abstractions to deal with all of the exciting new directions that the GAN field is expanding into?


Let's look at one obvious candidate: dealing with the latent space representation inside the GAN model.  Is there some abstraction with the elegance of the fastai DataLoader that we could develop to make manipulating the latent space easy to do?  

It seems like there are two different goals associated with this.  One is to shape the learning process when training the GAN so that the latent space develops in some constrained way, with the goal of making the trained model's latent space easier for a user to manipulate.

The other goal involves making it as simple as possible for a developer to add user sliders to the GAN system.  If the latent space needs to be disentangled, as in StyleGAN, there would just be some option in the abstraction you could turn on, and it would be built for you automatically.  If you need to access the neural net weight representation in a constrained way (again, look to StyleGAN for an example of this), there would be a standardized way to do it.  StyleGAN also allows for constrained randomization to introduce subtle variations in the generator output; the abstraction we are talking about would offer this as an additional optional adjustment.

Again, I keep coming back to how the DataLoader encapsulates everything a developer could possibly want or need to do when mucking around with the neural net model's data.  So we're looking for something that does the same encapsulation for user manipulation of the latent space associated with a GAN generator.
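
To make that concrete, here's a purely hypothetical sketch of what such an abstraction might look like.  None of this is a real fastai or StyleGAN API; the class and method names are invented here, and it assumes a StyleGAN-style generator with .mapping and .synthesis submodules.

```python
import torch

class LatentExplorer:
    """Hypothetical abstraction (invented here, not a real API): wrap a
    trained generator and put latent-space manipulation behind one
    interface, the way a DataLoader encapsulates data handling."""
    def __init__(self, G, disentangle=True, noise_jitter=0.0):
        self.G = G
        self.disentangle = disentangle    # map z -> w, StyleGAN-style
        self.noise_jitter = noise_jitter  # constrained randomization strength

    def to_w(self, z):
        # Run the mapping network when disentangling is turned on;
        # assumes a StyleGAN-style G with a .mapping submodule.
        return self.G.mapping(z, None) if self.disentangle else z

    def slider(self, w, direction, amount):
        # One 'user slider': move a constrained distance along a known
        # semantic direction in the disentangled latent space.
        return w + amount * direction

    def generate(self, w):
        if self.noise_jitter > 0:
            # subtle variation via constrained randomization
            w = w + self.noise_jitter * torch.randn_like(w)
        return self.G.synthesis(w)  # assumes a .synthesis submodule

# A slider in the UI then boils down to one line:
#   img = explorer.generate(explorer.slider(explorer.to_w(z), direction, 1.5))
```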


I would also argue that there should be a similar abstraction for Feature Visualization in general.  Flip a switch and you should be able to graphically visualize any part of the feature representation contained within the deep learning model.
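
PyTorch's forward hooks already get you most of the way to that switch.  A minimal sketch (the layer name 'layer3' is just an example for a torchvision-style model):

```python
import torch

def grab_features(model, layer_name, x):
    """Capture the activations of any named submodule with a forward
    hook, so they can be plotted channel by channel."""
    feats = {}
    layer = dict(model.named_modules())[layer_name]
    handle = layer.register_forward_hook(
        lambda module, inp, out: feats.setdefault('act', out.detach()))
    with torch.no_grad():
        model(x)
    handle.remove()
    return feats['act']

# Usage with any torchvision-style CNN:
#   act = grab_features(model, 'layer3', img_batch)
#   plt.imshow(act[0, 0].cpu())  # visualize one feature channel
```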


I think this particular area of GAN programming really needs to be looked at very carefully, with the goal of figuring out how to make it super easy for a deep learning practitioner to manipulate the latent space representation(s) associated with the GAN model they are working with.

It's possible there are some great blog posts I missed that address some or all of this already.  Feel free to point them out in the Comments section.  And I'm going to continue digging deeper into this as well.
