CycleGAN: a GAN architecture for learning unpaired image to image transformations

Like the Doublemint Twins touting the joys of Doublemint gum, two GANs are surely better than one.  Especially if we package them together inside one meta-GAN module.  And that is exactly what the CycleGAN architecture does.

Have you ever harbored dark secrets of turning a horse into a zebra?  CycleGAN was developed to do just that: learn how to turn a horse into a zebra.  And more.

Right away you can notice a difference from the image-to-image transformation GAN architectures we've been discussing over the last few posts.  Those posts described systems that learn from a database of matched input-output image pairs.

And if your goal is to turn an edge representation into a nicely filled-in continuous tone image, it's easy to build the database of matched input-output image pairs that your GAN system can then learn from.  Take a continuous tone image (which will be the output of the database pair entry), then run it through an edge detection algorithm.  That generated edge image becomes the input of the database pair entry.

Repeat as necessary (just a for loop in the code) until you have enough pair entries in your dataset.
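Here's a minimal sketch of that paired-dataset loop. The gradient-magnitude edge detector below is a crude stand-in for whatever real detector (Canny, HED, etc.) you'd actually use, and the random arrays stand in for real photos:

```python
import numpy as np

def edge_map(image, threshold=0.2):
    # Crude gradient-magnitude edge detector -- a stand-in for a real
    # detector like Canny, just to illustrate the pipeline.
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold * magnitude.max()).astype(np.float32)

def build_paired_dataset(photos):
    # Each photo is the target output; its edge map is the paired input.
    return [(edge_map(photo), photo) for photo in photos]

# Toy stand-ins for continuous tone photos: random grayscale arrays.
photos = [np.random.rand(64, 64) for _ in range(4)]
pairs = build_paired_dataset(photos)
```

Each entry in `pairs` is an (edge input, photo output) training example of the kind Pix2Pix-style systems learn from.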

And many different transformations one would like to learn can be modeled in this kind of system.  But then there's that 'turn the horse into a zebra' fantasy we all seem to harbor.  How are we going to do that?  

Are we going to climb over fences in the middle of the night to paint horses or cows with zebra stripes?  Snap a quick photo before we start our dastardly deed.  Then paint the horse or cow quickly before the owner notices and comes out of his house with a shotgun.  Snap that output photo of the zebra-stripe-painted horse, then run for the fence before the pack of pit bulls the owner has let loose on you makes it to its target.

And you are going to have to keep doing that over and over again until you get enough example entries in your database to train your neural net.

A much better approach is just to build a big stack of images of horses, and then a big stack of images of zebras.  No correspondence between any entries in either stack of images.  There are no input-output examples in this database, just examples of each of the two categories (horses and zebras).

The insight that CycleGAN introduces goes as follows.  You build a generator, much like the one in the Pix2Pix architecture, that the first GAN trains to transform a horse into a zebra.  Then you build a second generator (again based on the Pix2Pix architecture) for an inverse GAN that is supposed to take a photo of a zebra and turn it into a picture of a horse.

You use the same patch-based discriminator from Pix2Pix as the discriminator for both of the directional GANs.  Its output is a classification matrix (as opposed to a single number), where each entry is associated with an individual position of the patch as it slides across the image.
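The idea of a per-patch classification grid can be sketched as follows. The `score_fn` here is a hypothetical placeholder for the learned convolutional discriminator (a real PatchGAN is a small conv net whose receptive field covers one patch), so only the grid structure is the point:

```python
import numpy as np

def patch_scores(image, patch=16, score_fn=None):
    # Slide a patch across the image and emit one real/fake score per
    # position, producing a grid of scores instead of a single scalar.
    if score_fn is None:
        score_fn = lambda p: p.mean()  # placeholder "discriminator"
    h, w = image.shape
    rows, cols = h // patch, w // patch
    grid = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            grid[i, j] = score_fn(image[i * patch:(i + 1) * patch,
                                        j * patch:(j + 1) * patch])
    return grid

image = np.random.rand(64, 64)
grid = patch_scores(image)         # a 4 x 4 grid of per-patch scores
loss = np.mean((grid - 1.0) ** 2)  # average the per-patch errors into one loss
```

Averaging over the grid gives a single training loss, but each patch position was judged real or fake independently.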

And then they added an additional optimization term to take advantage of this second inverse GAN sitting inside of CycleGAN.  They want the input horse to the first GAN's generator to match the output horse of that second inverse GAN's generator.  The two images should be identical, which means one can use local pixel comparison to generate an error metric for this additional optimization term: the term that tries to ensure the input of Generator 1 matches the output of Generator 2.
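That pixel-comparison term is the cycle consistency loss: the mean L1 distance between an image and its round trip through both generators. A tiny sketch, using hypothetical stand-in functions for the two generators (the real ones are deep conv nets):

```python
import numpy as np

# Hypothetical stand-ins for the two generators:
G = lambda horse: 1.0 - horse   # "horse -> zebra"
F = lambda zebra: 1.0 - zebra   # "zebra -> horse", the exact inverse here

def cycle_consistency_loss(x, G, F):
    # Mean L1 distance between the original image and its round trip
    # through both generators: mean(|F(G(x)) - x|).
    return np.mean(np.abs(F(G(x)) - x))

horse = np.random.rand(64, 64)
loss = cycle_consistency_loss(horse, G, F)  # ~0, since F undoes G exactly
```

During training this loss is near zero only when the two generators actually invert each other, which is what forces the horse-to-zebra mapping to preserve the content of the input.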

The first slide shows the first half of CycleGAN, which tries to generate a fake horse from a real zebra.  The second slide shows the second half of CycleGAN, which tries to generate a fake zebra from a real horse.  Both halves include a cycle consistency loss that tries to make the inverted generator's output match the input to the non-inverted generator.
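Putting the pieces together, the full objective from the paper combines the two adversarial (GAN) losses with the cycle consistency term, weighted by a constant $\lambda$:

```latex
\mathcal{L}(G, F, D_X, D_Y) =
    \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F),
\quad\text{where}\quad
\mathcal{L}_{\mathrm{cyc}}(G, F) =
    \mathbb{E}_{x}\!\left[\lVert F(G(x)) - x \rVert_1\right]
  + \mathbb{E}_{y}\!\left[\lVert G(F(y)) - y \rVert_1\right]
```

Note that the cycle term runs in both directions: horse round-tripped through zebra space, and zebra round-tripped through horse space.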

The CycleGAN paper called 'Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks' can be found here.

A PyTorch implementation of CycleGAN can be found on GitHub here.  Fastai folks will rejoice, but is there an official 2020 fastai api v2 CycleGAN implementation code example out there?

A PyTorch implementation of both CycleGAN and Pix2Pix can be found on GitHub here.

Contrastive Unpaired Translation (CUT) is a newer, hot-off-the-presses unpaired image-to-image transformation architecture by the CycleGAN team.  You can check out a PyTorch implementation of CUT (and its good buddy FastCUT) on GitHub here.

Their recent paper, titled 'Contrastive Learning for Unpaired Image-to-Image Translation', can be found here.

We also had a recent post in our HTC Seminar Series, titled HTC Seminar Series #16 - Style and Structure Disentanglement for Image Manipulation, that includes a lecture covering CycleGAN, CUT, and Pix2Pix.

Hey, what about TraVelGAN?

How does it relate to all of this Doublemint trouble lurking in the CycleGAN architecture?

What was its key feature that directly relates to the dual GAN approach used to build CycleGAN?

Should we be using TraVelGAN like implementations of these CycleGAN tasks?

If we build a SuperGAN  that has both the CycleGAN and the TraVelGAN implementations working together in one SuperGAN module, can that lead to some interesting new properties?  Or at least more controllability over the end results of the transformation process?

The CycleGAN architecture is complicated enough that if we can simplify it the way something like TraVelGAN does, that seems like a win.

Note that both CUT and TraVelGAN seem to be stressing the importance of working with latent structure rather than pixel differences directly.

