Generative neural models

We're very interested in generative models here at HTC.  In the digital art world that would usually be thought of as some kind of procedural image generation algorithm — that is, generating visual imagery from some kind of mathematical algorithm.

So what's different about generative neural net models?  Well, they are trainable from data. And by training what we mean is that the net is learning the statistics of the data it was trained on.  Hopefully a lower-dimensional embedding of the information contained in the raw training data.

Raw data could be images, could be audio, could be a text corpus, could be all kinds of things we haven't been clever enough to think about using yet.

So let's step back for a minute and think about supervised classification systems. Imagine a classification system that distinguishes between photos of cats and photos of dogs.  You put together your training set of labeled images of cats and dogs, keeping some of them in an escrow account to be used for testing the accuracy of your classifier later.

So you fire up Keras and quickly code up a deep learning neural net classification system.  You then do some more coding to shuffle your data set in and out of your Keras classifier. You then train your net. After it is trained, you try it out. Remember to see how well it does with the images in your escrow account as well as the ones you used for the training.
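A minimal sketch of that workflow, assuming TensorFlow/Keras is installed. The layer sizes here are arbitrary choices, and random arrays stand in for the real cat and dog photos:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in data: random arrays in place of real photos.
# Shape: (num_images, height, width, channels); labels: 1 = cat, 0 = dog.
x_train = np.random.rand(64, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=(64,)).astype("float32")

# A small convolutional binary classifier.
model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),  # probability of "cat"
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Train briefly, then check predictions on the held-out "escrow" images.
model.fit(x_train, y_train, epochs=1, batch_size=16, verbose=0)
x_escrow = np.random.rand(8, 64, 64, 3).astype("float32")
probs = model.predict(x_escrow, verbose=0)
print(probs.shape)  # one probability per escrow image
```

With real labeled photos in place of the random arrays, the same skeleton is the whole pipeline: fit on the training split, evaluate on the escrow split.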

So now you deploy your trained net as something mere mortals could actually run in a desktop or mobile application.  Yours is this exciting new 'is it a cat' app. You show it an image, and it says yes or no.

Now that's kind of anti-climactic in many ways.  All of that work, and the system gives you a one-word answer. Maybe a probability as well, so then you're getting a real number out as opposed to just a binary flag. Which is definitely a step further along a potentially more interesting path.

But imagine a different kind of network that generates a complete 2-dimensional array of real numbers. So you train your neural net on a set of data (images in this case), and when you are finished you have a system that can generate images.  Procedural or generative images.  Now we are getting into really interesting territory.

There's a really nice blog post on generative models at OpenAI that I am going to point you at.  They run you through an explanation of what a generative model is.  They then show off an example based on the 1.2 million image collection in the ImageNet dataset.

They then describe the DCGAN network. This fascinating creature takes as input 100 random numbers, and outputs an image.  It's composed of standard convolutional neural network components.
So the idea is that you train the millions of parameters in this neural net to model the statistical properties of the data set you train it with.
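A hedged sketch of that shape of network in Keras — not the actual DCGAN architecture, just a toy generator with the same contract (100 random numbers in, an image out), with arbitrary layer sizes:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy DCGAN-style generator: 100 random numbers in, a 32x32 RGB image out.
latent_dim = 100
generator = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(4 * 4 * 128),
    layers.Reshape((4, 4, 128)),
    # Transposed convolutions upsample the spatial resolution at each step.
    layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                           activation="relu"),   # -> 8x8
    layers.Conv2DTranspose(32, 4, strides=2, padding="same",
                           activation="relu"),   # -> 16x16
    layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                           activation="tanh"),   # -> 32x32 RGB
])

# Feed it 100 random numbers and get an image-shaped array back.
z = np.random.normal(size=(1, latent_dim)).astype("float32")
image = generator.predict(z, verbose=0)
print(image.shape)  # (1, 32, 32, 3)
```

Untrained, the output is of course noise; training is what tunes those millions of parameters to model the dataset's statistics.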

Now every neural net training session is based on trying to minimize some kind of error function. The detected error is then back-propagated through the neural net so that its internal parameters are slightly adjusted to lower the generated error.

So what kind of an error function are we interested in putting together for our generative system above?

One fun one might be to analyze the output image from the system and produce an error measuring how well it represents the original image dataset. Because in this fun scenario we are interested in having our generative system output plausible images that seem like they could be a part of the training dataset.

Be careful, because how you design this error model is going to have profound implications for how your final system is going to work. And by the same token, don't let someone's great solution to this issue blind you to the fact that there are probably a lot of other good ways to build one that might lead to very different results from the final trained system.

One clever approach to constructing a fun error model for this kind of system is the Generative Adversarial Network, or GAN.  We get tricky by introducing a second discriminator network that tries to classify whether an image input to it is real or not. By real we mean it seems like it should be in our training set even if it isn't. So we're interested in perceptual similarity.

These two networks are set up to be locked in a battle: the discriminator is trying to distinguish real images from fake images, and the generator is trying to create images that make the discriminator think they are real.
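The two objectives in that battle can be sketched numerically. Here the D(x) values are just hand-picked numbers standing in for discriminator outputs (its estimated probability that an input is real):

```python
import numpy as np

# Standard GAN objectives, sketched with toy discriminator outputs.
def discriminator_loss(d_real, d_fake):
    # The discriminator wants d_real -> 1 and d_fake -> 0.
    return float(-np.mean(np.log(d_real) + np.log(1 - d_fake)))

def generator_loss(d_fake):
    # The generator wants the discriminator fooled: d_fake -> 1.
    return float(-np.mean(np.log(d_fake)))

# A confident, correct discriminator: its own loss is low,
# while the generator's loss is high (it is not fooling anyone yet).
d_loss = discriminator_loss(np.array([0.9]), np.array([0.1]))
g_loss = generator_loss(np.array([0.1]))
print(d_loss < g_loss)  # True
```

Training alternates between lowering the two losses, each network's improvement raising the bar for the other.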

So anyone with a biology background should now have neurons lighting up with tales of why the peacock's tail grows ever more colorful and overlarge in successive generations of evolution. An example of Fisherian Runaway, a sexual selection mechanism for exaggerating ornamentation over the course of evolution.

And I think anyone pursuing GAN work needs to go back and read Karl Sims' original 1994 SIGGRAPH paper on evolving behavior in virtual creatures.  Which seems like it builds on his 1991 SIGGRAPH paper on artificial evolution in computer graphics.  Which is in turn building on John Holland's work on genetic algorithms in the 70s.

So the blog post on generative models I pointed you at talks about using KL divergence loss from the data distribution as an error function for training GAN systems.  A mathematical measure of how one probability distribution is different from a second. Sure, why not.
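For reference, KL divergence between two discrete distributions is a one-liner:

```python
import numpy as np

# D_KL(p || q) = sum_i p_i * log(p_i / q_i).
# Zero when p and q are identical, positive otherwise (and asymmetric:
# swapping p and q generally gives a different value).
def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.3, 0.2]
print(kl_divergence(p, p))                     # 0.0: identical distributions
print(kl_divergence(p, [0.2, 0.3, 0.5]) > 0)  # True: nonzero divergence
```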

Although keep in mind that this is a loaded issue, because we are really interested in perceptual similarity and perceptual error, and while that's obviously related to the data distributions, matching distributions might be very different than matching perceptual salience.  Just like mean squared error can be very different than human perceptual error. So it seems like ultimately you'd really want a better perceptual model in your discriminator.

So the blog post then runs through descriptions of 3 different approaches to generative models. They are Generative Adversarial Networks (GAN), which is what we just discussed, as well as Variational Autoencoders (VAE), and autoregressive models like PixelRNN.

The autoregressive models are kind of fascinating, because they work off of conditional probability distributions based on previously visited pixels when processing the current pixel.  With an inherent bias toward a normal raster grid scan pattern for processing the image pixels.
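A sketch of that raster-scan conditioning. The "conditional distribution" here is a made-up stand-in (average the already-generated neighbors, add noise); a real PixelRNN would use a trained recurrent model in its place:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pixel(image, y, x):
    # Condition only on already-visited pixels: the left neighbor
    # and the pixel directly above (a stand-in conditional model).
    context = []
    if x > 0:
        context.append(image[y, x - 1])
    if y > 0:
        context.append(image[y - 1, x])
    mean = np.mean(context) if context else 0.5
    return float(np.clip(mean + rng.normal(0, 0.1), 0.0, 1.0))

h, w = 8, 8
image = np.zeros((h, w))
for y in range(h):          # raster order: top to bottom,
    for x in range(w):      # left to right within each row
        image[y, x] = sample_pixel(image, y, x)
print(image.shape)  # (8, 8)
```

The raster-order bias lives entirely in that pair of nested loops, which is exactly why swapping in a different visiting order is such an inviting variation.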

So right off the bat you can see that you could construct it a totally different way using different scan patterns. Why not use geodesic scanning instead?  So we will hereby officially christen the new Geodesic Autoregressive model and welcome it to the neural processing world.  May it be as interesting as geodesic recursive image filtering is. I think you heard about it here first (if not, let me know with a link to the reference). More on all of that in a later post.

And if you have any appreciation of the history of texture synthesis algorithm development in computer graphics research, you would probably right away move to a pyramid approach for this kind of processing (the autoregressive model then works off of prior generated values across different spatial scales).

So, lots of food for thought here I hope.  The OpenAI blog post continues their discussion with a lot of additional great topics, so be sure to read it.

I hope you now have a good overview of what is going on in GAN systems. Maybe some slight appreciation for wondering why more perceptually salient error models aren't being considered. A little background in various historical research in different fields that seems highly related. And perhaps an appreciation for how there could easily be alternative formulations of these kinds of generative models that might lead to more interesting or at least visually different results.

