Rewriting a GAN's Rules to Interactively Adjust Its Behavior
Let's continue our exploration of how GANs can be enhanced to let a user interactively adjust their behavior.
David Bau is a PhD student in Antonio Torralba's group at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). He is developing techniques to help us understand how deep learning neural networks work, by focusing on the representations (and their latent structure) that these networks learn.
Here's a quick overview of this research.
Here's a longer presentation on this work he just gave at AIM 2020 on August 28, 2020.
You can watch a video of the longer presentation here right now.
We'll try to get a YouTube version of the video posted here soon.
David and others in the Torralba lab are doing some seriously excellent and fascinating work on better understanding how neural networks build internal representations of the datasets they model.
You can check out David's work on 'GAN Dissection: Visualizing and Understanding Generative Adversarial Networks' here.
You can check out David's work on 'Network Dissection: Quantifying Interpretability of Deep Visual Representations' here.
Andrew Ng at deeplearning.ai wrote a really nice overview of this research, which I'm including in this post below. It's from the most recent issue of 'The Batch', the newsletter he sends out weekly. It's really great, so feel free to check it out.
What One Neuron Knows
How does a convolutional neural network recognize a photo of a ski resort? New research shows that it bases its classification on specific neurons that recognize snow, mountains, trees, and houses. Zero those units, and the model will become blind to such high-altitude playgrounds. Shift their values strategically, and it will think it’s looking at a bedroom.
What’s new: Network dissection is a technique that reveals units in convolutional neural networks (CNNs) and generative adversarial networks (GANs) that encode not only features of objects, but the objects themselves. David Bau led researchers at Massachusetts Institute of Technology, Universitat Oberta de Catalunya, Chinese University of Hong Kong, and Adobe Research.
Key insight: Previous work discovered individual units that activated in the presence of specific objects and other image attributes, as well as the image regions on which individual units focused. But these efforts didn’t determine whether particular image attributes caused such activations or were merely spuriously correlated with them. The authors explored that question by analyzing relationships between unit activations and network inputs and outputs.
How it works: The authors mapped training images to activation values and then measured how those values affected CNN classifications or GAN images. This required calculations covering every input-and-hidden-unit pair and every hidden-unit-and-output pair.
The authors used an image segmentation network to label objects, materials, colors, and other attributes in training images. They chose datasets that show scenes containing various objects, which enabled them to investigate whether neurons trained to label a tableau encoded the objects within it.
Studying CNNs, the authors identified the images that pushed a given unit into the top 1 percent of its activation values, and then related those activations to specific attributes identified by the segmentation network.
To investigate GANs, they segmented images generated by the network and used the same technique to find relationships between activations and objects in those images.
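The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the "relating" step is scored with intersection-over-union (IoU) between a unit's thresholded activation map and each segmentation mask, which is how we understand the Network Dissection papers to measure unit/concept agreement. The function name `unit_concept_iou`, the (images, units, height, width) array layout, and the assumption that activation maps are already upsampled to the segmentation resolution are hypothetical choices made for this example.

```python
import numpy as np


def unit_concept_iou(activations, concept_masks, top_fraction=0.01):
    """Score agreement between each unit and each segmented concept (hypothetical helper).

    activations:   float array (N, U, H, W); assumed already upsampled to mask size.
    concept_masks: bool array (N, C, H, W), e.g. from a segmentation network.
    Returns a (U, C) array of IoU scores.
    """
    n, u, h, w = activations.shape
    c = concept_masks.shape[1]

    # Per-unit threshold: the value exceeded by only the top `top_fraction`
    # of that unit's activations, pooled over every image and pixel.
    per_unit = activations.transpose(1, 0, 2, 3).reshape(u, -1)
    thresholds = np.quantile(per_unit, 1.0 - top_fraction, axis=1)

    # Binarize each activation map: True where the unit counts as "on".
    unit_on = activations > thresholds[None, :, None, None]

    on = unit_on.reshape(n, u, h * w).astype(np.int64)
    masks = concept_masks.reshape(n, c, h * w).astype(np.int64)

    # Pixel counts summed over the whole dataset.
    intersection = np.einsum("nup,ncp->uc", on, masks)
    union = on.sum(axis=(0, 2))[:, None] + masks.sum(axis=(0, 2))[None, :] - intersection
    return intersection / np.maximum(union, 1)


# Hypothetical toy data: unit 0 fires strongly exactly where concept 1 appears.
rng = np.random.default_rng(0)
acts = rng.random((4, 2, 8, 8)) * 0.1   # weak background activity
masks = np.zeros((4, 3, 8, 8), dtype=bool)
masks[:, 0, 4:, :] = True               # concept 0: bottom half of each image
masks[:, 1, :2, :] = True               # concept 1: top two rows
acts[:, 0, :2, :] = 1.0                 # unit 0 lights up on concept 1
# top_fraction=0.25 only because the toy concept covers a quarter of the pixels.
iou = unit_concept_iou(acts, masks, top_fraction=0.25)
print(iou.argmax(axis=1)[0], iou[0, 1])  # 1 1.0
```

Taking the highest-scoring concept for each unit yields the kind of unit labels (snow, mountains, trees, houses) the overview describes. The same scoring applies to a GAN if the masks come from segmenting its generated images rather than training photos.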
Results: The authors trained a VGG-16 CNN on the Places365 dataset of photos that depict a variety of scenes. When they removed the units most strongly associated with input classes and segmentation labels (sometimes one unit, sometimes several), the network’s classification accuracy fell an average of 53 percent. They also trained a Progressive-GAN on the LSUN dataset’s subset of kitchen images. Removing units strongly associated with particular segmentation labels decreased the prevalence of the corresponding objects in the generated output. For example, removing a single unit associated with trees decreased the number of trees in generated images by 53.3 percent. Finally, they came up with a practical, if nefarious, application: By imperceptibly altering an image, they were able to change the responses of a few key neurons in the CNN, causing it to misclassify images in predictable ways.
Why it matters: We often think of neural networks as learning distributed representations in which the totality of many neurons’ activations represents the presence or absence of an object. This work suggests that this isn’t always the case. It also shows that neural networks can learn to encode human-understandable concepts in a single neuron, and they can do it without supervision.
Yes, but: These findings suggest that neural networks are more interpretable than we realized, but only up to a point. Not every unit analyzed by the authors encoded a familiar concept. If we can’t understand a unit that’s important to a particular output, we’ll need to find another way to understand that output.
We’re thinking: In 2005, neuroscientists at Caltech and UCLA discovered a single neuron in a patient’s brain that appeared to respond only to the actress Halle Berry: photos, caricatures, even the spelling of her name. (In fact, this finding was an inspiration for Andrew’s early work in unsupervised learning, which found a neuron that encoded cats.)
Now we’re dying to know: Do today’s gargantuan models, trained on a worldwide web’s worth of text, also have a Halle Berry neuron?
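The unit-removal experiments described under Results come down to zeroing chosen channels of a layer's output and re-measuring the outcome: accuracy for the CNN, object prevalence for the GAN. Below is a minimal NumPy sketch of the CNN case on made-up data. The helpers `ablate_units`, `linear_head`, and `accuracy`, the pooled linear classifier head, and every number in it are hypothetical stand-ins, not the VGG-16/Places365 setup from the paper.

```python
import numpy as np


def ablate_units(features, unit_indices):
    """Return a copy of an (N, U, H, W) feature map with the chosen units zeroed."""
    ablated = features.copy()
    ablated[:, list(unit_indices)] = 0.0
    return ablated


def linear_head(features, weights, bias):
    """Toy classifier head: global-average-pool each unit, then a linear layer."""
    pooled = features.mean(axis=(2, 3))  # (N, U)
    return pooled @ weights + bias       # (N, num_classes)


def accuracy(logits, labels):
    return float((logits.argmax(axis=1) == labels).mean())


# Hypothetical toy setup: unit 0 is fully "on" exactly when the label is 1,
# and the head relies on unit 0 alone; units 1-3 are unused noise.
rng = np.random.default_rng(0)
labels = np.arange(100) % 2
features = rng.random((100, 4, 4, 4))
features[:, 0] = labels[:, None, None].astype(float)
weights = np.zeros((4, 2))
weights[0, 1] = 2.0            # only unit 0 carries evidence for class 1
bias = np.array([1.0, 0.0])    # class 0 wins unless unit 0 is active

baseline = accuracy(linear_head(features, weights, bias), labels)
without_unit0 = accuracy(linear_head(ablate_units(features, [0]), weights, bias), labels)
without_unit1 = accuracy(linear_head(ablate_units(features, [1]), weights, bias), labels)
print(baseline, without_unit0, without_unit1)  # 1.0 0.5 1.0
```

Zeroing the one unit the toy head depends on drops accuracy to chance, while zeroing an unused unit changes nothing. In a PyTorch model the same intervention is commonly done with a forward hook (`register_forward_hook`) that returns a modified copy of the layer's output; for a GAN, you would instead segment the generated images and compare how often the target object appears before and after ablation.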