Posts

Showing posts from October, 2020

Pix2Pix: a GAN architecture for image to image transformation

I thought following up yesterday's TraVelGAN post with a Pix2Pix GAN post would be useful to compare what is going on in the two architectures: two different approaches to the same problem. I stole the Pix2Pix overview slide below from an excellent deeplearning.ai GAN course (note that they borrowed it from the original paper) because it gives you a good feel for what is going on inside of the Pix2Pix architecture. Note how the Generator part is very much like an auto-encoder architecture, but rebuilt using the U-Net architecture features (based on skip connections) that fastai had been discussing in their courses for several years before it became more widely known to the deep learning community at large (and which originally came from an obscure medical image segmentation paper). So the Generator in this Pix2Pix GAN is really pretty sophisticated, consisting of a whole image-to-image auto-encoder network with U-Net skip connections to generate better image quality at higher…
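To make the skip-connection idea concrete, here is a minimal sketch of a U-Net style encoder-decoder in PyTorch. This is an illustration of the technique, not the actual Pix2Pix generator; the layer sizes are invented for brevity.

```python
# A minimal sketch of the U-Net skip-connection idea used in the Pix2Pix
# generator (illustrative layer sizes, not the paper's configuration).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: downsample spatially while widening channels.
        self.down1 = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        # Decoder: upsample back toward image resolution.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU())
        # Last layer sees upsampled features concatenated with the skip features.
        self.up2 = nn.ConvTranspose2d(64 + 64, 3, 4, stride=2, padding=1)

    def forward(self, x):
        d1 = self.down1(x)   # saved as the skip-connection source
        d2 = self.down2(d1)
        u1 = self.up1(d2)
        # The skip connection: concatenate encoder features into the decoder,
        # letting fine spatial detail bypass the bottleneck.
        return torch.tanh(self.up2(torch.cat([u1, d1], dim=1)))
```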

TraVelGAN - a new approach to the problem of unpaired image to image transformation.

We discussed the CycleGAN architecture in a recent post. CycleGAN is a GAN architecture for learning unpaired image to image translations. Unlike CycleGAN and other approaches to this difficult problem, a new architecture called TraVelGAN does not rely on pixel-to-pixel differences between images. So it doesn't use any cycle consistency constraints (two GANs helping each other out internal to the architecture). Instead, it uses an additional module called a Siamese network. Marco Pasini has a good blog post called 'A New Way to Look at GANs' that provides a good overview explanation of what a Siamese network is all about. Introducing a Siamese network into an overall architecture allows for organizing the latent space of the algorithm it is working within: it allows an image to be encoded to a latent vector. The TraVelGAN architecture is a traditional Generator-Discriminator GAN architecture with the addition of an internal separate Siamese Network…
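To make the Siamese idea concrete, here is a minimal sketch (assuming PyTorch) of TraVelGAN's transformation-vector constraint: the generator is pushed to preserve the latent vector between any two encoded source images in the encodings of its outputs. The names `S` and `G` are placeholders for the Siamese network and generator, and the exact loss weighting in the paper may differ.

```python
# A minimal sketch of TraVelGAN's transformation-vector loss (assumed
# simplification; S = Siamese encoder, G = generator).
import torch
import torch.nn.functional as F

def travel_loss(S, G, x1, x2):
    # Transformation vector between the two source images in latent space.
    v_src = S(x1) - S(x2)
    # Transformation vector between the two generated images.
    v_gen = S(G(x1)) - S(G(x2))
    # Encourage the two vectors to agree in both angle and magnitude.
    angle = 1 - F.cosine_similarity(v_src, v_gen).mean()
    magnitude = F.mse_loss(v_src, v_gen)
    return angle + magnitude
```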

DDSP - Differentiable Digital Signal Processing

DSP (digital signal processing) is the technology behind so many different wonderful things in our modern world. Digital guitar effects, digital amp modeling, music synthesizers, audio plug-in effects for DAWs. We haven't even left the world of musical applications, and already the list of applications gets longer and longer. Digital compression algorithms alone (for audio and speech, for images, for videos) have transformed society. Feel like streaming video is destroying civilized life? Blame DSP for making it possible. Or is it all Claude Shannon's fault? DDSP extends the power of conventional DSP by making it learn by example. So rather than hand designing a particular architecture and associated algorithm to perform some task, we can just put together a database of examples, and then let the system learn the correct solution to the problem from those examples. Now this should sound very familiar to what is going on in neural networks. And…
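As a rough illustration of what 'differentiable DSP' means, here is a minimal sketch (assuming PyTorch) of a classic DSP building block, a harmonic oscillator, written entirely out of differentiable tensor ops so it could sit inside a network trained on example audio. This is a toy, not the actual DDSP library code.

```python
# A minimal sketch of a differentiable harmonic oscillator (assumed toy
# example, not the DDSP library's implementation).
import torch

def harmonic_oscillator(f0, amplitudes, sample_rate=16000):
    """f0: (n_samples,) fundamental frequency in Hz.
    amplitudes: (n_harmonics,) per-harmonic amplitudes.
    Both are ordinary tensors, so gradients flow into them."""
    n_harmonics = amplitudes.shape[0]
    # Integrate instantaneous frequency to get phase.
    phase = 2 * torch.pi * torch.cumsum(f0 / sample_rate, dim=0)
    harmonics = torch.arange(1, n_harmonics + 1).unsqueeze(0)  # (1, H)
    # Sum the weighted harmonics; every op is differentiable, so a loss
    # on the output audio can push gradients back into f0 and amplitudes.
    return (torch.sin(phase.unsqueeze(1) * harmonics) * amplitudes).sum(dim=1)
```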

HTC Seminar Series #18 - The Future of Computing and Programming Languages

Today's HTC Seminar Series is a great podcast discussion with Chris Lattner on The Future of Computing and Programming Languages, hosted by Lex Fridman. Everyone with any interest in computing should find this discussion very interesting, but especially so if you have been watching the last two lectures from the 2019 fastai Part 2 course. This is because Chris Lattner co-lectures those two course presentations with Jeremy, and they talk about building fastai in Swift. And indeed they do just that in front of your very eyes in those two lectures. If you watched the recent podcast with Jeremy of fastai (which we posted here in this blog post right after the Part 1 2020 fastai course lectures came out), Jeremy seemed way less excited about Swift for fastai in August 2020 than he did in those last two 2019 Part 2 course lectures. I guess a big part of the answer is that Chris was working at Google on bringing Swift to TensorFlow when those two 2019 course lectures were taped…

HTC Education Series: Getting Started with Deep Learning - Lesson 5

This week's lesson is a little bit different from the normal formula. The fastai lecture part of the HTC lesson will be taught by Rachel Thomas rather than Jeremy. She will be discussing the ethical implications of deep learning systems (and AI systems in general). As a designer of such systems, you need to make yourself aware of their potential consequences: what impact(s) they might have on the world, for good or for evil. What do you need to be thinking about as a designer of such systems to avoid potentially huge issues developing after they are deployed into the world at large? Are there design strategies you can follow that help identify potential problems and squash them before they become huge problems? You can also watch this lecture at the fastai course site here. The advantage of this is that you can look at course notes, a questionnaire, a transcript of the lecture (available to train your next generative RNN system), etc. What is covered in this lecture…

Building AI Music Composition Products

Today's talk is the yin to the yang of yesterday's AI Death Metal talk by CJ Carr of the dadabots group (or maybe I have that yin-yang labeling backwards). They are seemingly two very different and contrary viewpoints on how to approach deep learning and apply it to automatic music composition. But they are interconnected and perhaps even complementary in our ever more interconnected world. And it's wonderful they were both presented at the same technical conference on AI and Musical Creativity. Ed Newton-Rex is a product director at TikTok Europe. He is working there now because ByteDance, TikTok's parent company, acquired Jukedeck (an award-winning AI music composition company he founded). This talk has a very heavy 'business' zeitgeist to it. We could tag it with terms like serious, market analysis, profit-loss analysis, practical, etc. So it sits at one end of the latent space associated with automated music composition, and the dadabots presentation…

AI Death Metal - Eliminating Humans from Music

We've been trying to informally have an interesting music or art oriented post on the weekends (the convergence of art and technology). And today's presentation (also from this week's AI Music Creativity 2020 Conference) does not disappoint. This presentation is by CJ Carr of the dadabots group. Dadabots participated in the 2020 AI Song Competition in Germany; their song was the judges' favorite, and came in second place in the overall competition results. The German news media reported this as 'runner up from Germany in song competition calls for the annihilation of humanity'. This presentation is great. It's entertaining. It's thought provoking. It touches on the future of artistic manipulation, the nature of performance, and copyright laws. It also utilizes state of the art deep learning algorithms to model and generate audio. Observation 1: I thought the 'eliminating humans from music' tagline was obviously a funny joke, but apparently people…

2020 Joint Conference on AI Music Creativity

Due to the magic of a global pandemic, we can all experience fascinating academic conferences virtually in the comfort of our homes. We all hope the pandemic ends sooner rather than later, but I think it would be great if the 'hold your conference virtually' trend continues indefinitely. From the standpoint of trying to deal with climate change, it makes total sense. From the standpoint of making better use of people's time, it makes total sense. From the standpoint of reaching a bigger audience, it makes total sense. For the purposes of archiving knowledge for easy access whenever anyone needs to understand something, it makes total sense. One example of what I'm talking about is this really great conference on AI and Music Creativity, happening in Stockholm, Sweden this week. We can all check out this event virtually, and watch any of the presentations that pique our interest. And there are a lot of great presentations to choose from. I'm going to pre…

Andrew Ng Interviews Chris Manning on Developments in Natural Language Processing.

Today's HTC Seminar Series presentation is an interview conducted by Andrew Ng with Chris Manning, who is a professor of computer science and linguistics at Stanford University. It's part of Andrew's Heroes of NLP video interview series. Chris talks about the history of natural language processing (NLP). Then he dives into new developments in the modern era associated with deep learning systems. He has some thoughts on the current bigger and bigger model trend (like GPT-3), and whether this approach might scale to artificial general intelligence (AGI). He also has some advice for people looking to build careers in AI and NLP.

HTC Seminar Series #17 - From Deep Learning of Disentangled Representations to Higher Level Cognition

Today's HTC seminar is a great presentation by Yoshua Bengio at Microsoft Research in 2018. The first half has a lot of relevance to our recent 'manipulate the embedded representation of a GAN' discussions in some recent blog posts. The second half talks about how to move deep learning up the cognitive food chain. In 2018, Yoshua Bengio ranked as the computer scientist with the most new citations worldwide, thanks to his many high-impact contributions. In 2019, he received the ACM A.M. Turing Award, “the Nobel Prize of Computing”, jointly with Geoffrey Hinton and Yann LeCun for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing. The talk starts by reviewing earlier work on the notion of learning disentangled representations and deep generative models, and proposes research directions towards learning of high-level abstractions. This follows the ambitious objective of disentangling the underlying causal factors…

Latent Space Exploration with StyleGAN2

A fastai student put together a really great blog post that deep dives into exploring the latent space of the StyleGAN2 deep learning model. We're going to be running through some of the different things he so elegantly described in detail in that blog post. He also provides Jupyter notebooks for all of the associated code he used to build the examples shown on the blog post. These can be run on either Colab or Gradient (with the TensorFlow 1.14 container). The tutorial code heavily relies on the Official StyleGAN2 Repo, which is written with a deprecated version of TensorFlow. There is an official PyTorch version available now that fastai oriented folks might want to take a look at. Need some additional background? Our recent Dive into Generative Adversarial Networks is a good place to get some more background on GANs and StyleGAN in particular. If you are still confused about the concept of a latent space, here's a good blog post on Understanding Latent Space…
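As a taste of what latent space exploration looks like in code, here is a minimal sketch of linear interpolation between two latent codes. It assumes PyTorch and a hypothetical pretrained `generator` callable that maps a latent vector to an image; the real StyleGAN2 API differs.

```python
# A minimal sketch of latent-space interpolation; `generator` is a
# hypothetical stand-in for a pretrained GAN generator.
import torch

def interpolate(generator, z1, z2, steps=8):
    frames = []
    for t in torch.linspace(0, 1, steps):
        # Walk a straight line between the two latent codes; each point
        # along the line decodes to an image that morphs between the two.
        z = (1 - t) * z1 + t * z2
        frames.append(generator(z))
    return frames
```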

HTC Education Series: Getting Started with Deep Learning - Lesson 4

Ok, let's dive into the 4th lecture in our Getting Started with Deep Learning series. We're going to dive under the hood in this lecture to get a better understanding of how deep learning neural nets really work. We'll do this by building and training one from scratch. We'll start old school with a single layer linear network. Then extend it using ReLU nonlinearities to create a deep nonlinear network. Along the way we will discover our old friend the sigmoid function, as well as the newer softmax activation function. Remember, a great way for you to help learn and retain the specific material these lectures cover is to put together your own summary of what was covered in the lecture. You should ideally do this before reading our summary below. You can also watch the video on the fastai course site here. The advantage of that is that you can access the searchable transcript, interactive notebooks, setup guides, questionnaires, etc. on that site. Don't forget to…
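To preview what 'from scratch' means here, the sketch below (assuming plain PyTorch tensors; these are not the lecture's exact notebooks) builds the pieces the lecture covers: a linear layer, a ReLU nonlinearity, and a softmax output, composed into a small deep nonlinear network.

```python
# A minimal from-scratch net: linear layer, ReLU, softmax (assumed sketch).
import torch

def linear(x, w, b):
    return x @ w + b                     # a single linear layer

def relu(x):
    return x.clamp_min(0.0)              # the nonlinearity between layers

def softmax(x):
    e = torch.exp(x - x.max(dim=1, keepdim=True).values)  # numerically stable
    return e / e.sum(dim=1, keepdim=True)

def net(x, w1, b1, w2, b2):
    # Two linear layers with a ReLU in between make a deep nonlinear network.
    return softmax(linear(relu(linear(x, w1, b1)), w2, b2))
```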

Novel View Synthesis Tutorial

Most things about living through a pandemic suck, but there does appear to be one side benefit: every academic and technical conference you might be interested in is being conducted virtually. So you can sit in your home and watch the lectures and tutorials without having to spend thousands of dollars and jack the planet's carbon footprint through the roof in the process. So here we have a cutting edge view into a fascinating sub domain of computer vision called view synthesis: working from a single photo or a small number of photos of a scene, and then letting a user manipulate the view of that scene in 3D. These Novel View Synthesis Tutorial lectures are from CVPR 2020 this summer. Keep in mind that my purpose in pointing you at this tutorial is to encourage you to watch the 30 minute intro lecture at the beginning by Orazio Gallo called 'Novel View Synthesis: A Gentle Introduction'. After you do, you will have a good introduction to the history and current…

Training GANs on Smaller Data Sets

Training GANs on small datasets can oftentimes lead to overfitting of the training dataset (the discriminator memorizes the training examples and training falls apart). A new paper from Tero Karras and other Nvidia colleagues talks about approaches to solve this problem. The new approach, called Adaptive Discriminator Augmentation, designs a strategic data augmentation regime to help GAN training cope with the smaller dataset used for training. Now if you are a fastai person, this kind of sounds like what Jeremy has been teaching in the fastai course: how to be clever with generating randomized augmented variations of your input data on the fly during training. And indeed the fastai API is set up to make that not only easy to do, but allows you to run that randomized data augmentation on the GPU. There is a GitHub site devoted to this work, called StyleGAN2 with Adaptive Discriminator Augmentation (ADA). It also contains an official TensorFlow implementation that supersedes the previously published StyleGAN2 work. Here's…
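Here is a minimal sketch of the discriminator-augmentation idea, assuming PyTorch. ADA itself adapts the probability `p` during training and uses a much richer augmentation pipeline; this toy fixes `p` and uses a single flip augmentation to show the shape of the trick.

```python
# A minimal sketch of discriminator augmentation (assumed simplification
# of the ADA idea, not Nvidia's implementation).
import torch

def augment(images, p=0.5):
    # Random horizontal flip, applied per image with probability p.
    flip = torch.rand(images.shape[0], device=images.device) < p
    return torch.where(flip.view(-1, 1, 1, 1), images.flip(-1), images)

def d_loss(D, real, fake, p):
    # Both real and generated images are augmented the same way, so the
    # discriminator cannot separate them by spotting the augmentation itself.
    return -(torch.log(torch.sigmoid(D(augment(real, p)))).mean()
             + torch.log(torch.sigmoid(-D(augment(fake, p)))).mean())
```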

GAN Deep Dive - Part 2

Let's continue our deep dive exploration of Generative Adversarial Networks (GANs). This post builds off of the material in Part 1, so check that out first if you haven't. We're going to start out by working through Lesson 7 from the 2019 fastai course. This is a super information packed lecture, filled with great stuff. Jeremy starts off by showing us how to build the ResNet architecture from scratch. He does this to show off a very important technique called the 'skip connection'. He then covers the fascinating U-Net architecture, which also uses skip connections, to build super resolution in the U-Net's output. He then covers two new loss functions: feature loss and gram loss. Building on all of the above, he then moves into Generative Adversarial Networks (GANs). You may recall in Part 1 Jeremy mentioned that transfer learning might be useful for training GANs, but he wasn't familiar with anyone using it yet. In this lecture a year later he shows you how…
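As a reminder of what the skip connection looks like in code, here is a minimal ResNet-style block sketch in PyTorch (an illustration of the technique, not Jeremy's notebook code). Note the contrast with the U-Net sketch in the Pix2Pix post above: ResNet adds the input back, while U-Net concatenates it.

```python
# A minimal ResNet-style skip connection (assumed sketch): the output is
# the input plus a learned residual, so gradients flow straight through
# the addition.
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # The skip connection: add the input back to the conv output.
        return self.relu(x + self.convs(x))
```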

Rewriting a GAN's Rules to Interactively Adjust its Behavior

Let's continue our exploration of how GANs can be enhanced to allow a user to interactively adjust their behavior. David Bau is a PhD student in Antonio Torralba's Artificial Intelligence Laboratory at MIT. He is developing techniques to help understand how deep learning neural networks work, by focusing on the representations (and their latent structure) that they learn. Here's a quick overview of this research. Here's a longer presentation on this work he gave at AIM 2020 on August 28, 2020. You can watch a video of the longer presentation here right now. We'll try to get a YouTube version of the video here soon. David and others in the Torralba lab are doing some seriously excellent and fascinating work focused on better understanding how neural networks generate internal representations of the data sets they are modeling. You can check out David's work on 'GAN Dissection: Visualizing and Understanding Generative Adversarial Networks'…

HTC Seminar Series #16 - Style and Structure Disentanglement for Image Manipulation

This week's HTC Seminar Series talk focuses on how to make deep learning systems that transform images more user controllable. So: a neural net that takes an image as its input, processes it using a trained deep learning model, and then outputs the deep learning model's processing result as a new output image. We would like to add user adjustable slider controls to the deep learning model. We would also like these slider controls to correspond to some aspect of human perception of the comparison between the input image and the generated output image, so that adjusting them results in useful and understandable manipulation of some property of the imaging transformation being generated by the deep learning model. Richard Zhang is a research scientist at Adobe Research. His presentation is entitled 'Style and Structure Disentanglement for Image Manipulation'. It was presented at the AIM Workshop, ECCV 2020. Note that there are a lot…
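A minimal sketch of the slider idea, assuming PyTorch and a hypothetical pretrained `generator`: a latent direction tied to one perceptual attribute gets scaled by a user-controlled slider value and added to the latent code. How such directions are actually found is the subject of the talk; here the `direction` vector is simply assumed to exist.

```python
# A minimal sketch of a perceptual "slider" on a generative model
# (hypothetical names; not the method from the talk).
import torch

def apply_slider(generator, z, direction, slider):
    # `direction` is a latent-space vector tied to one perceptual
    # attribute; `slider` in [-1, 1] controls how far to move along it.
    return generator(z + slider * direction)

# e.g. render a sweep of the control from one extreme to the other:
# images = [apply_slider(G, z, d, s) for s in torch.linspace(-1, 1, 5)]
```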

GAN Deep Dive - Part 1

There have been a lot of recent HTC posts on the deep learning GAN architecture. Let's take a deep dive into how to code up different GAN architectures using the fastai API. We're going to have several different posts that run through a number of different approaches to building GAN models. Keep in mind that the GAN sub field of deep learning is changing rapidly as more and more research is done, so we're going to be working through some of the historical developments in this series of posts. Part 1 starts with Lecture 12 of the 2018 fastai course. The lecture starts off with a look at the DarkNet architecture used in YOLOv3. This is very interesting stuff, so feel free to check it out. But it doesn't really have anything to do with GANs directly. The lecture then focuses on Generative Adversarial Networks (GANs) from 48:38 onwards. This later part of the lecture is our focus for this particular GAN Deep Dive post. It starts by taking a look at the DCGAN…
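For reference, here is a minimal DCGAN-style generator sketch in PyTorch: a stack of transposed convolutions that upsamples a noise vector into an image. The layer sizes are illustrative, not the exact configuration from the DCGAN paper or the lecture.

```python
# A minimal DCGAN-style generator (illustrative layer sizes).
import torch
import torch.nn as nn

generator = nn.Sequential(
    # Project the (100, 1, 1) noise vector up to a 4x4 feature map.
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),    # 16x16
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),                          # 32x32
)

# usage: images = generator(torch.randn(16, 100, 1, 1))
```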