HTC Education Series: Getting Started with Deep Learning - Lesson 6
This week's lesson dives back into the nitty-gritty details of working with deep learning systems using the fastai API. We finish up Chapter 5 of the course book, looking more deeply at Softmax, the specifics of how transfer learning works, and fun things to do with adaptive learning rates to improve system performance.
Then we look at multi-label classification problems (in contrast to the single-label classification problems we have examined so far). We show how to use fastai's DataLoader and Dataset classes, via the DataBlock API, with multi-label classification data. We also talk about modifying loss functions to work with multi-label data.
We then move on to take a look at deep learning based collaborative filtering applications (like how one might implement a Netflix recommendation system), where the deep learning system learns latent factors in the data. The specific example discussed is recommendation systems, but the underlying principles are much more general, and can be applied to many different potential application scenarios.
You can also watch this lecture at the fastai course site here. The advantage of this is that you can look at the course notes and a questionnaire, and a transcript of the lecture is available (to train your next generative RNN system, etc.).
What is covered in this lecture
Choosing a correct learning rate is important (want to train as fast as possible without the rate being so high that training overshoots and the loss gets worse)
use Leslie Smith Learning Rate Finder (built into fastai api)
- incredibly useful (practical), was only invented in 2015 (fastai first api to include it)
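To make the idea of a learning rate finder concrete, here is a minimal sketch of a learning-rate range test on a toy one-weight model. It is not fastai's implementation: the data, the function names `lr_range_test` and `suggest_lr`, the stopping threshold, and the "lowest loss divided by 10" rule of thumb are all assumptions chosen for illustration.

```python
def lr_range_test(start_lr=1e-5, end_lr=10.0, num_steps=100):
    """Toy learning-rate range test on a one-weight model y = w * x."""
    # Hypothetical toy data: the true weight is 3.0.
    xs = [i / 10 for i in range(-10, 11)]
    ys = [3.0 * x for x in xs]
    w = 0.0
    # Multiply the learning rate by a constant factor each step.
    growth = (end_lr / start_lr) ** (1 / (num_steps - 1))
    lr = start_lr
    best_loss = float("inf")
    history = []  # (learning rate, loss) pairs
    for _ in range(num_steps):
        errors = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / len(xs)
        history.append((lr, loss))
        best_loss = min(best_loss, loss)
        if loss > 4 * best_loss:
            break  # the loss is blowing up, so stop the sweep
        grad = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
        w -= lr * grad
        lr *= growth
    return history


def suggest_lr(history):
    """Pick the rate with the lowest recorded loss, then back off by 10x."""
    lr_at_min, _ = min(history, key=lambda pair: pair[1])
    return lr_at_min / 10


history = lr_range_test()
print(f"suggested learning rate: {suggest_lr(history):.3g}")
```

The sweep starts with a tiny learning rate, grows it every step, and records the loss until it starts to blow up. Plotting `history` gives the familiar loss-versus-learning-rate curve.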
start with pre-trained neural net
freeze most layers (except for last ones which we throw away and replace with new ones (initialized to random weights))
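call learn.fine_tune() to train (it first trains just the randomly initialized weights in the new output layer, then unfreezes and trains the whole network; to control the two stages by hand, train the frozen model with learn.fit_one_cycle() instead)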
unfreeze entire network, then train everything
- run learning rate finder again for unfrozen network to choose new correct learning rate.
- use Discriminative learning rate (pass in slice)
different layers get different learning rates - use python's slice() to easily specify this
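As a rough illustration of what passing a slice means, the sketch below spreads a `slice(lo, hi)` across layer groups so that the earliest layers get the smallest learning rate and the last layers get the largest. The helper name `spread_learning_rates` is hypothetical, and the spacing rules (multiplicatively even steps, and `hi / 10` for the earlier groups when only an upper bound is given) reflect my understanding of fastai's behavior rather than its actual code.

```python
def spread_learning_rates(lr, n_groups):
    """Return one learning rate per layer group (hypothetical helper)."""
    if not isinstance(lr, slice):
        # A plain number: every group trains at the same rate.
        return [lr] * n_groups
    if lr.start is None:
        # slice(hi): last group gets hi, earlier groups get hi / 10 (assumed rule).
        return [lr.stop / 10] * (n_groups - 1) + [lr.stop]
    if n_groups == 1:
        return [lr.stop]
    # slice(lo, hi): multiplicatively even steps from lo up to hi.
    ratio = (lr.stop / lr.start) ** (1 / (n_groups - 1))
    return [lr.start * ratio ** i for i in range(n_groups)]


# Three layer groups: early layers, middle layers, new head.
for group, rate in enumerate(spread_learning_rates(slice(1e-6, 1e-4), 3)):
    print(f"group {group}: lr = {rate:.1e}")
```

The early layers of a pre-trained network already hold useful general features, so they are nudged gently, while the freshly added head is allowed to move faster.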
runs pre-trained resnet50 deep learning model using 16 bit float precision (normal floats are 32 bit)
- uses less GPU memory
- runs faster
- can actually work better (stochastic variations introduced by rounding errors)
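The memory saving and the rounding error can both be seen without a GPU. This small sketch uses Python's standard `struct` module, whose `e` format code is an IEEE 754 half-precision (16 bit) float. The helper name `to_half` is made up for this example, and it only illustrates the number format, not how fastai trains in mixed precision.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest 16 bit float and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print("bytes per 16 bit float:", struct.calcsize("<e"))
print("bytes per 32 bit float:", struct.calcsize("<f"))

value = 0.1
rounded = to_half(value)
print(f"{value} stored as 16 bit float: {rounded!r}")
print(f"rounding error: {abs(rounded - value):.2e}")
```

Half the bytes per number is where the memory saving comes from, and the small rounding error is the noise referred to above.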
Multi-label Classification - image can have more than one label associated with it
PASCAL dataset (image might be labeled as containing 'people', 'car', 'bicycle')
Pandas (Python library to deal with standard data formats)
reads in DataFrames
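The change to the loss function for multi-label data can be sketched in a few lines. Softmax forces the class probabilities to compete and sum to 1, which suits single-label problems. For multi-label problems each label gets its own sigmoid, and the loss is binary cross-entropy averaged over the labels. The label names, the logits, and the function names below are invented for illustration, and this plain-Python version skips the numerical safeguards a real library would include.

```python
import math


def softmax(logits):
    """Probabilities that compete with each other and sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Independent probability for a single label."""
    return 1 / (1 + math.exp(-z))


def binary_cross_entropy(logits, targets):
    """Mean per-label loss; targets are 0 or 1 for each label."""
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)


labels = ["person", "car", "bicycle"]
logits = [3.0, 2.5, -4.0]   # hypothetical network outputs for one image
targets = [1, 1, 0]         # the image contains a person and a car

print("softmax:", [round(p, 3) for p in softmax(logits)])
print("sigmoid:", [round(sigmoid(z), 3) for z in logits])
print("loss:", round(binary_cross_entropy(logits, targets), 4))
```

With softmax, 'person' and 'car' have to share the probability mass, while with per-label sigmoids both can be confidently present at the same time.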
image is input to neural net
2 float numbers are output (position of center of head)
Sigmoid - nonlinear function that maps a number to be between 0 and 1 (which is then rescaled to the range we want, here -1 to 1)
forces 2 float numbers to be valid position in image (-1 to 1, with 0 as center)
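A plain sigmoid outputs values between 0 and 1, so getting coordinates in the range -1 to 1 takes one extra rescaling step. The sketch below shows that step in plain Python. fastai provides a `sigmoid_range` function for this, but the version here is a stand-in written for illustration, and the raw output values are made up.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))


def sigmoid_range(z, low, high):
    """Squash any real number into the interval (low, high)."""
    return sigmoid(z) * (high - low) + low


# Hypothetical raw outputs of the network for one image: (x, y) of the head center.
raw_outputs = [0.0, 2.0]
position = [sigmoid_range(z, -1.0, 1.0) for z in raw_outputs]
print("position in image coordinates:", [round(p, 3) for p in position])
```

A raw output of 0 lands exactly on the image center, and no raw output, however large, can push the predicted point outside the image.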
Netflix movie, tv show recommendation system is the example used
Key component of these systems is that they contain latent factors
not directly defined in database, but observable in the data
Based on dot-product
This collaborative filtering example will be finished up in the next lesson.
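To show how latent factors and the dot product fit together, here is a minimal sketch of dot-product collaborative filtering in plain Python. Each user and each movie gets a short vector of latent factors, the predicted rating is their dot product, and gradient descent adjusts the factors to match the known ratings. The ratings table, the hyperparameters, and the function names are all invented for illustration; this is not the fastai collab code.

```python
import random


def dot(a, b):
    """Predicted rating: dot product of user factors and movie factors."""
    return sum(x * y for x, y in zip(a, b))


def mean_squared_error(ratings, users, movies):
    return sum((dot(users[u], movies[m]) - r) ** 2 for u, m, r in ratings) / len(ratings)


def train_factors(ratings, n_users, n_movies, n_factors=2, lr=0.02, epochs=500, seed=0):
    """Learn latent factors with plain stochastic gradient descent."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    movies = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_movies)]
    initial_mse = mean_squared_error(ratings, users, movies)
    for _ in range(epochs):
        for u, m, r in ratings:
            err = dot(users[u], movies[m]) - r
            for f in range(n_factors):
                # Compute both gradients before changing either factor.
                grad_user = err * movies[m][f]
                grad_movie = err * users[u][f]
                users[u][f] -= lr * grad_user
                movies[m][f] -= lr * grad_movie
    return users, movies, initial_mse


# Hypothetical (user, movie, rating) triples on a 1 to 5 scale.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 2, 1),
    (2, 0, 1), (2, 1, 2), (2, 2, 5),
]
users, movies, initial_mse = train_factors(ratings, n_users=3, n_movies=3)
final_mse = mean_squared_error(ratings, users, movies)
print(f"mean squared error before: {initial_mse:.3f}, after: {final_mse:.3f}")
```

Nobody tells the model what the two factors mean. Whatever structure they pick up (here, perhaps a taste that users 0 and 1 share and user 2 does not) comes purely from the pattern of ratings, which is what makes them latent.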
Additional HTC Course Material
1: Last week Xander got us pumped up about an exciting new neural net architecture called a GAN (Generative Adversarial Network). Specifically he detailed how the StyleGAN works. GANs are a hot research topic and 3 new ones have probably been invented in the time it took me to type this sentence.
Today, Xander will discuss a second kind of neural net architecture that can also be used for image transformational applications (just like the GAN architecture can be used in this way). This alternative class of neural networks is called a Variational AutoEncoder (VAE). It is a neural net architecture that learns to compress data without supervision.
Both VAE and GAN architectures generate latent representations of the data they are modeling.
Notice how this concept of latent variables and latent factors keeps coming up over and over again.
These are key concepts to get comfortable with, and to try to develop a more intuitive feel for.
2: Now that Xander has pumped us up about VAE architectures, their usefulness for building latent space representations of images, and their exciting potential to construct generative transformational systems, we're ready to dive a little deeper into how to think about working with auto-encoders for representation learning.
This next lecture by MIT's Alexander Amini on 'Deep Generative Modeling' is just the ticket. We're going to step back a little from details about code, and try to build better intuitive representations for the different components of these systems in our brains.
The material in this lecture should seem very familiar to anyone who watched Ava Soleimany give her version of it in this post.
3: So I hope the differences between the HTC course and just the fastai lectures are becoming more apparent. We're going to hammer home concepts associated with latent space representation in deep learning systems. Same for feature space visualization. All with the goal of adding user sliders so that mere mortals can control deep learning systems just by adjusting some sliders.
Of course fastai part1 2020 ends after 8 lectures. So 2 more to go on this HTC course. But the HTC course will not be ending when we reach fastai lecture 8.
The HTC course has additional material we feel it is important you get exposed to, especially practical issues like best practices and understanding different options for deploying these systems into the real world.
And it should be really clear at this point that we're going to dive deep into covering deep learning for performing learnable image transformation. So we'll be looking at GANs vs VAE architectures.
Associated with deployment we'd really like to lay out a strategy to get from fastai code prototypes to working deployed deep learning systems on OAK boards.
1: Classification vs Regression labeling. The 'Regression' terminology, I think, is confusing for beginners.
I find the use of 'regression' as a term to describe a deep learning system that outputs continuous numbers to be unnecessarily confusing. It's old terminology rooted in some historical 'regression analysis' data modeling history, and seems like distracting unnecessary baggage when you are talking about a neural net.
Historically I think it helped prevent people from seeing important potential applications of neural nets. I certainly saw that to be the case in the 90s (people thought classification was all you could do with these systems). I think it's still confusing as a term today, like in the fastai head tracking example discussed in this lecture. Don't call it a regression net, just call it a net that outputs 2 floating point numbers (2 numbers that correspond to a 2D position of a point in an image).
Just state clearly what kind of signals the input(s) into and output(s) out of the neural network are.
X binary label outputs, Y float number outputs, 1 BW image, 1 Color Image (RGB), 1 Color Depth Image (RGB and D), etc.
2: Pay attention to the whole notion of how a latent space is built in the collaborative filtering example. How this is developed is really slick, and very useful for so many different applications besides movie recommendations on Netflix.
Don't forget to read the course book
Finished chapter 5.
Works through Chapter 6 on multi-label problems.
Need to review something from the previous lessons in the course. No problem.