HTC Education Series: Getting Started with Deep Learning - Lesson 6

This week's lesson dives back into the nitty-gritty details of working with deep learning systems using the fastai API.  We finish up Chapter 5 of the course book, look more deeply at softmax, examine the specifics of how transfer learning works, and explore fun things you can do with adaptive learning rates to improve system performance.

Then we look at multi-label classification problems (in contrast to the binary classification problems we have examined so far).  We show how to work with fastai's DataLoader and Dataset classes, and how to use the DataBlock API with multi-label classification data.  We also talk about modifying loss functions to work with multi-label data.

We then move on to take a look at deep learning based collaborative filtering applications (like how one might implement a Netflix recommendation system), where the deep learning system learns latent factors in the data.  The specific example discussed is a recommendation system, but the underlying principles are much more general, and can be applied to many different application scenarios.

You can also watch this lecture at the fastai course site here. The advantage of doing so is that you can look at the course notes, a questionnaire, a transcript of the lecture (which is even available to train your next generative RNN system), etc.

What is covered in this lecture

Choosing a correct learning rate is important  (you want to train as fast as possible without introducing error by training with too high a rate)
    use Leslie Smith's Learning Rate Finder (built into the fastai API)
        - incredibly useful and practical, yet only invented in 2015 (fastai was the first API to include it)
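In fastai this is a single call, learn.lr_find(). Under the hood it runs a short training pass while growing the learning rate exponentially and recording the loss at each step. Here is a library-free sketch of that exponential sweep; the quadratic toy_loss is a stand-in for a real training loss, and picking the minimum is a simplification of the "steepest descent" heuristic the real finder suggests:

```python
import math

def lr_sweep(loss_fn, lr_min=1e-7, lr_max=10.0, steps=100):
    """Evaluate a loss at exponentially spaced learning rates.

    Mimics the idea behind Leslie Smith's LR finder: sweep the
    learning rate from very small to very large and watch the loss.
    """
    ratio = (lr_max / lr_min) ** (1 / (steps - 1))
    lrs, losses = [], []
    lr = lr_min
    for _ in range(steps):
        lrs.append(lr)
        losses.append(loss_fn(lr))
        lr *= ratio
    return lrs, losses

# Toy stand-in loss: too-small LRs barely learn, too-large LRs diverge,
# with the sweet spot placed (arbitrarily) at lr = 1e-3.
toy_loss = lambda lr: (math.log10(lr) + 3) ** 2

lrs, losses = lr_sweep(toy_loss)
best_lr = lrs[losses.index(min(losses))]  # simplified: just take the minimum
```

In practice you plot losses against lrs (log scale) and pick a rate a bit below where the loss starts shooting up, rather than blindly taking the minimum.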

Transfer Learning
    start with pre-trained neural net
        freeze most layers (except for the last ones, which we throw away and replace with new ones initialized to random weights)
        call learn.fine_tune() to train (this first trains just the randomly initialized weights in the new output layer)
        unfreeze the entire network, then train everything
            - run the learning rate finder again on the unfrozen network to choose a new correct learning rate.
            - use discriminative learning rates (pass in a slice)
                different layers get different learning rates - use Python's slice() to easily specify this
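The fastai calls for this recipe are learn.fine_tune(), learn.unfreeze(), and then something like learn.fit_one_cycle(n, lr_max=slice(1e-6, 1e-4)). The slice is spread across the layer groups so the earliest (most general) layers train slowest and the new head trains fastest. A minimal sketch of that spreading idea (discriminative_lrs is a hypothetical helper for illustration, not fastai's actual implementation):

```python
import math

def discriminative_lrs(lr_slice, n_groups):
    """Spread slice(lo, hi) log-uniformly across n_groups layer groups.

    Earliest (most general) layers get the smallest learning rate,
    the newly added head gets the largest -- the idea behind passing
    lr_max=slice(1e-6, 1e-4) to fastai's fit_one_cycle.
    """
    lo, hi = lr_slice.start, lr_slice.stop
    if n_groups == 1:
        return [hi]
    step = (math.log10(hi) - math.log10(lo)) / (n_groups - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n_groups)]

# Three layer groups: body start, body end, new head.
lrs = discriminative_lrs(slice(1e-6, 1e-4), 3)
```

The log-uniform spacing is a design choice: learning rates live on a log scale, so interpolating in log space keeps the ratio between adjacent groups constant.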

Half Precision floating point calculations
    cnn_learner(dls, resnet50, metrics=error_rate).to_fp16() 
        runs the pre-trained resnet50 deep learning model using 16-bit floating point precision (normally 32-bit)
         - uses less GPU memory
         - runs faster
         - can actually work better (the stochastic variation introduced by rounding errors can act as a regularizer)

Multi-label Classification - image can have more than one label associated with it
    PASCAL dataset  (an image might be labeled as containing 'person', 'car', 'bicycle')
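In fastai, multi-label data is handled with a MultiCategoryBlock, which one-hot encodes the labels, and the loss switches from softmax cross-entropy to binary cross-entropy (PyTorch's BCEWithLogitsLoss): one independent yes/no decision per label instead of picking exactly one class. A library-free sketch of both pieces (the three-label vocabulary is a made-up miniature of PASCAL's):

```python
import math

VOCAB = ['bicycle', 'car', 'person']  # hypothetical tiny label vocabulary

def one_hot(labels, vocab=VOCAB):
    """Encode a list of labels as a 0/1 vector (MultiCategoryBlock's job)."""
    return [1.0 if v in labels else 0.0 for v in vocab]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over the labels: each label is an
    independent yes/no question, unlike softmax's pick-exactly-one."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)

target = one_hot(['person', 'car'])           # -> [0.0, 1.0, 1.0]
loss = bce_with_logits([-2.0, 3.0, 1.5], target)
```

Confident, correct logits drive the loss toward zero; the network is rewarded for saying yes to every label present and no to every label absent.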

Pandas (Python library for working with standard tabular data formats)
    reads data into DataFrames

Image Regression (head pose example)
    image is input to the neural net
    2 float numbers are output (position of the center of the head)

Sigmoid - nonlinear function that maps any number into a fixed range (a plain sigmoid outputs 0 to 1; here it is rescaled to -1 to 1)
    forces the 2 output floats to be a valid position in the image (-1 to 1, with 0 as the center)
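fastai does this rescaling with its sigmoid_range function; a stdlib-only sketch of the same idea:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_range(x, lo=-1.0, hi=1.0):
    """Squash a raw network output into (lo, hi).

    Mirrors the idea behind fastai's sigmoid_range: a plain sigmoid
    lands in (0, 1), so it is scaled and shifted to cover the
    coordinate range used for points in an image, here (-1, 1)
    with 0 at the center.
    """
    return sigmoid(x) * (hi - lo) + lo

# Raw outputs of any magnitude become valid image coordinates.
coords = [sigmoid_range(x) for x in (-10.0, 0.0, 10.0)]
```

No matter how extreme the raw activations get, the predicted point can never fall outside the image, which makes training the head-position model much more stable.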

Collaborative Filtering
    The Netflix movie and TV show recommendation system is the example used
    A key component of these systems is that they learn latent factors
        factors not directly defined in the database, but learnable from the data

    Based on the dot product (of a user's latent-factor vector and an item's)
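A library-free sketch of that dot-product heart of the model. The factor values and names here are made up for illustration; in the real system they are embedding weights learned by gradient descent, and the full fastai model also adds per-user and per-movie bias terms, omitted here for brevity:

```python
def dot(u, v):
    """Dot product of two latent-factor vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical learned latent factors (hand-written here; learned in practice).
# Imagine the 3 dimensions loosely capturing tastes like sci-fi vs romance.
user_factors = {
    'alice': [0.9, 0.1, -0.3],
    'bob':   [-0.6, 0.8, 0.2],
}
movie_factors = {
    'star_wars': [1.0, 0.0, -0.2],
    'notebook':  [-0.8, 0.9, 0.1],
}

def predicted_score(user, movie):
    """Predicted affinity = dot product of the two latent vectors."""
    return dot(user_factors[user], movie_factors[movie])

# Recommend by ranking movies on the predicted score.
ranked = sorted(movie_factors, key=lambda m: predicted_score('alice', m),
                reverse=True)
```

Users and movies that point in the same direction in the latent space get high dot products, so alignment in that learned space is what drives the recommendations.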

This collaborative filtering example will be finished up in the next lesson.

Additional HTC Course Material

1:  Last week Xander got us pumped up about an exciting new neural net architecture called a GAN (Generative Adversarial Network).  Specifically, he detailed how StyleGAN works.  GANs are a hot research topic, and 3 new ones have probably been invented in the time it took me to type this sentence.

Today, Xander will discuss a second kind of neural net architecture that can also be used for image transformational applications (just like the GAN architecture can be used in this way).  This alternative class of neural networks is called a Variational AutoEncoder (VAE).  It is a neural net architecture that learns to compress data without supervision.

Both VAE and GAN architectures generate latent representations of the data they are modeling.

Notice how this concept of latent variables and latent factors keeps coming up over and over again.
These are key concepts to get comfortable with, and to try to develop a more intuitive feel for working with.

2: Now that Xander has pumped us up about VAE architectures, their usefulness for building latent space representations of images, and their exciting potential for constructing generative transformational systems, we're ready to dive a little deeper into how to think about working with auto-encoders for representation learning.

This next lecture by MIT's Alexander Amini on 'Deep Generative Modeling' is just the ticket.  We're going to step back a little from the details of code, and try to build better intuitive representations in our brains for the different components of these systems.
The material in this lecture should seem very familiar to anyone who watched Ava Soleimany give her version of it in this post.

3:  So I hope the differences between the HTC course and just the fastai lectures are becoming more apparent.  We're going to hammer home the concepts associated with latent space representation in deep learning systems.  Same for feature space visualization.  All with the goal of adding user sliders, so that mere mortals can control deep learning systems just by adjusting some sliders.

Of course, fastai part 1 2020 ends after 8 lectures, so 2 more to go in this HTC course.  But the HTC course will not be ending when we reach fastai lecture 8.

The HTC course has additional material we feel it is important for you to be exposed to, especially practical issues like best practices and understanding the different options for deploying these systems in the real world.

And it should be really clear at this point that we're going to dive deep into covering deep learning for performing learnable image transformation.  So we'll be looking at GAN vs VAE architectures.

Associated with deployment we'd really like to lay out a strategy to get from fastai code prototypes to working deployed deep learning systems on OAK boards.


1:  Classification vs Regression labeling.  The 'regression' terminology, I think, is confusing for beginners.

I find the use of 'regression' as a term to describe a deep learning system that outputs continuous numbers to be unnecessarily confusing.  It's old terminology rooted in the history of 'regression analysis' data modeling, and seems like distracting, unnecessary baggage when you are talking about a neural net.
Historically, I think it helped prevent people from seeing important potential applications of neural nets.  I certainly saw that to be the case in the 90s (people thought classification was all you could do with these systems).  I think it's still confusing as a term today, as in the fastai head tracking example discussed in this lecture.  Don't call it a regression net; just call it a net that outputs 2 floating point numbers (2 numbers that correspond to the 2D position of a point in an image).

My Suggestion:
Just state clearly what kind of signals the input(s) into and output(s) out of the neural network are. 
    X binary label outputs, Y float number outputs, 1 BW image, 1 Color Image (RGB), 1 Color Depth Image (RGB and D), etc.

2:  Pay attention to the whole notion of how a latent space is built in the collaborative filtering example.  How this is developed is really slick, and very useful for many different applications besides movie recommendations on Netflix.

Don't forget to read the course book

This lesson finishes Chapter 5.
It also works through Chapter 6 on multi-label problems.

Need to review something from the previous lessons in the course?
No problem.

You can access Lesson 1 here.

You can access Lesson 2 here.

You can access Lesson 3 here.

You can access Lesson 4 here.

You can access Lesson 5 here.

You can move on to the next Lesson 7 in the course (when it posts on 11/09/20).

