HTC Education Series: Getting Started with Deep Learning - Lesson 3

Okay, let's dive into the third lecture in our 'Getting Started with Deep Learning' course series. We're going to take a look at some simple cloud-based deployment options, examine how to use fastai for data augmentation, learn how to watch out for things going wrong in deep learning systems, and then start to look deeper at the specifics of how neural nets actually learn.

Remember, a great way to help learn and retain the specific material these lectures cover is to put together your own summary of what was covered in the lecture. You can use a text editor in another window on your computer to do this while listening to the lecture. Or you can watch the whole thing and then summarize afterwards, skipping the video around as needed to make sure you got everything.

There is no right or wrong way to do this; figure out what works for you personally, then keep doing it.


You can also watch the video on the course site here. The advantage of that is that the site gives you access to a searchable transcript, interactive notebooks, setup guides, questionnaires, etc.


Don't forget to read the course book.

We finish up Chapter 2, then skip to Chapter 4 for the gradient descent material covered in the second half of the lecture.

We will return to Chapter 3 when we do the lecture on ethics and deep learning systems.


What is covered in this lecture

Working with fastai's DataBlock for data augmentation

  resize images in the dataset so we can use them as input to a specific deep learning model architecture

  make modified variants of our input images (add noise, warp, etc)

    offers different resize options

    fastai uses the GPU to do all of the data augmentation

    we can do random data augmentation on the fly as we are training 

        note how much better this is than manually augmenting the data ahead of time

    using the neural net to help find bad data (continuing theme discussed last week)

    get rid of bad data, then do real final training
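The on-the-fly augmentation idea above can be sketched in a few lines. This is not fastai's implementation, just a minimal NumPy illustration (the `augment_batch` function here is a hypothetical stand-in): each pass applies a fresh random flip and fresh random noise, so the model never sees exactly the same image twice.

```python
import numpy as np

def augment_batch(batch, rng):
    """Apply a fresh random flip and random noise to each image in the batch."""
    out = batch.copy()
    for i in range(out.shape[0]):
        if rng.random() < 0.5:
            out[i] = out[i][:, ::-1].copy()          # random horizontal flip
        out[i] += rng.normal(0, 0.05, out[i].shape)  # small Gaussian noise
    return out

rng = np.random.default_rng(42)
images = np.zeros((8, 28, 28))  # a stand-in training batch

# Each epoch sees a differently augmented version of the same data:
epoch1 = augment_batch(images, rng)
epoch2 = augment_batch(images, rng)
print(np.allclose(epoch1, epoch2))  # False: the augmentation differs every pass
```

Contrast this with pre-generating augmented copies on disk: there you get a fixed, finite set of variants, while on-the-fly augmentation gives you effectively unlimited ones for free during training.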

How to build a GUI in your Jupyter notebooks

How to build a deep learning Notebook app from a deep learning model

  this is great for demos, tests, and experiments

How to build standalone cloud based deep learning app

  Voila

        takes a Jupyter notebook as input (hides the code, shows outputs and widgets)

  Binder

    GitHub repository that contains your Jupyter notebook

      outputs a working web app from your notebook using Voila

   this uses the CPU rather than the GPU for inference (not necessarily a bad thing)

Be aware that bias in your dataset can hugely impact how your deep learning system works

Writing down what you are learning (both in this course and in general) is a great way to solidify that knowledge in your head. 

    Also builds connections to other people in the world who will read what you have written.

    Building a blog based on Jupyter notebooks.

Breakdown of MNIST hand written digit recognition system

    simplified subset for testing: just the 3 and 7 images

When checking out a new application of deep learning, make sure to first try some simple baseline solutions. Occam's Razor.

    If simple solutions work for the problem set, why waste time building more complicated ones?

    If they don't work, then you have a comparison base for your new system's performance improvement.

Simple baselines to try in the MNIST example

    Try the pixel difference between the input image and an 'ideal' digit (the mean of all the samples of that digit).
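As a concrete sketch of that baseline (using synthetic arrays rather than the real MNIST files, so all the names and data here are illustrative): compute the mean image for each class, then classify a new image by which 'ideal' digit it is closest to.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for stacks of 3s and 7s (n_images x 28 x 28 pixels):
threes = rng.random((50, 28, 28)) * 0.2        # fake "3"s, pixels near 0.1
sevens = rng.random((50, 28, 28)) * 0.2 + 0.8  # fake "7"s, pixels near 0.9

mean3 = threes.mean(axis=0)  # the "ideal" 3
mean7 = sevens.mean(axis=0)  # the "ideal" 7

def is_three(img):
    """Classify by which mean image the input is closer to (L1 distance)."""
    d3 = np.abs(img - mean3).mean()
    d7 = np.abs(img - mean7).mean()
    return d3 < d7

print(is_three(threes[0]))  # True
print(is_three(sevens[0]))  # False
```

No training loop, no neural net, and on the real 3-vs-7 subset this kind of baseline already scores surprisingly well, which is exactly why it's worth running first.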

Error Metrics (what the system is trying to optimize)

    L1 norm (mean absolute difference)

    L2 norm (root mean squared error (RMSE))
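Both metrics are one-liners in NumPy; here they compare a small prediction array against a target (the arrays are illustrative):

```python
import numpy as np

pred   = np.array([0.0, 0.5, 1.0])
target = np.array([0.0, 1.0, 1.0])

l1   = np.abs(pred - target).mean()            # L1: mean absolute difference
rmse = np.sqrt(((pred - target) ** 2).mean())  # L2: root mean squared error

print(l1)    # 0.1666...
print(rmse)
```

Note that squaring in RMSE penalizes large individual errors more heavily than L1 does, which is often why you'd pick one over the other.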

Broadcasting - useful Python feature - way to vectorize operations while avoiding loops in code
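A quick illustration of broadcasting in NumPy (PyTorch follows the same rules): subtracting a single image from a whole stack of images happens in one vectorized operation, no Python loop required.

```python
import numpy as np

stack = np.ones((5, 3, 3))       # five 3x3 "images"
mean_img = np.full((3, 3), 0.5)  # one 3x3 image

# mean_img is automatically broadcast across the first axis of stack:
diffs = stack - mean_img
print(diffs.shape)     # (5, 3, 3)
print(diffs[0, 0, 0])  # 0.5
```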

Gradient Descent (GD) - Stochastic Gradient Descent (SGD)

    Jeremy says SGD, but this lecture is really about GD; SGD is covered in the next lesson

    mechanism to build Arthur Samuel's ideal learning machine

        closely related to classic iterative methods like Newton's method (though GD uses only first derivatives)

    if you can visualize it pictorially, it's very simple to understand

        find the lowest valley in a landscape (get to the bottom of the hill as fast as you can)
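The 'walk downhill' picture translates directly into code. Here is a minimal sketch minimizing f(x) = (x - 3)^2, with its derivative f'(x) = 2(x - 3) computed by hand:

```python
def f_grad(x):
    """Derivative of f(x) = (x - 3)**2, worked out by hand."""
    return 2 * (x - 3)

x = 0.0   # starting point on the "landscape"
lr = 0.1  # learning rate: the size of each downhill step
for _ in range(100):
    x -= lr * f_grad(x)  # step in the direction of steepest descent

print(round(x, 4))  # converges toward 3.0, the bottom of the valley
```

That's the whole algorithm: repeatedly step opposite the gradient, with the learning rate controlling how big each step is.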

Gradient of a function

PyTorch calculates the derivative of any function automatically

    there is calculus going on, but the machine does it for you

    derivative is just the slope of the function (where is the ski slope steepest?)
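Here's what 'the machine does the calculus for you' looks like in PyTorch: mark a tensor with requires_grad, compute a function of it, call backward(), and the derivative appears in .grad.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x  # f(x) = x^2 + 3x
y.backward()        # autograd computes df/dx for us

print(x.grad)  # tensor(7.)  ->  f'(x) = 2x + 3, so f'(2) = 7
```

No hand-derived formula needed; this is the same machinery gradient descent uses to find which way is downhill for every parameter in a neural net.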


Additional HTC Course Material

1:  We will continue our exploration of deep learning Feature Visualization with Part 2 of Xander's 'How Neural Nets Learn' lecture series (which we also present inline in this lesson below). This episode focuses on Adversarial Examples.  Adversarial examples are images specifically designed to fool trained neural networks into making wrong classification decisions.  


Check out how you can use Feature Visualization techniques to generate Adversarial Examples here.

What potential ethical issues should a designer of a deep learning system be aware of? Think specifically about the ramifications of someone using adversarial examples to mislead your deep learning system.


2: We're going to pull together some additional HTC specific material to help you get up to speed working with Jupyter notebooks on Gradient. 

If you haven't set up your Gradient account yet, here is some info on getting started. Start with the free tier; it gives you access to cloud-based CPU and GPU resources for free.

Gradient supports the Jupyter GUI elements used in the course.  Colab does not.

I'll update this section as the additional discussed material comes online.


3:  Now that you've heard Jeremy talk about gradient descent, go back to the MIT 'Intro to Deep Learning' bootcamp lecture we presented in the first HTC course lesson. Some of the middle and later parts of that alternative lecture should be much more familiar to you now that you've seen the material presented in slightly different ways by Jeremy and Xander.

Jeremy is going to continue to dive into explaining this stuff from the fastai perspective in his next lecture. So he will cover the rest of neural net layer design, stochastic gradient descent, and back propagation in more detail. With a heavy emphasis on showing you how to code it.

Xander will continue to get us pumped up and excited about deep learning in the following weeks as well, and make advanced research material surprisingly comprehensible in the process.


Observations
1: Many deep learning models take in square images and output square images. This does not match how people typically use images, which usually have some horizontal or vertical aspect ratio.

How are we going to deal with this issue in our deep learning systems?


2: Doing random augmentation transformations on your data on the fly while you are training is very powerful and extremely practical. It tremendously helps with important issues like preventing overfitting of the training data.
 
Remember - don't randomly augment the validation set (fastai ensures this); just augment the training set.

Do you understand why this is important?


3: If you are like me, you might find the simple cloud-based deployment options amusing, but what you are really interested in is deploying your custom deep learning solutions on your personal computer, your iPad, your smartphone, your cheap VPU hardware box, etc. Don't worry, the HTC course will cover these real-world deployment issues in later lessons.

In a perfect world, you should ideally be able to take a working Jupyter notebook associated with your custom deep learning neural net model, press a button, and out comes a working solution for all of the different 'run my model in production' hardware scenarios we just mentioned.  

Why is this not the case?

What could be done to make it work better?


4: Feedback loops can potentially cause a deep learning model to change the behavior and dynamics of a system it is supposed to just be observing and analyzing. This can produce all kinds of unforeseen consequences (in both your model system as well as in the real world at large).  

Rachel pointed out in this lecture that you are potentially at risk of a feedback loop any time your model controls what your next round of input data looks like. Many developers of these systems do not take this into account, assuming the data is static, at both their (and society's) potential peril.


Fun Stuff
1:  I use Kindle software to read the course e-book, and e-books in general.
Lets you read on different platforms: iMac, iPad, Kindle if you have one, etc.

Amazon also has a free 'Send to Kindle' app you can use to send pdf documents to your Kindle library.  This can come in handy for reading those pdf research papers we keep pointing you at throughout the course. Once they are in your Kindle library, you can read them in Kindle software on any of your various devices.

Having trouble sleeping? No worries, just dive into reading a stack of pdf research papers on your tablet device while lying in bed.

Paper books and Hawaii unfortunately do not mix together well. I ultimately had to toss a huge technical library I brought with me from San Francisco when I originally moved to Hawaii, due to mold issues. The extremely high humidity climate here is just not kind to paper (or leather, but that's another story).

So if you are building up a technical library (and you should), I highly recommend you build it using e-books instead of physical books made of paper. The next time you move you'll also appreciate the extreme portability of e-book libraries.

2:  Jeremy pointed out that list and dictionary comprehensions are useful features of Python. Worth checking out.
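For anyone who hasn't met them yet, here's a quick taste of both (toy data, obviously):

```python
nums = [1, 2, 3, 4, 5]

# List comprehension: build a new list in one expression.
squares = [n ** 2 for n in nums]

# Dictionary comprehension, with a filter clause thrown in:
even_sq = {n: n ** 2 for n in nums if n % 2 == 0}

print(squares)  # [1, 4, 9, 16, 25]
print(even_sq)  # {2: 4, 4: 16}
```

You'll see comprehensions all over fastai's source code, so they're well worth getting comfortable with.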

3:  What is the difference between an array and a tensor?

4:  Building a blog based on Jupyter notebooks.

5:  Broadcasting
        - in PyTorch
        - in NumPy

6: Suppose an alien civilization wanted to hide themselves from the rest of the universe.  Could they use the adversarial example idea discussed in this lesson to cause remote astronomical observations to misread spectral signatures from their solar system that would hide signs of their civilization?  
How would they do this, and could you come up with ways to detect it? 
Is this the real signal metric we should be looking for when we search for life in the rest of the universe?


Need to review something from the previous lessons in the course?
No problem.

You can access Lesson 1 here.

You can access Lesson 2 here.

You can go on to the next lesson 4 here.
