Intro to Deep Learning
This presentation is the first introductory lecture in a week-long intensive deep learning bootcamp taught at MIT, specifically the first lecture from the January 2020 bootcamp. It has a really nice whizzy demo at the beginning (an example of deepfakes) to get you excited about the potential of deep learning (and also to get you thinking about the ethical implications of this material).
When trying to learn something new, I think it helps to come at the material in different ways. By being exposed to different explanations of the same material from different people with slightly different perspectives, you can get a better understanding of the material itself.
Our 'Getting Started with Deep Learning' course here at HTC is based on fastai, which sits on top of PyTorch. You may recall that I mentioned we originally thought about putting the HTC course together based on Keras, which sits on top of TensorFlow. That is the exact approach the MIT bootcamp took.
You can access all of the course material (lectures, slides, lab sessions) associated with the MIT Deep Learning Bootcamp here.
1: Since you will be working with fastai in the HTC course examples, it's not crucial to spend too much time diving into the specifics of the TensorFlow Keras code in the MIT bootcamp examples unless you are really interested. But you should try to get a sense of why we think fastai is a better coding approach. Both approaches are trying to do the same task.
Why is fastai easier to use? Hint: it has to do with how fastai refactored things to make the structure of a deep learning program easier to code (and hopefully easier to understand).
2: Activation Functions
Alexander shows a very informative slide in his lecture that compares 3 different activation functions.
Jeremy introduces the Rectified Linear Unit (ReLU) activation function in the fastai lectures, and then really just focuses on using it for the fastai coding because it's easy to understand, easy to implement, runs fast, and works great.
But if you had proposed using the ReLU activation function in a neural net course in the '80s or '90s, you would probably have failed the course, or at least gotten a stern lecture about how the function is not differentiable (at zero) and is therefore unsuitable for use in a system based on back-propagation. People typically used one of the first 2 options in the slide above, which are both mathematically much more complex than the super simple ReLU, but are differentiable at all points of the function.
Then in the 2000s people came to the realization that the 'incorrect' ReLU activation not only worked, but oftentimes actually worked better in practice than the more complex but mathematically correct activation functions. Surprise. This is a really good example of how pre-existing conceptions in a particular field of knowledge can act to hold it back.
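To make this concrete, here is a quick sketch (my own code, not from the lecture) of the three activation functions in plain NumPy. The names and shapes are the standard ones; note the comment on ReLU about the differentiability issue discussed above.

```python
import numpy as np

def sigmoid(x):
    # Smooth and differentiable everywhere; squashes input to (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Smooth and differentiable everywhere; squashes input to (-1, 1).
    return np.tanh(x)

def relu(x):
    # Piecewise linear; not differentiable at exactly x = 0.
    # In practice, frameworks just pick a subgradient (usually 0) there.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))       # [0. 0. 2.]
print(sigmoid(0.0))  # 0.5
```

Despite the kink at zero, ReLU's gradient is trivially cheap to compute (it is just 0 or 1), which is part of why it runs so fast.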
Another example of this phenomenon is covered by Jeremy in the fastai lectures when he talks about the history of deep learning. The Minsky-Papert book 'Perceptrons' held back developments in the deep learning field for many years by incorrectly convincing many people that neural networks could not solve certain kinds of non-linear mathematical problems (like XOR). They were only referring to 2 layer networks, i.e. networks with no hidden layer (but most people missed this caveat). Adding more layers allows a network to approximate any non-linear mathematical function.
A similar thing happened in the '90s. People knew that a 3 layer neural net (one hidden layer) could in principle approximate any function, but working with more than 3 layers was considered too hard (or too computationally complex for the hardware available at the time). The current revolution in deep learning neural nets is happening because people learned that using more layers in these models is very important, and because computer hardware (GPUs) advanced enough to allow people to train these deeper models in a reasonable time.
3: Tricks for training neural nets
The lecture briefly covers gradient descent and how it works (it tries to descend to find the lowest point on a loss function), learning rates (how quickly we try to descend the function), mini-batch training (which helps to jiggle the descent path to get over or around local minima in the function we are descending), etc.
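Those three ideas fit together in a short sketch (again my own toy code, not the lecture's): mini-batch gradient descent fitting a simple line y = 3x + 2, where `lr` is the learning rate and each shuffled mini-batch gives a noisy gradient estimate that jiggles the descent path.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, 200)
y = 3.0 * X + 2.0 + rng.normal(0, 0.1, 200)  # noisy targets

w, b = 0.0, 0.0   # parameters we descend on
lr = 0.1          # learning rate: step size down the loss surface
batch_size = 32

for epoch in range(100):
    idx = rng.permutation(len(X))  # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        pred = w * xb + b
        # Gradients of the mean-squared-error loss on this mini-batch;
        # each batch is a noisy estimate of the full-dataset gradient.
        grad_w = 2 * ((pred - yb) * xb).mean()
        grad_b = 2 * (pred - yb).mean()
        w -= lr * grad_w
        b -= lr * grad_b

print(round(w, 1), round(b, 1))  # close to 3.0 and 2.0
```

Try making `lr` much larger or smaller: too large and the descent overshoots and bounces around, too small and it crawls. That trade-off is exactly what the lecture's learning-rate discussion is about.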
Jeremy also covers all of this great stuff in the fastai lectures, and shows you how to do it with specific fastai code examples. And the cool thing about the fastai library is that the various options are usually already built into the API, so you just set the appropriate parameter on a function if you want to change something, or use the default if you just want to get things done without paying close attention to the underlying details of how it works.
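The design idea behind that ergonomics can be sketched with a hypothetical API (these are not real fastai signatures, just an illustration of the pattern): every knob gets a sensible default, so the simple call works, and you only name the options you want to change.

```python
def fit(epochs=1, lr=1e-3, batch_size=64, momentum=0.9):
    # Stand-in for a training loop; just reports the settings used,
    # so you can see which defaults were taken and which overridden.
    return {"epochs": epochs, "lr": lr,
            "batch_size": batch_size, "momentum": momentum}

print(fit())         # all defaults: just get things done
print(fit(lr=3e-3))  # override only the learning rate
```

The alternative, where every caller must spell out every option, front-loads all the underlying details onto the beginner; good defaults let you defer them until you actually care.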