HTC Education Series: Getting Started with Deep Learning - Lesson 2
Ok, let's dive into the second fastai lecture in our Getting Started with Deep Learning series. We're going to continue our exploration of what you can do with deep learning systems, and start to better understand how they work.
You can also watch the video on the fastai course site here. The advantage is that the site also gives you a searchable transcript, interactive notebooks, setup guides, questionnaires, etc.
As I mentioned in HTC Lesson 1, a good habit is to summarize what you learned after watching a lecture by putting together a bullet point list of everything it covered. I'm going to do that below. I would suggest that you try it for yourself before digging into my bullet points.
What is covered in this lecture
training data - validation data - (keep some additional validation data hidden until very end)
Loss function - measure of performance to make the learning system work better
Things you might want to do with deep learning that aren't vision related
audio - implement by using spectrogram as image input
present path movement data to deep learning system as image input
virus screening - again presents the data as image input
Different kinds of data input to deep learning systems
vision, text, tabular, recommendation systems, multi-modal
Covid-19 data modeling example
do temperature and humidity affect transmission rate?
Drive train approach to building data modeling products
Example of how to make your own dataset
bing image search
bear detectors - grizzly, black, teddy
Quick look at how DataBlock api works
Using your model training to help clean up your data
Finishes chapter 1 of book
Don't forget to read the book!
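The first bullet above (training data, validation data, plus some extra data kept hidden until the very end) can be sketched in a few lines of plain Python. This is only a rough illustration of the idea, not how fastai does it internally. The function name `three_way_split` and the default percentages are my own assumptions.

```python
import random

def three_way_split(items, valid_pct=0.2, test_pct=0.1, seed=42):
    """Shuffle items and split them into (train, valid, test) lists.

    Hypothetical helper for illustration; the percentages are assumptions.
    """
    if valid_pct < 0 or test_pct < 0 or valid_pct + test_pct >= 1:
        raise ValueError("valid_pct + test_pct must be in [0, 1)")
    shuffled = list(items)
    # A fixed seed gives the same split every run, so results stay comparable.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_pct)
    n_valid = round(len(shuffled) * valid_pct)
    test = shuffled[:n_test]                    # hidden until the very end
    valid = shuffled[n_test:n_test + n_valid]   # checked while you tune the model
    train = shuffled[n_test + n_valid:]         # what the model learns from
    return train, valid, test

train, valid, test = three_way_split(range(100))
print(len(train), len(valid), len(test))  # 70 20 10
```

The point of the third (hidden) set is that you only look at it once, after you have finished making decisions based on the validation set.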
Additional HTC Course Material
1: Feature Visualization is a fascinating topic that deserves to be covered in more depth, and it will be in the HTC course you are taking. We have another post that covers Feature Learning in much more detail here. We also present the associated lecture inline in this lesson post below.
This additional lecture post features a very dynamic short lecture presenter named Xander that we will be using throughout the HTC course for presenting some extra material. Think of Xander as the guy at the gym in your spin class that 'gets you pumped up' about what you are doing.
If you read any science or tech overview article in the popular press (New York Times for example) on some aspect of deep learning or AI, at some point it will typically say something about how 'deep neural nets are black boxes and people have no idea how they work or what is going on inside of them'. This is not true at all. It's a huge misconception.
Understanding feature visualization is a key component of understanding how deep neural nets work. It's also a key component in understanding how to make deep learning systems user slider adjustable. This is something the HTC course will spend some time covering in more detail in future lessons and posts.
1: The specific fastai lectures we are using in the first part of this HTC course come from the 2020 Part 1 fastai lectures. So they were put together by fastai during the beginning of the covid-19 global pandemic. It's interesting to see how covid-19 has infected the 2020 fastai course lectures. This is not necessarily a bad thing, but it's something to be aware of.
So in this fastai lecture 2, a really significant chunk of the lecture dives deep into a very specific covid-19 related data modeling topic. This section of the lecture is a great way to get a feel for how a world class data modeling expert (Jeremy) analyzes a highly relevant data modeling problem: whether temperature and humidity affect covid-19 transmission rates.
In some sense this part of the lecture has nothing directly to do with deep learning. But it is fascinating, and certainly highly relevant to the world we are currently all living in.
If you look at course lectures in previous years, some of the individual lectures usually bend to include some state of the art research that happened a few days before that lecture. Which is really great.
I have to be honest, a part of me wishes that the 2020 lectures had just bent to cover the latest and greatest AI developments. But as I said, we all live in the global pandemic world this year's lectures were created in.
It also seems like Jeremy is including some additional material in the 2020 fastai lectures that really comes from another fastai course they previously taught on more general machine learning techniques. So I think the whole P-value, Null Hypothesis part of the lecture was also added as a way to bring some of this more general machine learning material into the deep learning course track itself.
Again, this is just something to be aware of as you watch the complete set of fastai lectures in this course.
2: Jeremy tells you to use your model training to help clean up bad data in your training set.
This is the exact opposite of what most deep learning courses will tell you to do. They will tell you to spend a lot of time examining and cleaning up your data before you even start training. And if you don't do this, you will get bad results from the system, and then spend forever trying to figure out why that is the case.
The fastai approach is always very practical. So what they are pointing out is something very valuable. Ultimately you want to have a clean dataset for your final training, but why not use your model itself to help you find the bad data samples in the training set early on? And indeed they show you exactly how to do that.
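Here is a rough sketch of the idea behind using the model to find bad data: compute the loss for each sample, sort by it, and look at the worst offenders first. A sample the model is confidently wrong about is either a hard example or a mislabeled one. The file names, class order, and probabilities below are all made up for illustration, and the helper names are my own. fastai has its own tooling for this (the fastai book uses `plot_top_losses` and `ImageClassifierCleaner`), which this sketch does not use.

```python
import math

def cross_entropy(probs, target_idx, eps=1e-12):
    """Per-sample cross-entropy loss: -log of the probability given to the true class."""
    return -math.log(max(probs[target_idx], eps))

def top_losses(samples, k=3):
    """Return the k (name, loss) pairs with the highest loss.

    samples: list of (name, predicted_probs, label_index) tuples.
    """
    scored = [(name, cross_entropy(probs, label)) for name, probs, label in samples]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical predictions; assumed class order is 0=grizzly, 1=black, 2=teddy.
samples = [
    ("img_001.jpg", [0.90, 0.05, 0.05], 0),
    ("img_002.jpg", [0.02, 0.03, 0.95], 0),  # labeled grizzly, model is sure it's a teddy
    ("img_003.jpg", [0.10, 0.85, 0.05], 1),
    ("img_004.jpg", [0.40, 0.35, 0.25], 2),
]

for name, loss in top_losses(samples, k=2):
    print(f"{name}: loss={loss:.3f}")
```

Here `img_002.jpg` comes out on top, which is exactly the kind of sample you would want to open up and check by eye before relabeling or deleting it.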
3: The fastai DataBlock api and associated DataLoader objects are really great. They take what is normally a very tedious programming task (getting the data in a data collection into the specific format that a specific model needs for training input), and hide all of that annoying busy work from you.
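To give a feel for the busy work being hidden, here is a plain Python sketch of two of the steps: labeling each image by its parent folder name, and grouping the labeled items into batches. This is not the fastai API. The real DataBlock is configured with pieces such as `get_items`, `get_y`, and a splitter, while every function name and file path below is hypothetical.

```python
from pathlib import PurePosixPath

def parent_folder_label(path):
    """Use the name of the folder an image sits in as its label."""
    return PurePosixPath(path).parent.name

def build_pairs(paths, get_y):
    """Turn file paths into (path, class_index) pairs plus the class vocabulary."""
    labels = [get_y(p) for p in paths]
    vocab = sorted(set(labels))
    pairs = [(p, vocab.index(label)) for p, label in zip(paths, labels)]
    return pairs, vocab

def make_batches(pairs, batch_size):
    """Group the pairs into consecutive batches (the last one may be smaller)."""
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]

# Hypothetical folder layout: one sub-folder per bear type.
paths = [
    "bears/grizzly/a.jpg",
    "bears/black/b.jpg",
    "bears/teddy/c.jpg",
    "bears/grizzly/d.jpg",
    "bears/teddy/e.jpg",
]

pairs, vocab = build_pairs(paths, get_y=parent_folder_label)
batch_list = make_batches(pairs, batch_size=2)
print(vocab)
print([len(batch) for batch in batch_list])
```

Passing the labeling function in as `get_y` mirrors the declarative feel of the DataBlock approach: you describe how to get the label, and the plumbing applies it to every item for you.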
1- What is Refactoring?
Jeremy talks a lot about the concept of refactoring throughout the course. He's specifically referring to refactoring computer code.
Refactoring is all about looking for repetitive patterns, defining the pattern once, and then using your defined reference for the pattern when you need to use the pattern (as opposed to redoing all of its individual component steps explicitly one by one each time you want to use the pattern).
Notice that I tried to describe the concept very generally above. In the context of the course we're usually referring to refactoring code. But the concept is much more general and could be applied to all kinds of different situations to help organize what you are doing more concisely, in a way that makes future work easier and helps avoid the errors or bugs that come from getting one of the pattern's steps wrong.
Need to review something from the previous lesson in the course?
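A tiny code example of refactoring, with made-up function names: the "before" version repeats the same formatting steps for each value, and the "after" version defines the pattern once and reuses it.

```python
# Before: the same steps (scale, round, format) are written out twice.
def report_before(train_acc, valid_acc):
    train_line = "train: " + str(round(train_acc * 100, 1)) + "%"
    valid_line = "valid: " + str(round(valid_acc * 100, 1)) + "%"
    return [train_line, valid_line]

# After: the repeated pattern is defined once...
def format_pct(name, value):
    """Format a fraction such as 0.875 as a labeled percentage string."""
    return f"{name}: {round(value * 100, 1)}%"

# ...and then reused wherever it is needed.
def report_after(train_acc, valid_acc):
    return [format_pct("train", train_acc), format_pct("valid", valid_acc)]

# The behavior is unchanged; only the organization of the code improved.
assert report_before(0.75, 0.875) == report_after(0.75, 0.875)
print(report_after(0.75, 0.875))  # ['train: 75.0%', 'valid: 87.5%']
```

If the format ever needs to change (say, two decimal places), the refactored version only needs the change in one place.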
You can access Lesson 1 here.
You can move on to the next Lesson 3 here.