HTC Education Series: Getting Started with Deep Learning - Lesson 7
Another exciting lesson in HTC's 'Getting Started with Deep Learning' course begins. As usual, we start with the lecture, in this case fastai Part 1 (2020) Lesson 7. It continues last week's fascinating discussion of how to generate a latent space for a collaborative filtering model.
We then move on to the topic of tabular data.
We take a detour into random forests, learning more about them than one might have expected, as a lead-up to using neural networks for tabular data.
You can also watch this first fastai video lecture on the fastai course site here. The advantage is that the site also gives you a searchable transcript, interactive notebooks, setup guides, questionnaires, etc.
regularization
weight decay - L2 regularization
in PyTorch, a tensor must be wrapped in nn.Parameter() to make it a learnable parameter
50 dimensional vector space mapped into 3D vector space
distance in the reduced dimensionality latent space is useful, has meaning
cosine similarity (a length-normalized dot product) - used to find movies that are similar
neural net version of collaborative filtering is implemented using a fastai tabular neural model
categorical vs continuous data
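can think of embeddings in terms of 'one hot encodings': an embedding lookup is equivalent to multiplying a one-hot encoded vector by a weight matrix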
the models can learn something about what Germany looks like just by looking at the purchasing behavior of people who live there.
model can learn information about the world the data was generated in (via the embedded features)
collaborative filtering combines just 2 embedding vectors (user and movie)
deep learning neural net approach to tabular data allows N vectors (not restricted to just 2)
alternate older approaches to dealing with tabular data
ensembles of decision trees
random forests
gradient boosting machines
Scikit-learn and Pandas libraries
popular for tabular data; can be used for decision trees and random forests (which don't really use PyTorch's math acceleration features)
what automatic procedure generates a decision tree that does better than random choices?
shows how to create a random forest algorithm from scratch
out-of-bag error (OOB)
model interpretation - 5 step analysis procedure
allows removal of irrelevant columns of data from the model
allows removal of redundant features
Random forests can't extrapolate: they cannot predict values outside the range seen in training, so they can't predict future trends
deep learning models can do extrapolation
Random forests are extremely popular and often give fairly good performance
so you should test against one to make sure your model isn't worse
Boosting
Entity embeddings can improve existing methods when combined with them
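One of the notes above mentions using cosine similarity to find similar movies in the latent space. Here is a minimal sketch of that idea in plain Python. The movie names and 3D vectors are made up for illustration and are not taken from the lecture; a real model would use the learned embedding weights instead.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by both vector lengths, so only direction matters."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical embedding vectors (illustrative values only).
movies = {
    "movie_a": [1.0, 2.0, 0.0],
    "movie_b": [2.0, 4.0, 0.0],   # same direction as movie_a, twice the length
    "movie_c": [-2.0, 1.0, 0.0],  # at right angles to movie_a
}

def most_similar(name, embeddings):
    """Return the other item whose vector points in the most similar direction."""
    query = embeddings[name]
    scored = [(cosine_similarity(query, vec), other)
              for other, vec in embeddings.items() if other != name]
    return max(scored)[1]

nearest = most_similar("movie_a", movies)
```

Note that the raw dot product of movie_a and movie_b depends on their lengths, while the cosine similarity depends only on the angle between them, which is why it is a convenient measure of "these two movies sit in the same region of taste space".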
Additional HTC Course Material
1. We've been heavily emphasizing feature visualization of deep learning neural networks as a very important concept, both for understanding how deep learning systems really work and for driving future developments in deep learning image and computer vision processing.
Observations
1. If you are like me, at some point in this fastai lecture you started to get a little bit confused. We thought we were in a deep learning course, but seemed to have traveled by quantum fluctuation into a parallel universe AI course covering random forest data modeling, which then continued into an in-depth discussion of decision tree methods in general, and then into a discussion of bagging.
Bagging is kind of extraordinary, since it provides a way to improve the accuracy of nearly any kind of machine learning algorithm by training it multiple times, on different random subsets of data, and then averaging the predictions.
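To make the bagging idea concrete, here is a minimal sketch in plain Python. It is not the lecture's code: the base learner is a one-split decision stump chosen for brevity, the toy data is synthetic, and names such as `fit_stump`, `bagged_fit`, and `oob_rmse` are hypothetical. It also shows how the out-of-bag (OOB) error falls out for free, since each row can be scored using only the models that never saw it.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs: (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[1:]:  # every distinct x except the smallest is a candidate split
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: fall back to predicting the overall mean
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x < t else rm

def bagged_fit(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        models.append((stump, set(idx)))  # remember which rows this model saw
    return models

def bagged_predict(models, x):
    """Average the predictions of all models."""
    return mean(predict_stump(stump, x) for stump, _ in models)

def oob_rmse(models, xs, ys):
    """Score each row using only the models whose bootstrap sample excluded it."""
    sq_errs = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [predict_stump(s, x) for s, used in models if i not in used]
        if preds:  # skip rows that happened to appear in every sample
            sq_errs.append((mean(preds) - y) ** 2)
    return (sum(sq_errs) / len(sq_errs)) ** 0.5

# Synthetic toy data: a noisy step from about 0 to about 1 at x = 5.
noise = random.Random(0)
xs = [i / 4 for i in range(40)]
ys = [(1.0 if x > 5 else 0.0) + noise.gauss(0, 0.1) for x in xs]

models = bagged_fit(xs, ys)
```

Calling `bagged_predict(models, 2.0)` returns the averaged prediction, and `oob_rmse(models, xs, ys)` gives an error estimate without needing a separate validation set. A random forest adds one more ingredient on top of this: each tree also considers only a random subset of the columns at each split.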
2. Random forests cannot extrapolate beyond the range of values seen in training, so they do not predict time series data into the future. Deep learning neural nets can do this, which is an important distinction to understand.
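The reason tree-based models cannot extrapolate is that every prediction is an average of training targets, so it can never exceed the largest target seen in training. The sketch below illustrates this in plain Python under some loud assumptions: a single small regression tree stands in for a random forest (a forest only averages such trees), and a closed-form straight-line fit stands in for a model with a functional form that can follow a trend. It is not a neural network and not the lecture's code, and all function names are hypothetical.

```python
from statistics import mean

def build_tree(xs, ys, min_leaf=2):
    """Recursively fit a 1-D regression tree; a leaf is the mean target of its rows."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        lm, rm = mean(left), mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t)
    if best is None:
        return mean(ys)  # no valid split left: make a leaf
    t = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x < t]
    hi = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (t,
            build_tree([x for x, _ in lo], [y for _, y in lo], min_leaf),
            build_tree([x for x, _ in hi], [y for _, y in hi], min_leaf))

def predict_tree(node, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        t, left, right = node
        node = left if x < t else right
    return node

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# Synthetic upward trend, observed only for x = 0..19.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

tree = build_tree(xs, ys)
slope, intercept = fit_line(xs, ys)

tree_future = predict_tree(tree, 40.0)   # stuck at a leaf value from the training range
line_future = slope * 40.0 + intercept   # follows the trend past the training range
```

Every input to the right of the training data lands in the same rightmost leaf, so the tree predicts the same value for x = 40 as for x = 1000, and that value can never exceed `max(ys)`.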
3. I cannot stress enough how important and exciting it is to understand deep learning neural net feature visualization. Developments in this area are going to hugely influence future research directions, certainly in the imaging space of deep learning applications.
Don't forget to read the course book
Chapter 9
Need to review something from the previous lessons in the course?
No problem.
You can access Lesson 1 here.
You can access Lesson 2 here.
You can access Lesson 3 here.
You can access Lesson 4 here.
You can access Lesson 5 here.
You can access Lesson 6 here.
You can move on to the next Lesson 8 in the course (when it posts on 11/16/20).