### HTC Education Series: Getting Started with Deep Learning - Lesson 7

Another exciting lesson in HTC's 'Getting Started with Deep Learning' course begins. Following our usual lesson layout, we start with the fastai Part 1 2020 Lesson 7 lecture. This lecture continues last week's fascinating discussion of how to generate a latent space for a collaborative filtering model.

We then move on to the topic of tabular data.

We take a detour into random forests, where we learn more about them than one might have expected, as a lead-up to the notion of using neural networks for tabular data.

You can also watch this first fastai video lecture on the fastai course site here. The advantage is that the site also gives you access to a searchable transcript, interactive notebooks, setup guides, questionnaires, etc.

__What is covered in this lecture?__

regularization

weight decay - L2 regularization

*we need to wrap a tensor with nn.Parameter() in PyTorch to make it a learnable tensor*
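As a rough illustration of the weight decay item above, the sketch below adds an L2 penalty `wd * w**2` to a squared-error loss for a one-parameter model, which shows up as an extra `2 * wd * w` term in the gradient. It is plain Python rather than PyTorch, and the names `loss_and_grad`, `fit`, `wd`, and `lr` are assumptions made for this example, not fastai's API.

```python
def loss_and_grad(w, xs, ys, wd):
    """Mean squared error of y ≈ w * x plus an L2 penalty wd * w**2."""
    n = len(xs)
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad_mse = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    loss = mse + wd * w ** 2        # penalised loss
    grad = grad_mse + 2 * wd * w    # the penalty adds 2 * wd * w to the gradient
    return loss, grad


def fit(xs, ys, wd, lr=0.05, steps=500):
    """Plain gradient descent on the penalised loss, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        _, grad = loss_and_grad(w, xs, ys, wd)
        w -= lr * grad
    return w


# Toy data generated by y = 2 * x (made up for illustration).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit(xs, ys, wd=0.0)   # no regularization
w_decay = fit(xs, ys, wd=1.0)   # weight decay pulls w toward zero
```

With the penalty switched on, the fitted weight ends up smaller than the unregularized one, which is the shrinking effect that weight decay is meant to have.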

50 dimensional vector space mapped into 3D vector space

distance in the reduced dimensionality latent space is useful, has meaning

cosine similarity (a normalized dot product) - used to find movies that are similar
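To make the similarity idea concrete, here is a minimal sketch of cosine similarity used to find the nearest movie in a latent space. The three-dimensional vectors and movie titles are made up for illustration, and `most_similar` is a hypothetical helper, not something taken from the lecture notebook.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical learned movie factors (real ones would have e.g. 50 dimensions).
movie_factors = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.8, 0.2, 0.1],
    "Movie C": [-0.7, 0.5, 0.3],
}


def most_similar(title, factors):
    """Return the other title whose vector points in the most similar direction."""
    query = factors[title]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in factors.items() if other != title]
    return max(scores, key=lambda pair: pair[1])[0]


nearest = most_similar("Movie A", movie_factors)
```

Because the score is normalized by vector length, two movies count as similar when their vectors point the same way, regardless of how large the individual factors are.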

neural net version of collaborative filtering is implemented using a fastai tabular neural model

categorical vs continuous data

can think of an embedding as a 'one hot encoding' multiplied by a weight matrix
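The link between one-hot encodings and embeddings can be sketched in a few lines: multiplying a one-hot vector by a weight matrix returns exactly one row of that matrix, so an embedding layer can skip the multiplication and index the row directly. The matrix values and the helper names `one_hot` and `vec_mat` below are assumptions for illustration only.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, matrix):
    """Multiply a row vector by a matrix stored as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(n_cols)]


# Hypothetical embedding matrix: 4 categories, 2 latent dimensions each.
embedding = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
    [0.7, 0.8],
]

category = 2
via_matmul = vec_mat(one_hot(category, 4), embedding)  # one-hot times matrix
via_lookup = embedding[category]                       # direct row lookup
```

Both routes give the same vector, which is why the lookup is treated as a cheaper way of computing the same thing.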

the models can learn something about what Germany looks like just by looking at the purchasing behavior of people who live there.

model can learn information about the world the data was generated in (via the embedded features)

collaborative filtering is just 2 vectors?

deep learning neural net approach to tabular data allows N vectors (not restricted to just 2)

alternate older approaches to dealing with tabular data

ensembles of decision trees

random forests

gradient boosting machines

Scikit-learn and Pandas libraries

popular for tabular data; can be used for random forests and decision trees (which don't really use PyTorch's math acceleration features)

what is an automatic procedure that generates a decision tree that does better than random choices?

shows how to create a random forest algorithm from scratch
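One way to picture the automatic procedure is as a search over candidate split points that keeps whichever split makes the two resulting groups most internally consistent. The sketch below does this for a single numeric column, scoring each split by the total squared error around each group's mean. The scoring rule, the function names, and the data are assumptions for illustration and are not taken from the lecture's from-scratch code.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, score) for the split x <= threshold with the lowest total SSE."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right group empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = sse(left) + sse(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy column with two obvious clusters (made up for illustration).
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]

threshold, score = best_split(xs, ys)
```

A full tree would repeat this search over every column and then recurse into each of the two groups until a stopping rule is met.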

out-of-bag error (OOB)

model interpretation - 5 step analysis procedure

allows removal of irrelevant columns of data in the model

allows removal of redundant features

Random forests can't predict future data (they can't extrapolate to predict future trends)

deep learning models can do extrapolation

Random forests are extremely popular, and often give fairly good performance

so you should test against one to make sure your model isn't worse

Boosting

Entity embeddings can improve existing methods when combined with them

__Additional HTC Course Material__

1. We've been heavily emphasizing feature visualization of deep learning neural networks as a very important concept, both for understanding how deep learning systems really work and for driving future developments in deep learning image and computer vision processing.

__Observations__

1. If you are like me, at some point in this fastai lecture you started to get a little bit confused, because we thought we were in a deep learning course but seemed to have traveled by quantum fluctuation into a parallel-universe AI course covering random forest data modeling. It then continued into an in-depth discussion of decision tree methods in general, and then into a discussion of bagging.

*Bagging is kind of extraordinary, since it provides a way to improve the accuracy of nearly any kind of machine learning algorithm by training it multiple times, on different random subsets of data, and then averaging the predictions*.
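The bagging recipe in the quote above can be sketched as follows: train the same simple model on several bootstrap samples, average the predictions, and estimate the error on each row using only the models that never saw that row (the out-of-bag error). The base model here is a one-nearest-neighbour predictor chosen purely for brevity, not the decision trees a random forest would use, and every name in the block is an assumption for illustration.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def fit_nearest_neighbour(xs, ys):
    """A deliberately simple base model: predict the y of the closest training x."""
    points = list(zip(xs, ys))

    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]

    return predict


def bagged_models(xs, ys, n_models, seed=0):
    """Train one base model per bootstrap sample, remembering which rows it saw."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = bootstrap_indices(len(xs), rng)
        model = fit_nearest_neighbour([xs[i] for i in idx], [ys[i] for i in idx])
        models.append((model, set(idx)))
    return models


def predict_bagged(models, x):
    """Average the predictions of all models."""
    return sum(model(x) for model, _ in models) / len(models)


def oob_rmse(models, xs, ys):
    """Root mean squared error, scoring each row only with models that never saw it."""
    sq_errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [model(x) for model, used in models if i not in used]
        if preds:  # skip the rare row that every model happened to see
            sq_errors.append((sum(preds) / len(preds) - y) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Toy data on the line y = 2 * x + 1 (made up for illustration).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

models = bagged_models(xs, ys, n_models=25)
error = oob_rmse(models, xs, ys)
```

The out-of-bag estimate behaves like a validation score that needs no separate validation set, since each row is left out of roughly a third of the bootstrap samples.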

2. Random forests cannot extrapolate time series data into the future, because each prediction is an average of target values seen during training. Deep learning neural nets can extrapolate. This is an important distinction to understand.
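A small sketch shows why tree-based models struggle with trends. The toy regression tree below stores the mean of its training targets in each leaf, so any input beyond the training range simply falls into the outermost leaf. Splitting at the median is an assumption made to keep the code short (real trees search for the best split), and `build_tree` and `predict` are hypothetical helpers.

```python
def build_tree(xs, ys, min_leaf=2):
    """Tiny 1-D regression tree: a leaf is a float (mean target), a node is a tuple."""
    if len(xs) < 2 * min_leaf:
        return sum(ys) / len(ys)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    mid = len(xs) // 2                        # median split, for brevity
    threshold = (xs[mid - 1] + xs[mid]) / 2
    return (threshold,
            build_tree(xs[:mid], ys[:mid], min_leaf),
            build_tree(xs[mid:], ys[mid:], min_leaf))


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# A steadily rising trend y = 3 * t, observed for t = 0..15 (made up for illustration).
ts = [float(t) for t in range(16)]
ys = [3.0 * t for t in ts]

tree = build_tree(ts, ys)
inside = predict(tree, 7.0)     # within the training range
future = predict(tree, 100.0)   # far beyond it; the trend itself would give 300.0
```

The prediction for `t = 100` is just the mean of the last leaf and never exceeds the largest target seen in training. Averaging many such trees in a forest does not change that bound.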

3. I cannot stress enough how important and exciting it is to understand deep learning neural net feature visualization. Developments in this area are going to hugely influence future research directions, certainly in the imaging space of deep learning applications.

*(as well as see amazing deep learning visualization imagery).*

__Don't forget to read the course book__

Chapter 9

__Need to review something from the previous lessons in the course?__

No problem.

You can access Lesson 1 here.

You can access Lesson 2 here.

You can access Lesson 3 here.

You can access Lesson 4 here.

You can access Lesson 5 here.

You can access Lesson 6 here.

You can move on to Lesson 8, the next lesson in the course (when it posts on 11/16/20).
