HTC Education Series: Getting Started with Deep Learning - Lesson 8

This lesson starts off with a great lecture by Jeremy of fastai on Natural Language Processing (NLP), using deep learning nets and the fastai api.

So we'll be covering tokenization of text data, and we'll do a deep dive into how to code recurrent neural nets (RNNs) using the fastai api.  We'll also cover methods like LSTM, which were developed to prevent exploding and vanishing gradients in RNNs.


You can also watch this first fastai video lecture on the fastai course site here. The advantage of that is that on that site you can access a searchable transcript, interactive notebooks, setup guides, questionnaires, etc.

What is covered in this lecture?

Natural language processing with fastai api
    recurrent neural network

The advantages of starting with a pre-trained model
    wikipedia text language model

Fine tune the pre-trained model on text more directly related to your desired task
    IMDb movie review text model
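
To make that pre-train-then-fine-tune workflow concrete, here's a rough sketch of what it looks like in the fastai api. The validation split, learning rate, and epoch count below are just illustrative choices on my part, not the exact values from the lecture.

```python
from fastai.text.all import *

# Build DataLoaders for a *language model* over the IMDb reviews
# (is_lm=True means the target is simply the next word in the text).
path = untar_data(URLs.IMDB)
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)

# AWD_LSTM comes pre-trained on Wikipedia text; fine tuning teaches it
# the vocabulary and style of movie reviews.
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=accuracy)
learn.fine_tune(4, 2e-3)
```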

Text Pre-processing
    word, sub-word, and character based
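
Here's a quick sketch of what those three granularities look like, using fastai's tokenizer classes on the small IMDb sample dataset. The sub-word vocab size is an arbitrary illustrative value.

```python
from fastai.text.all import *

# A small sample of IMDb reviews to experiment with.
path = untar_data(URLs.IMDB_SAMPLE)
txts = L(pd.read_csv(path/'texts.csv')['text'])

# Word level: fastai wraps a spaCy tokenizer by default.
word_tok = WordTokenizer()
print(first(word_tok([txts[0]]))[:10])

# Sub-word level: the tokenizer first learns its pieces from the corpus.
sub_tok = SubwordTokenizer(vocab_sz=1000)
sub_tok.setup(txts)
print(first(sub_tok([txts[0]]))[:10])

# Character level: the simplest possible scheme.
print(list(txts[0][:20]))
```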

Putting text into batches for a language model
    mini batches
    fitting mini batches into GPU memory 
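
Here's a tiny sketch of the batching idea, with made-up numbers just to show the layout. In practice fastai's LMDataLoader handles this arrangement for you.

```python
import torch

# Pretend this is the whole corpus after tokenization and numericalization:
# one long stream of token ids.
stream = torch.arange(100)          # 100 made-up token ids

bs, seq_len = 5, 4                  # illustrative values only

# Split the stream into bs equal-length rows. Row i of every mini batch
# continues exactly where it left off in the previous one, so the model's
# hidden state stays meaningful from batch to batch.
rows = stream[:(len(stream) // bs) * bs].view(bs, -1)

# Each mini batch is a (bs, seq_len) slice of those rows, and the target
# is the same slice shifted one token to the right.
for i in range(0, rows.shape[1] - seq_len, seq_len):
    x = rows[:, i : i + seq_len]
    y = rows[:, i + 1 : i + 1 + seq_len]
    # ...feed (x, y) to the language model...
```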

Disinformation and NLP deep learning models
     social network content generated by AI Bots for sinister purposes (bad people do bad things)

Building a language model from scratch
    recurrent neural nets
        refactored version of a fully connected deep model (backprop through time (BPTT))
            hidden states are the activations inside of the refactored model's loop
            saving the state makes the loop effectively infinitely long, so detach it (truncated backprop)
        multilayer RNNs (use a different weight matrix for each layer)
            hard to train (exploding or vanishing gradients)
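
Below is a plain PyTorch sketch in the spirit of the from-scratch model built in the lecture. The class name and details are my own illustrative choices, not the notebook's exact code.

```python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    "A looped (refactored) fully connected model: the same weight matrices are reused at every time step."
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)   # input  -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)      # hidden -> hidden
        self.h_o = nn.Linear(n_hidden, vocab_sz)      # hidden -> output
        self.h = 0                                    # persistent hidden state

    def forward(self, x):                             # x: (bs, seq_len) of token ids
        outs = []
        for t in range(x.shape[1]):                   # this loop *is* the RNN
            self.h = self.h + self.i_h(x[:, t])
            self.h = torch.tanh(self.h_h(self.h))
            outs.append(self.h_o(self.h))
        # Truncated backprop through time: keep the hidden state's values,
        # but detach them so the gradient history doesn't grow forever.
        self.h = self.h.detach()
        return torch.stack(outs, dim=1)               # (bs, seq_len, vocab_sz)

    def reset(self):                                  # call between epochs / texts
        self.h = 0
```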

LSTM (Long Short-Term Memory)
    solves the multilayer RNN gradient problem above
    AWD-LSTM - uses regularization to improve results (dropout, weight decay, weight tying)
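
Here's a from-scratch sketch of a single LSTM cell, just to show what the gates are doing. It follows the general shape of the version built in the course material, but it's illustrative only; for real work you'd reach for nn.LSTM, or fastai's AWD_LSTM with its drop_mult regularization knob.

```python
import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    """The four LSTM gates, written out explicitly. The cell state gives
    gradients a mostly additive path through time, which is what keeps them
    from exploding or vanishing the way they do in a plain RNN."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.forget_gate = nn.Linear(n_in + n_hidden, n_hidden)
        self.input_gate  = nn.Linear(n_in + n_hidden, n_hidden)
        self.cell_gate   = nn.Linear(n_in + n_hidden, n_hidden)
        self.output_gate = nn.Linear(n_in + n_hidden, n_hidden)

    def forward(self, x, state):
        h, c = state
        hx = torch.cat([h, x], dim=1)
        forget = torch.sigmoid(self.forget_gate(hx))  # what to drop from the cell state
        inp    = torch.sigmoid(self.input_gate(hx))   # how much new info to let in
        cell   = torch.tanh(self.cell_gate(hx))       # candidate values to add
        out    = torch.sigmoid(self.output_gate(hx))  # what to expose as the hidden state
        c = c * forget + inp * cell
        h = out * torch.tanh(c)
        return h, (h, c)
```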


Additional HTC Course Material
1. Our second lecture in this HTC Lesson 8 is presented by Ava Soleimany, and was a part of MIT's Deep Learning Bootcamp in January 2020. It will introduce you to the new kind of deep learning neural net architecture discussed in this lesson, the Recurrent Neural Network (RNN). RNNs are specifically designed to process sequences of data, like text or audio.




The fastai lecture above in HTC Lesson 8 is specifically focused on Recurrent Neural Network (RNN) architectures: how they work, how to build them, and specifically how to build them using the fastai api.
So Ava's lecture is a good alternative introductory viewpoint into this same material.

Both presentations discuss how to deal with issues like exploding gradients in recurrent networks. And both presentations cover the LSTM architecture (what it does and how to code it up).



2. We do have some additional HTC posts tagged with recurrent neural net, including a lecture and some great blog posts by Andrej Karpathy.

We also have numerous GPT-2 and GPT-3 tagged articles, and we need to get a good overview post up on the Transformer architecture that features so prominently in those GPT posts.



Observations
1. Different NLP (natural language processing) architectures deal with (or totally ignore) tokenization of their input streams. And it's an interesting question: what constraints (if any) do you put on the input to NLP systems (and RNNs in general)?

Note that fastai takes a very different viewpoint towards tokenization compared to other systems like OpenAI's GPT Transformer-based NLP architectures. What is the difference in how they approach it?

2. Being able to model and then regenerate long term structural interactions is an issue in RNN systems.  Look at how structure is encoded and utilized in a piece of music like a pop song, or in writing like a novel.  In both scenarios, there are statistical correlations related to stylistic elements that occur over fairly long periods of time, in addition to being associated with a hierarchy of information.

Deep Learning RNN systems need to be able to deal with this level of long term nuanced structural information detail in their data.

A classic example of a system failing at this task would be deep learning based music generators that can capture some semblance of style in their output, but don't understand verse-chorus-verse-chorus-middle bit-verse-chorus levels of long term structure in music.

3. Jeremy answered a question in the fastai lecture about how you do data augmentation for NLP by pointing at a paper called 'Unsupervised Data Augmentation for Consistency Training'. Here's a link to the paper.

4. Jeremy points out in the second half of the lecture that it's often useful to create your own limited datasets for various stages of a project to aid you.  He showed off a human numbers dataset (it just counts up one, two, three, etc).  If you think about it, you can see how useful this would be to help debug things like mini batch code, or for building and debugging RNN code from scratch.

This is really useful information to keep in mind as you move forward using the fastai api on your own particular domain specific projects of interest.
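
If you want to poke at that same toy dataset yourself, fastai ships it. The snippet below roughly follows the course notebook; the exact token counts you get may differ depending on the dataset version.

```python
from fastai.text.all import *

# fastai ships the tiny "human numbers" dataset: the numbers one through
# roughly ten thousand, written out as English words.
path = untar_data(URLs.HUMAN_NUMBERS)
lines = L()
with open(path/'train.txt') as f: lines += L(*f.readlines())
with open(path/'valid.txt') as f: lines += L(*f.readlines())

# Join everything into one stream and build a vocab; with a dataset this
# small you can eyeball every token, which is exactly the point.
text = ' . '.join([l.strip() for l in lines])
tokens = text.split(' ')
vocab = L(*tokens).unique()
print(len(vocab), vocab[:10])
```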

5. When Jeremy is discussing regularization in LSTMs, he mentions Geoffrey Hinton's dropout paper, 'Improving Neural Networks by Preventing Co-adaptation of Feature Detectors'.  Here's a link to that paper.
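
As a reminder of what dropout actually does, here's a minimal from-scratch sketch of the idea from that paper. In practice you'd just use nn.Dropout, or fastai's drop_mult multiplier on the dropouts built into AWD_LSTM.

```python
import torch
import torch.nn as nn

class Dropout(nn.Module):
    "Dropout in a few lines: randomly zero activations during training."
    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training:
            return x                       # dropout is only applied while training
        # Keep each activation with probability 1-p, then rescale the
        # survivors so the expected value of the output is unchanged.
        mask = x.new_empty(*x.shape).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)
```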

6. You probably noticed that Jeremy thanked you at the end of the lecture for watching Part 1 of the fastai course.  Part 2 is not out yet, since Part 1 was just released 3 months ago (as I write this).

However, the HTC course is going to continue for a little bit more before we take a breather and wait for the fastai v2 Part 2 lectures to come out.  There are a number of interesting topics I would like us to cover. So stay tuned for the next HTC lesson, where we take a look at GANs (Generative Adversarial Networks) implemented using the fastai api.

Here's a link to the HTC GAN posts if you are curious about why we think GANs are super cool.


Don't forget to read the course book

Chapter 10

Need to review something from the previous lessons in the course?
No problem.

You can access Lesson 1 here.

You can access Lesson 2 here.

You can access Lesson 3 here.

You can access Lesson 4 here.

You can access Lesson 5 here.

You can access Lesson 6 here.

You can access Lesson 7 here.

You can move on to the next Lesson 9 in the course (when it posts on 11/23/20).

