LSTM is Dead, Long Live Transformers
This is a talk from December 2019 on how LSTM models for natural language processing have been replaced by transformer-based models. It was referenced in the Stanford intro to transformers talk in yesterday's post, and it is a great source of additional introductory information on the basics of the transformer architecture.
The speaker gives a good overview of the history of natural language processing: from bag of words, to recurrent neural nets (RNNs), to the LSTM architecture that improved on plain RNNs, and finally to how transformers supplanted both. He also covers transformer architecture specifics such as multi-head self-attention and positional encoding.
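For readers who want something concrete to go with those two terms, here is a minimal sketch in plain Python. It is not taken from the talk. It assumes the sinusoidal positional encoding from the original transformer paper, and it simplifies attention to a single head with no learned projections (queries, keys, and values are all the input itself). The function names `positional_encoding` and `self_attention` are made up for illustration.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Each pair of dimensions (i, i + 1) shares one wavelength.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: Q = K = V = x. A real transformer applies
    learned linear projections and runs several heads in parallel.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token's query to every token's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Usage: encode 4 positions in 8 dimensions, then let every position attend to all others.
pe = positional_encoding(4, 8)
attended = self_attention(pe)
```

A multi-head version would, roughly, run several such attention computations on different learned projections of the input and concatenate the results. The point of the sketch is only that every position looks at every other position in one step, which is what lets transformers drop the sequential recurrence of RNNs and LSTMs.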