The Rise of the Transformer Architecture
Today's presentation covers the transformer lecture from the Berkeley Full Stack Deep Learning course (which is quite excellent). They title it 'Transfer Learning and Transformers', but the transfer learning part is only a few minutes at the very beginning, really just a lead-in to their view of the events leading up to transformer architectures.
So we quickly mention transfer learning, then move on to vector embedding models of language, and then to the real meat of the lecture, the transformer architecture. We recently covered the OpenAI Codex automated coding system, and the lecture discusses how Codex fell out of this transformer research on natural language modeling. They also briefly touch on ethics in AI systems like GPT-3 or Codex.
This lecture is from the Spring 2021 Full Stack Deep Learning online course.
HTC has a number of blog posts on transformers you can check out here, including a full tutorial on how to code one in PyTorch.
1: At one point the lecture mentions how architectures prior to transformers didn't really consider the original word relationships except in the first layer. That got me thinking about U-Nets, the magic architecture for image processing, and what the NLP equivalent of that architecture would be. We'll leave that thought for the reader to puzzle out.
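For reference, the mechanism that lets every transformer layer revisit those word-to-word relationships is scaled dot-product self-attention. Here is a minimal NumPy sketch (the function name, dimensions, and random weights are my own illustration, not from the lecture): each output position is a weighted mix of every input position, so pairwise relationships are recomputed at every layer rather than only the first.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d_model) token embeddings. The returned attention matrix
    holds a weight for every (query token, key token) pair, which is what
    lets each layer look at all the original positions at once.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # 4 tokens, d_model = 8 (toy sizes)
W = [rng.normal(size=(8, 8)) for _ in range(3)]
out, attn = self_attention(X, *W)
print(out.shape, attn.shape)                   # (4, 8) (4, 4)
```

Contrast this with an RNN, where token 1 only influences token 4 through three intermediate hidden states; here the (4, 4) attention matrix connects every pair directly, in every layer.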