Let's Code a Transformer Network in PyTorch
The state of the art in deep learning and AI is an ever-moving, ever-accelerating target. Things change, and you need to keep up and adapt. It seems obvious that everyone in 2021 needs to bone up on Transformer architectures for deep learning. So there's your New Year's resolution.
And to start off an apparently endless new series of posts, let's dive right in. We'll watch someone code up a Transformer network in PyTorch from scratch. Along the way we'll learn about some cool PyTorch calls you may not be familiar with yet, but will be glad to know about afterwards.
The 'Attention Is All You Need' paper is here.
The blog post on Transformers mentioned in the video is here.
1. Note that the computational cost of the standard Transformer architecture scales poorly as the input grows: self-attention compares every position with every other position, so the cost grows quadratically with sequence length. Some people have recently offered up architecture variations that alleviate this problem. These variations may soon capture the popular imagination (making all of this look old-fashioned even though it is new). We'll be posting about that soon.
2. So how about that torch.einsum? Pretty slick.
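If you want to poke at torch.einsum yourself, here is a minimal sketch of how it can express the attention step. It is not the code from the video: the tensor layout (batch, sequence, heads, head dimension), the sizes, and the variable names are all assumptions picked for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, heads, seq_len, head_dim = 2, 4, 5, 8

# Assumed layout: (batch, sequence position, head, head dimension).
queries = torch.randn(batch, seq_len, heads, head_dim)
keys = torch.randn(batch, seq_len, heads, head_dim)
values = torch.randn(batch, seq_len, heads, head_dim)

# Index letters: n=batch, q=query position, k=key position, h=head, d=head dim.
# d appears in both inputs but not in the output, so it is summed over,
# giving one dot-product score per (query, key) pair for every head.
scores = torch.einsum("nqhd,nkhd->nhqk", queries, keys)

# Scale, then softmax over the key axis so each query's weights sum to 1.
attention = torch.softmax(scores / head_dim ** 0.5, dim=-1)

# Weighted sum of the values: k is summed over this time.
out = torch.einsum("nhqk,nkhd->nqhd", attention, values)

# The same scores written with explicit permutes and a batched matmul,
# to show what the first einsum call replaces.
reference = torch.matmul(queries.permute(0, 2, 1, 3), keys.permute(0, 2, 3, 1))

print(scores.shape)  # (batch, heads, seq_len, seq_len)
print(out.shape)     # (batch, seq_len, heads, head_dim)
```

The last two axes of scores are both seq_len long, which is where the scaling problem in note 1 comes from: double the sequence length and the score tensor has four times as many entries.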