Perceiver: General Perception with Iterative Attention

Today's video is an analysis of a new transformer architecture paper put out by a group of people at DeepMind. Andrew Zisserman's name has been attached to really great papers for a very long time, and this one is interesting as well.

They restructure the transformer architecture a little bit so that the computational cost grows more slowly as your input size increases. They also define a uniform, blank-slate architecture that can be used for different tasks (vision, audio, 3D point clouds, text, etc.).
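
To get a feel for the complexity argument, here is a rough back-of-the-envelope sketch. It only counts query-key pairs scored by attention, assuming a standard transformer does full self-attention over all M inputs, while the Perceiver cross-attends from a small latent array of N elements to the inputs and then runs self-attention over the latents only. The function names and the concrete sizes (a 224 x 224 image, 512 latents, 8 layers) are illustrative assumptions, not figures taken from this post.

```python
def self_attention_pairs(num_inputs, num_layers):
    """Query-key pairs scored when every input attends to every input, per layer."""
    return num_layers * num_inputs * num_inputs


def perceiver_pairs(num_inputs, num_latents, num_cross, num_latent_layers):
    """Query-key pairs for latent-to-input cross-attention plus latent self-attention."""
    cross = num_cross * num_inputs * num_latents             # linear in the input size
    latent = num_latent_layers * num_latents * num_latents   # independent of input size
    return cross + latent


if __name__ == "__main__":
    num_inputs = 224 * 224   # assumed: one input element per pixel
    num_latents = 512        # assumed latent array size
    full = self_attention_pairs(num_inputs, num_layers=8)
    perc = perceiver_pairs(num_inputs, num_latents, num_cross=1, num_latent_layers=8)
    print("full self-attention pairs:", full)
    print("perceiver-style pairs:   ", perc)
    print("ratio: %.1f" % (full / perc))
```

The point of the sketch is the scaling: doubling the input size quadruples the first count but only doubles the cross-attention term of the second, and the depth of the latent stack no longer depends on the input size at all.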

And with that intro we turn to our old pal Yannic Kilcher to give us his astute analysis of the paper.


You can check out the paper titled 'Perceiver: General Perception with Iterative Attention' here.


Observations:

1: Is the Fourier-style position encoding really just an elaborate way to build a scale-space pyramid (or encoding, or whatever you want to call it) of the input data?

2: His comment that the model becomes a recurrent neural net if the weights are shared across the different horizontal stages of the block diagram he details in the video (also in the paper) is fascinating.
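
Observation 1 is easier to think about with a toy version in hand. The sketch below encodes a 1D position in [-1, 1] as the raw position plus sine and cosine features at several frequencies. The linear spacing of the bands between 1 and `max_freq`, and the function name itself, are assumptions made for illustration, and the exact parameterization in the paper may differ.

```python
import math


def fourier_position_encoding(x, num_bands, max_freq):
    """Return [x, sin(pi*f_1*x), cos(pi*f_1*x), ..., sin(pi*f_K*x), cos(pi*f_K*x)].

    x is a position in [-1, 1]; the K = num_bands frequencies are spaced
    linearly from 1.0 up to max_freq (an assumed, illustrative choice).
    """
    if num_bands == 1:
        freqs = [1.0]
    else:
        step = (max_freq - 1.0) / (num_bands - 1)
        freqs = [1.0 + k * step for k in range(num_bands)]
    features = [x]
    for f in freqs:
        features.append(math.sin(math.pi * f * x))
        features.append(math.cos(math.pi * f * x))
    return features


if __name__ == "__main__":
    num_samples = 8
    # Evenly spaced positions covering [-1, 1], one per input sample.
    positions = [-1.0 + 2.0 * i / (num_samples - 1) for i in range(num_samples)]
    for x in positions:
        enc = fourier_position_encoding(x, num_bands=4, max_freq=8.0)
        print(" ".join("% .2f" % v for v in enc))
```

In the printed rows, the low-frequency columns change slowly from one position to the next while the high-frequency columns change quickly, so each position is described at several resolutions at once. That is at least suggestive of the coarse-to-fine, scale-space reading, though the question of whether it is really equivalent stays open.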
