Emerging Properties in Self-Supervised Vision Transformers

Self-supervised learning is the final frontier in representation learning: getting useful features without any labels.

Facebook AI's new system, DINO, combines advances in self-supervised learning for computer vision with the new Vision Transformer (ViT) architecture and achieves impressive results without any labels: its attention maps can be read directly as segmentation maps, and the learned representations work well for image retrieval and as k-nearest-neighbor (k-NN) classifiers.
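At its core, DINO trains a student network to match the output distribution of a momentum (EMA) teacher across different crops of the same image, using a cross-entropy loss where the teacher's output is centered and sharpened with a low temperature to avoid collapse. Here is a minimal NumPy sketch of that loss; the shapes, temperatures, and the simple batch-mean center are illustrative choices, not the paper's exact hyperparameters:

```python
import numpy as np

def softmax(x, temp):
    # Temperature-scaled softmax along the last axis.
    z = (x - x.max(axis=-1, keepdims=True)) / temp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.04):
    # The teacher output is centered and sharpened (low temperature),
    # then treated as a fixed (stop-gradient) target for the student's
    # cross-entropy. Centering + sharpening together prevent the two
    # networks from collapsing to a trivial constant output.
    targets = softmax(teacher_out - center, teacher_temp)
    log_student = np.log(softmax(student_out, student_temp))
    return -(targets * log_student).sum(axis=-1).mean()

rng = np.random.default_rng(0)
student_out = rng.normal(size=(8, 32))  # 8 crops, 32-dim projection head
teacher_out = rng.normal(size=(8, 32))
center = teacher_out.mean(axis=0)       # in the real method, an EMA center

loss = dino_loss(student_out, teacher_out, center)
print(float(loss))
```

In the actual system the teacher's weights are an exponential moving average of the student's, and the gradient flows only through the student branch; this sketch only shows the shape of the objective.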

You can find the paper here.
There is a blog post with more info here.
The PyTorch code can be found here.

Yannic Kilcher will run us through his astute analysis of the system.
