OpenAI CLIP - Connecting Text and Images

So the big news in deep learning this past week was OpenAI's announcement of DALL-E and its companion work, CLIP. We already have one post on DALL-E, which is a generative model architecture for creating an image from a textual description.

CLIP is a deep learning model trained with a contrastive objective to match images with the text that describes them. That is pretty slick in itself. But the resulting model can also be turned into a zero-shot classifier for new tasks: write each candidate label as a short text description, and CLIP picks the description that best matches the image. It's like transfer learning, but without any fine-tuning on the new task.
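To make the zero-shot idea a little more concrete, here is a minimal sketch in plain Python. It is not the OpenAI code. It assumes an image encoder and a text encoder have already produced embeddings, and it uses tiny made-up 3-d vectors in their place. The function names, the label prompts, and the temperature value of 100 are all assumptions for illustration. Classification is then just cosine similarity between the image embedding and each label's text embedding, followed by a softmax.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, text_embeddings, temperature=100.0):
    """Score one image embedding against one text embedding per candidate label.

    The temperature is an assumed scaling factor that sharpens the softmax.
    """
    image = normalize(image_embedding)
    scores = []
    for text in text_embeddings:
        text = normalize(text)
        cosine = sum(a * b for a, b in zip(image, text))
        scores.append(temperature * cosine)
    return softmax(scores)

# Hypothetical label prompts, one per class we want to recognize.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

# Made-up embeddings standing in for real encoder outputs.
text_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, text_embeddings)
best = labels[probs.index(max(probs))]
print(best)  # a photo of a dog
```

Swapping in a new task only means swapping the list of label prompts. Nothing gets retrained, which is the "slightly different from transfer learning" part.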

Yannic Kilcher gives us the lowdown on the CLIP algorithm. Let's check it out.

Here's a link to the 'Learning Transferable Visual Models from Natural Language Supervision' paper.

Here's a link to the PyTorch CLIP code.

