OpenAI CLIP - Connecting Text and Images
So the big news in deep learning this past week was the announcement of OpenAI's DALL-E and the companion work on the CLIP algorithm. We already have one post on DALL-E, which is a generative model architecture for creating an image from a textual description.
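CLIP is a deep learning model trained with a contrastive objective function to match images with textual descriptions of what is in them. That is pretty slick in itself. But the resulting model can also be turned into a zero-shot classifier for arbitrary new tasks. It's like transfer learning, but slightly different: instead of fine-tuning on labeled examples, you describe each candidate class in text and pick the description that best matches the image.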
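To make the contrastive objective and the zero-shot trick a bit more concrete, here's a minimal sketch in plain Python. It is not the OpenAI code: it assumes the image and text encoders have already produced embedding vectors, and the toy 3-dimensional embeddings, the function names, and the temperature of 0.07 are all made up for illustration. The loss shown is a symmetric cross-entropy over a batch of matched image/text pairs, which is one common way to write a contrastive objective of this kind.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_matrix(image_embs, text_embs, temperature=0.07):
    """Temperature-scaled cosine similarity between every image and every text."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy where pair k is (image k, text k)."""
    logits = similarity_matrix(image_embs, text_embs, temperature)
    n = len(logits)
    # Image -> text: each row should put its probability on the diagonal entry.
    loss_i = -sum(math.log(softmax(logits[k])[k]) for k in range(n)) / n
    # Text -> image: the same thing over the columns.
    cols = [list(c) for c in zip(*logits)]
    loss_t = -sum(math.log(softmax(cols[k])[k]) for k in range(n)) / n
    return (loss_i + loss_t) / 2

def zero_shot_classify(image_emb, prompt_embs, labels, temperature=0.07):
    """Pick the text prompt whose embedding best matches the image embedding."""
    logits = similarity_matrix([image_emb], prompt_embs, temperature)[0]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs

# Hypothetical embeddings standing in for real encoder outputs.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
prompt_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, prompt_embs, labels)
print(label, [round(p, 3) for p in probs])
```

The point of the sketch is that a new classification task needs no retraining: you only swap in a different list of text prompts. In practice the embeddings would come from the trained image and text encoders rather than being typed in by hand.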
Yannic Kilcher gives us the lowdown on the CLIP algorithm. Let's check it out.
Here's a link to the 'Learning Transferable Visual Models from Natural Language Supervision' paper.
Here's a link to the PyTorch CLIP code.