OpenAI DALL-E - Creating Images from Text

Here's some up-to-the-minute information on the latest deep learning generative model architecture from OpenAI. It's a transformer architecture based on GPT-3 that allows the generation of high-quality images from a textual description.
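Since there is no paper or released code for DALL-E, the sketch below is just a toy illustration of the core idea as described in the blog post: text tokens and discrete image tokens are concatenated into one sequence, and a transformer predicts the next image token auto-regressively. The vocabulary size, the prompt tokens, and the stand-in `next_token_logits` "model" (which just returns random logits) are all hypothetical placeholders so the sampling loop itself is runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16          # hypothetical image-token vocabulary size (DALL-E's is much larger)
TEXT = [3, 7, 1]    # hypothetical tokenized text prompt

def next_token_logits(sequence):
    """Stand-in for the transformer's next-token prediction (random logits)."""
    return rng.normal(size=VOCAB)

def sample_image_tokens(text_tokens, n_image_tokens):
    """Auto-regressively sample image tokens conditioned on the text prefix."""
    seq = list(text_tokens)
    for _ in range(n_image_tokens):
        logits = next_token_logits(seq)
        probs = np.exp(logits - logits.max())   # softmax over the vocabulary
        probs /= probs.sum()
        seq.append(int(rng.choice(VOCAB, p=probs)))
    return seq[len(text_tokens):]               # just the generated image tokens

tokens = sample_image_tokens(TEXT, 8)
```

In the real system those sampled tokens would then be decoded back into pixels by the discrete VAE's decoder.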

We will first turn to a very well put together and extremely timely video tutorial by Yannic Kilcher.  We also have links to the two OpenAI blog posts as well.  And of course some HTC observations.

So like Yannic says in the video tutorial, some of his commentary is somewhat speculative, since there is not a paper to reference for the specific implementation details.  Although I think he did an excellent job of explaining what is presented in the blog posts.

Speaking of the blog posts, here they are.

'DALL-E: Creating Images from Text' is here.

The reference section in this blog post has some good links to other generative 'text to image' approaches to check out.

'CLIP: Connecting Text and Images' is here.

Paper and code links in the blog post.
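CLIP's core idea is simple enough to sketch: embed images and text into a shared space, L2-normalize the embeddings, and score matches by cosine similarity. The snippet below is a hedged toy version of just that scoring step; the random "embeddings" stand in for the real image and text encoders, and the dimension of 512 is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 512                               # assumed embedding dimension

image_emb = rng.normal(size=(3, D))   # stand-ins for 3 encoded candidate images
text_emb = rng.normal(size=(1, D))    # stand-in for 1 encoded text query

def normalize(x):
    """L2-normalize along the last axis so dot products become cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# cosine similarity between the text query and every candidate image
sims = normalize(text_emb) @ normalize(image_emb).T
best = int(np.argmax(sims))           # index of the best-matching image
```

During training, CLIP turns a batch of these similarities into a contrastive objective, pushing matched image/text pairs together and mismatched pairs apart.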


1: The VQ-VAE model architecture is a part of our upcoming Generative Model Deep Dive post (it's a part of Justin Johnson's second lecture featured in part 2).
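The quantization step at the heart of the VQ-VAE family can be sketched in a few lines (an assumption here: DALL-E's discrete VAE works in roughly this way, which the blog post implies but does not spell out). Each encoder output vector is replaced by the index of its nearest codebook entry; those indices are the discrete tokens the transformer models. The codebook size and latent dimension below are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)
K, D = 8, 4                        # toy codebook size and latent dimension
codebook = rng.normal(size=(K, D)) # learned codebook entries (random stand-ins)
z = rng.normal(size=(5, D))        # 5 encoder output vectors (random stand-ins)

# squared Euclidean distance from every latent vector to every codebook entry
dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

tokens = dists.argmin(axis=1)      # discrete tokens fed to the transformer
quantized = codebook[tokens]       # quantized vectors passed to the decoder
```

The non-differentiable `argmin` is why VQ-VAE training needs tricks like the straight-through estimator to pass gradients back to the encoder.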

2:  As a texture synthesis model, this could be pretty slick.  Not that it isn't already.

3:  Obviously we need to get on posting our upcoming coverage of transformer models, including transformer models for image processing.  I would not have anticipated auto-regressive models doing so well (as contrasted with other generative approaches).

4:  I'm sure there is a way to frame this within Yann LeCun's grand unified theory of generative models.  Worth thinking about, because I'm sure there are ways to tweak it to potentially speed it up and make it more flexible for artists.

5:  No code for DALL-E, but there is code for CLIP.  It would be a shame if OpenAI just sells this to Microsoft like they did GPT-3.  Although I'm sure we'll see a cast of thousands post their implementations of it soon on GitHub, so we await that anxiously.

6:  I was not aware of that ImageNet Sketch dataset.  Interesting.

