GLOM model deep dive

GLOM is a computer vision model proposed by Geoffrey Hinton that decomposes an image into a parse-tree-like structure representing objects and their parts. The parse tree is constructed dynamically, without changing the underlying neural net structure.
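To make "dynamic parse tree, fixed network" a bit more concrete, here is a toy sketch of one GLOM-style update, loosely following the description in Hinton's paper: every image location holds a column of embeddings, one per level, and each embedding is repeatedly replaced by an average of its previous value, a bottom-up prediction, a top-down prediction, and an attention-weighted average of same-level embeddings at other locations. Everything specific here is an assumption made for illustration: the names glom_step and attention_average, the random linear maps with tanh standing in for the learned bottom-up and top-down networks, the equal weighting of the contributions, and the array sizes. The image input to the lowest level is left out.

```python
import numpy as np


def attention_average(level_emb, beta=1.0):
    """Attention-weighted average of same-level embeddings across locations.

    level_emb has shape (num_locations, dim). Locations with similar
    embeddings attend to each other more strongly.
    """
    logits = beta * level_emb @ level_emb.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ level_emb


def glom_step(state, w_up, w_down):
    """One simplified update of every embedding in every column.

    state:  (num_locations, num_levels, dim)
    w_up:   (num_levels, dim, dim), hypothetical stand-in for bottom-up nets
    w_down: (num_levels, dim, dim), hypothetical stand-in for top-down nets
    """
    num_levels = state.shape[1]
    new_state = np.empty_like(state)
    for level in range(num_levels):
        contributions = [state[:, level]]  # previous value
        if level > 0:  # bottom-up prediction from the level below
            contributions.append(np.tanh(state[:, level - 1] @ w_up[level]))
        if level < num_levels - 1:  # top-down prediction from the level above
            contributions.append(np.tanh(state[:, level + 1] @ w_down[level]))
        contributions.append(attention_average(state[:, level]))
        new_state[:, level] = np.mean(contributions, axis=0)
    return new_state


rng = np.random.default_rng(0)
state = rng.normal(size=(16, 3, 8))  # 16 locations, 3 levels, 8-dim embeddings
w_up = rng.normal(scale=0.1, size=(3, 8, 8))
w_down = rng.normal(scale=0.1, size=(3, 8, 8))

for _ in range(10):
    state = glom_step(state, w_up, w_down)

print(state.shape)  # (16, 3, 8): the array layout is the same after every step
```

The point of the sketch is that the weights and the shape of the state never change between steps. In the paper, the parse tree is carried by the values: groups of locations whose embeddings agree at a given level act as a single node at that level.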

It's a follow-up to Hinton's Capsule architecture that tries to address some of its limitations.

Our previous HTC Seminar featured a lecture by Hinton that covers this material, so you should watch that first.

You can read the GLOM paper by Hinton here.

You can snark at Reddit snarks here.


Let's start off with our old friend Yannic Kilcher's deconstruction of the paper.


So how does this all relate to visual modeling studies?

How does this all relate to scale space representations?

How does this all relate to manifold learning?
