Background Matte Generation Using Novel Neural Architecture

Generating background mattes for video is useful for all kinds of different applications. There is a long history of techniques for this, like color keying (chroma key), which assumes the foreground objects or people are in front of a flat color background (think green screen), or the many SIGGRAPH papers over the years on different variants of the problem (graph cut, etc.).

With the deep learning neural net renaissance currently taking place, you would expect to see different neural net implementations of this. Nvidia recently announced one, for example. And now the University of Washington graphics lab has come up with an interesting neural-net-based approach. Take a look at the video examples below.

The neural net architecture they use is interesting, and if you stretch your brain you might start to come up with other applications of the basic idea. Two different neural nets are used: a low-res one to get in the ballpark, and a patch-based high-res one (we've seen that idea before in generative models) that refines detail in hard-to-matte areas.
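To make the coarse-then-patches idea concrete, here is a minimal NumPy sketch. It assumes the low-res pass hands back a per-pixel error estimate alongside its coarse matte, which is how I read the paper, and it only touches the k worst cells at full resolution. The function names, the nearest-neighbor upsampling, and the `refine_patch` callback are all stand-ins I made up for illustration, not the authors' code.

```python
import numpy as np

def upsample_nearest(coarse, scale):
    """Blow a 2D map up by an integer factor (nearest neighbor, for simplicity)."""
    return np.repeat(np.repeat(coarse, scale, axis=0), scale, axis=1)

def top_k_error_cells(error_map, k):
    """Return (row, col) of the k coarse cells with the highest predicted error."""
    k = min(k, error_map.size)
    idx = np.argsort(error_map.ravel())[::-1][:k]
    ys, xs = np.unravel_index(idx, error_map.shape)
    return list(zip(ys.tolist(), xs.tolist()))

def coarse_to_fine(coarse_alpha, error_map, scale, k, refine_patch):
    """Upsample the coarse matte, then refine only the k hardest patches.

    refine_patch is a hypothetical stand-in for the high-res network: it takes
    one (scale x scale) patch and returns a refined patch of the same shape.
    """
    alpha = upsample_nearest(coarse_alpha, scale)
    for cy, cx in top_k_error_cells(error_map, k):
        y0, x0 = cy * scale, cx * scale
        patch = alpha[y0:y0 + scale, x0:x0 + scale]
        alpha[y0:y0 + scale, x0:x0 + scale] = refine_patch(patch)
    return alpha

# Toy example: a 2x2 coarse matte where the top-right cell is the uncertain one.
coarse_alpha = np.array([[0.0, 0.5],
                         [1.0, 1.0]])
error_map = np.array([[0.1, 0.9],
                      [0.2, 0.0]])
refined = coarse_to_fine(coarse_alpha, error_map, scale=2, k=1,
                         refine_patch=lambda p: np.ones_like(p))
print(refined)
```

The appeal of this split is that the expensive high-res work scales with the number of hard patches rather than with the full frame size.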

You do need an image of the background without the foreground objects for this to work. So my question is whether you could manually cut the foreground out, use a different neural net algorithm to in-paint the hole, and then use the in-painted image as the background image for the matting algorithm.

You can access the project page here.

There is a github page with code to check out here.

And of course there is a paper here.

When you read the paper you will note the skip connections in the encoder-decoder stages, which is a theme in so many of the recent generative nets we have featured here at HTC. They based the backbone of the encoder on ResNet-50.

The decoder uses bilinear up-sampling for each 'build it back up' step, then concatenates the result with the skip connection from the encoder, followed by a 3x3 convolutional layer, a batch normalization layer, and a ReLU.
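One decoder step as described above might look roughly like this in PyTorch. This is my own sketch of the recipe, not code from the project's repo: the class name `DecoderBlock` and the channel counts are made up for illustration, and details like `bias=False` and `align_corners=False` are assumptions on my part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Bilinear upsample -> concat skip -> 3x3 conv -> batch norm -> ReLU."""

    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        # The conv sees the upsampled features plus the encoder skip features.
        self.conv = nn.Conv2d(in_channels + skip_channels, out_channels,
                              kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x, skip):
        # Upsample to the skip tensor's spatial size so the two can be stacked.
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=False)
        x = torch.cat([x, skip], dim=1)  # concatenate along the channel axis
        return F.relu(self.bn(self.conv(x)))

# Hypothetical shapes: deep 8x8 features meeting a 16x16 encoder skip.
block = DecoderBlock(in_channels=64, skip_channels=32, out_channels=16)
x = torch.randn(2, 64, 8, 8)
skip = torch.randn(2, 32, 16, 16)
out = block(x, skip)
print(out.shape)
```

Stacking a few of these, each fed by the matching encoder stage, gives you the 'build it back up' half of the network.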

