Visual Transfer Learning

So we're going to be talking about affordance-based manipulation, and how robots can learn to perceive affordances using deep learning neural networks.

What is affordance-based manipulation? An affordance is what a robot can or cannot do with an object. Affordance-based manipulation builds on the robot's awareness of that.

And we've been talking this week about transfer learning. It is such a 'hot' topic that it has appeared on the Google AI Blog in several quite different recent scenarios. For example:

In “BLEURT: Learning Robust Metrics for Text Generation” (presented during ACL 2020), we introduce a novel automatic metric that delivers ratings that are robust and reach an unprecedented level of quality, much closer to human annotation. BLEURT (Bilingual Evaluation Understudy with Representations from Transformers) builds upon recent advances in transfer learning to capture widespread linguistic phenomena, such as paraphrasing. The metric is available on Github.

BiT: BERT-style Pre-trained Computer Vision Models - Transfer Learning
Following the methods established in the language domain by BERT, we fine-tune the pre-trained BiT model on data from a variety of “downstream” tasks of interest, which may come with very little labeled data. Because the pre-trained model already comes with a good understanding of the visual world, this simple strategy works remarkably well. 

But the recent scenario we are going to focus on in particular is the work on Visual Transfer Learning for Robotic Manipulation.

So the goal is to see if deep learning neural nets trained to perform different computer vision tasks can be used to help train robotic vision systems. Specifically, can we improve the efficiency of learning robotic manipulation tasks? We're talking about a system that can learn to pick up and grasp arbitrary objects in unstructured settings in less than 10 minutes of trial and error experimentation.
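To make the transfer idea a little more concrete, here is a minimal Python sketch. It is not the code from the study. It assumes a model's parameters can be represented as a plain dictionary keyed by layer name, and the layer names (`backbone.*`, `head.*`) and the `transfer_weights` helper are hypothetical, made up for illustration. Layers whose names and sizes match are copied over from the vision model, and everything else keeps its fresh initialization.

```python
def transfer_weights(pretrained, target, prefixes=("backbone.",)):
    """Copy matching parameters from a pre-trained model into a new one.

    Both models are plain dicts mapping layer name -> list of floats
    (a stand-in for real tensors). A layer is copied only if its name
    starts with one of `prefixes`, it exists in `target`, and the sizes
    agree. Returns the updated parameters and the list of copied names.
    """
    result = dict(target)
    copied = []
    for name, values in pretrained.items():
        if not name.startswith(tuple(prefixes)):
            continue  # e.g. skip the task-specific head
        if name in target and len(target[name]) == len(values):
            result[name] = list(values)
            copied.append(name)
    return result, copied


# Hypothetical parameters: a vision model trained on some other task ...
vision_model = {
    "backbone.conv1": [0.2, -0.1, 0.4],
    "backbone.conv2": [0.3, 0.3],
    "head.classifier": [0.9, 0.1],
}
# ... and a freshly initialized grasping model with the same backbone.
grasp_model = {
    "backbone.conv1": [0.0, 0.0, 0.0],
    "backbone.conv2": [0.0, 0.0],
    "head.affordance": [0.0],
}

grasp_model, copied = transfer_weights(vision_model, grasp_model)
```

In this sketch only the backbone layers are transferred, so the robot starts its trial and error with visual features already in place while the grasp-specific head still has to be learned. Which layers are worth transferring is a design choice, and the `prefixes` argument is just one hypothetical way to expose it.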

One thing I really like about this work is that it examines the use of neural 2-D activation maps, where a vector value in a 2-D 'map' image can represent the perceived direction of action. So you're using the 2-D map to index into directional 'action' sequences. In other words, we are turning robotic movement planning into 2-D movement maps generated by the neural net system.
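Here is a small Python sketch of what indexing into such a map could look like. It is only an illustration under stated assumptions, not the method from the study: it assumes the network outputs one 2-D score map per candidate gripper angle, and the `best_action` helper, the toy scores, and the angle list are all hypothetical.

```python
def best_action(affordance_maps, angles_deg):
    """Pick the highest-scoring (score, angle, row, col) from a stack of maps.

    affordance_maps[k][r][c] is the predicted score for acting at pixel
    (r, c) with the gripper rotated by angles_deg[k] (assumed layout).
    """
    best = None
    for k, grid in enumerate(affordance_maps):
        for r, row in enumerate(grid):
            for c, score in enumerate(row):
                if best is None or score > best[0]:
                    best = (score, angles_deg[k], r, c)
    return best


# Toy 2x3 maps for two hypothetical gripper angles.
maps = [
    [[0.1, 0.2, 0.1],
     [0.0, 0.3, 0.2]],  # scores for 0 degrees
    [[0.1, 0.8, 0.1],
     [0.2, 0.4, 0.1]],  # scores for 90 degrees
]

score, angle, row, col = best_action(maps, [0, 90])
```

The chosen pixel tells the robot where to act and the chosen map tells it in which direction, so planning a motion reduces to finding the strongest response in a set of images.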

The results of this study showed that transfer learning could improve robotic exploration.

There's a long history of 3-D modeling vs 2-D neural map modeling for computational models of how the brain perceives 3D shape in the real world, or how it recognizes rotational manipulation of 3-D objects, etc.

And I guess we are jumping firmly into the fray in the second camp with this post.

It would be interesting to take some of the ideas discussed in yesterday's HTC Seminar Talk on Deep Learning Cognition and apply them to this notion of learning activation maps for controlling visual manipulation.  Specifically the notions of learning things happening over time (and building your neural net to be able to deal with that), and the notion of visual attention being included in the system.

I'm also unclear whether there has been much work on training deep learning neural nets on activation maps from the parts of the brain associated with planning and implementing movement. That is, training the model on real neural data. Whatever multi-dimensional statistical manifold those maps encode might be relevant to transfer learning for robotic vision tasks.

Maybe such a net would be more suited to certain tasks than nets trained on object recognition data? Or not? Is there a difference? It would be interesting to examine.

Object recognition nets build up increasingly elaborate visual representations of the data they are trained on. Maybe there is something similar encoded in a neural movement mapping net, only more attuned to movement.

