Datasets for Understanding the 3D World: Introducing the Objectron Dataset

Understanding objects in 3D space is a challenging task.  Part of the problem is that many of the existing real-world datasets one can use to train deep learning nets are based on 2D images.  There need to be more datasets that focus on capturing the 3D structure of objects.  At the same time, you'd like to organize this data so that it can easily be used as input to machine learning algorithms (like deep learning neural nets).

One approach to doing this is to create object-centric video clips, and then use them to build your dataset.  That is the objective of Google's new Objectron Dataset.  You can read all about it here.

Objectron is a collection of short video clips that capture objects from different angles.  Each video clip is accompanied by AR session metadata that includes camera poses and sparse point clouds. The data also contain manually annotated 3D bounding boxes for each object, which describe the object’s position, orientation, and dimensions. The dataset consists of 15K annotated video clips supplemented with over 4M annotated images collected from a geo-diverse sample (covering 10 countries across five continents).
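To make "position, orientation, and dimensions" concrete, here is a minimal sketch of how those three quantities determine the eight corners of a 3D bounding box. This is a generic parametrization and not the Objectron annotation format itself; the function names `box_corners` and `rotation_about_y` are hypothetical and used only for illustration.

```python
import numpy as np

def rotation_about_y(angle_rad):
    """3x3 rotation matrix for a rotation of angle_rad about the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def box_corners(center, rotation, dimensions):
    """Return the 8 corners (shape 8x3) of an oriented 3D box.

    center:     (3,) box position
    rotation:   (3, 3) rotation matrix giving the box orientation
    dimensions: (3,) full edge lengths along the box's own x, y, z axes
    """
    center = np.asarray(center, dtype=float)
    rotation = np.asarray(rotation, dtype=float)
    half = np.asarray(dimensions, dtype=float) / 2.0
    # All sign combinations (+/-1, +/-1, +/-1) pick out the 8 corners.
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1)
                      for sy in (-1, 1)
                      for sz in (-1, 1)], dtype=float)
    local = signs * half                 # corners in the box's own frame
    return local @ rotation.T + center   # rotate, then translate

# A 2 x 4 x 6 box turned 90 degrees about the vertical axis and moved to (10, 0, 0).
corners = box_corners([10.0, 0.0, 0.0], rotation_about_y(np.pi / 2), [2.0, 4.0, 6.0])
```

Nine numbers (three for position, three for orientation, three for size) are enough to describe the whole box, which is part of what makes this kind of annotation a convenient training target.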

The addition of AR session metadata is particularly exciting.  Modern digital cameras are quickly moving to include depth sensing in addition to conventional RGB sensing of color information in an image.  So an image could consist of RGB color information plus an additional depth image.

Typically the depth information is sparser than the RGB information.  One way to think of the depth image is as a raster image whose pixels store depth instead of color.  You can also think of the individual pixels in the depth image as points in a 3D cloud of data.  You can also associate colors with them, so then what your camera is really doing is constructing a 3D point cloud with both depth and color information (if you want it to).
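The step from depth pixels to 3D points can be sketched in a few lines. This is a minimal example under some stated assumptions: a simple pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a depth value of 0 meaning "no measurement", and a color image that has already been resampled to the depth image's resolution. The function name `depth_to_point_cloud` is hypothetical and not taken from any particular library.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, rgb=None):
    """Back-project a depth image into a 3D point cloud (pinhole model).

    depth: (H, W) array of depths along the camera z axis; 0 = no measurement
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point)
    rgb:   optional (H, W, 3) color image aligned with the depth image
    Returns (points, colors): points is (N, 3); colors is (N, 3) or None.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    # u is the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                    # keep only pixels that have a depth
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    colors = None if rgb is None else np.asarray(rgb)[valid]
    return points, colors

# Tiny 2 x 2 example: one pixel has no depth reading and is dropped.
depth = np.array([[1.0, 0.0],
                  [2.0, 2.0]])
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
points, colors = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0, rgb=rgb)
```

Because missing depth readings are simply dropped, the resulting cloud is naturally sparser than the color image, which matches the point above.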

What you can do with a 3D point cloud as far as data manipulation goes (especially as you move the camera around in a scene) is very exciting.  It's a whole new way to think about digital photography, and it will become very important as AR (augmented reality) applications take off over the next few years.

We have a previous HTC post on how LIDAR can cause us to rethink how a digital camera works.  For example, the new iPad Pro models and the new iPhone Pro models use LIDAR to record depth information.

The OAK boards we have discussed in previous posts also work this way (higher resolution color information, lower resolution depth information).

Google's hope in releasing the Objectron dataset is to encourage the research community to really push the limits of what can be done with 3D object recognition (and geometric understanding).  Note that this dataset is object focused.  But one could imagine other datasets that are focused on understanding a 3D scene as the camera moves within it (scene focused rather than object focused).

You can access and use this dataset now on the Objectron GitHub page.

