OpenCV AIKit (OAK)

HTC is back for summer session. Lots of fun high-tech developments and research to cover before we need to lock ourselves in the code dungeon again.

And the first thing to focus on is a brand-new, in-development, open source, and inexpensive embedded neural net AI hardware platform.  Introducing OAK, specifically the OAK-1 and OAK-D.

These 2 very inexpensive OAK hardware boards are tiny artificial intelligence (AI) and computer vision (CV) powerhouses, with OAK-D providing spatial AI leveraging stereo depth in addition to the 4K/30 12MP camera that both models share. They are supposedly absurdly easy to use, up and running in under 30 seconds. You can program your OAKs by just plugging them into a Mac, Ubuntu, or Windows computer's USB port. OAK's modular, FCC/CE-approved, open-source hardware ecosystem can also be directly integrated into your products if you want to use them as embedded hardware in systems that you sell.

OAK is currently a brand new Kickstarter campaign.  It was funded in the first 20 minutes no less, and with pledges continuing to climb it is one of the most successful crowdfunded embedded hardware boards ever. Here's an interview with OAK's creator Brandon Gilles to reassure everyone that you are in good hands with this particular Kickstarter project (not always the case).

Here's some more information about what OAK is all about. OAK comes in 2 different versions:

OAK-1: The standard OpenCV AI board that can perform neural network inference, object tracking, April Tags recognition, feature detection, and basic image processing operations.

OAK-D: Everything in OAK-1, plus a stereo depth camera, 3D object localization, and 3D object tracking.

You can think of these 2 boards as very smart cameras.  But smart cameras that return structured data, data that can be much more sophisticated than just raw pixels, more like intelligent metadata.  You can certainly use them to capture 2D or 3D images or videos. But you can also use them to tell you things like the positions and category names of the different objects in the scene they are viewing (again, in 2D or 3D), or where all the strawberries are in the physical space being viewed, their approximate ripeness, what types of insect damage they have, etc.  They can also tell you whether someone in the scene is wearing a Covid-19 mask or not.
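To make the "intelligent metadata" idea concrete, here is a minimal sketch of the kind of structured scene description a smart camera like OAK-D might hand back instead of pixels. All the field names here are illustrative assumptions on my part, not OAK's actual output format.

```python
# Hypothetical structured output from a spatial AI camera: labeled
# detections with confidences and 3D positions (metres, camera frame).
# Field names are made up for illustration, not the real OAK API.
detections = [
    {"label": "strawberry", "confidence": 0.91, "xyz_m": (0.12, -0.04, 0.55)},
    {"label": "strawberry", "confidence": 0.78, "xyz_m": (0.30, 0.10, 0.80)},
    {"label": "person",     "confidence": 0.95, "xyz_m": (-0.50, 0.00, 2.10)},
]

# Downstream code can then reason about the scene without ever touching
# pixel data, e.g. find the strawberry nearest the camera (smallest Z).
strawberries = [d for d in detections if d["label"] == "strawberry"]
nearest = min(strawberries, key=lambda d: d["xyz_m"][2])
print(nearest["xyz_m"])  # (0.12, -0.04, 0.55)
```

The point is that your application logic works on a short list of labeled 3D objects rather than on megapixels of raw imagery.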

You can program the OAKs in C++ or Python.
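As a sketch of what the Python side looks like, the snippet below uses the depthai host library to define a minimal pipeline that streams color camera preview frames back over USB. Treat this purely as illustrative configuration: the API is evolving quickly, the exact node and class names (dai.Pipeline, dai.node.ColorCamera, XLinkOut) may differ between releases, and it of course needs an OAK plugged into a USB port to actually run.

```python
import depthai as dai  # OAK host-side Python library (pip install depthai)

# Define a processing pipeline; the pipeline itself runs on the OAK board.
pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)   # the 4K 12MP color camera
cam.setPreviewSize(300, 300)                  # small RGB preview stream

xout = pipeline.create(dai.node.XLinkOut)     # ship frames back over USB
xout.setStreamName("preview")
cam.preview.link(xout.input)

# Connect to the first OAK found on USB and pull frames from it.
with dai.Device(pipeline) as device:
    q = device.getOutputQueue("preview")
    frame = q.get().getCvFrame()              # a BGR image usable with OpenCV
```

Neural inference works the same way: you add a neural network node to the pipeline and link the camera output into it, so the model runs on the board rather than on your host machine.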

The color camera resolution is 4056 x 3040 pixels.
The depth camera resolution is 1280 x 720 pixels.  There are 2 image sensors (left and right view) in the depth camera on OAK-D.
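For intuition on how those two sensors turn into depth: stereo depth comes from the classic relation Z = f * B / d, where f is the focal length in pixels, B is the baseline distance between the two cameras, and d is the pixel disparity between the left and right views. The specific numbers below (a ~72 degree horizontal FOV and a 7.5 cm baseline) are my own illustrative assumptions, not official OAK-D calibration values.

```python
import math

def stereo_depth_m(disparity_px, focal_px, baseline_m):
    """Classic stereo relation: depth Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Illustrative numbers (assumptions, not official OAK-D specs):
# a 1280-px-wide sensor with a ~72 degree horizontal FOV gives a focal
# length of roughly 1280 / (2 * tan(36 deg)) ~ 880 px, and we assume the
# two mono cameras sit about 7.5 cm apart.
focal_px = 1280 / (2 * math.tan(math.radians(36)))  # ~881 px
baseline_m = 0.075                                  # 7.5 cm (assumed)

print(round(stereo_depth_m(30.0, focal_px, baseline_m), 2))  # 2.2 (metres)
```

Note how depth resolution falls off with distance: halving the disparity doubles the computed depth, so far-away objects get coarser depth estimates than nearby ones.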

Note: You would currently have to program the calibration mapping yourself to map the depth image into the higher-res color camera image. They plan to do this at some point and include the calibration data in the OAK's onboard EEPROM memory. It's apparently a little tricky right now due to the auto-focus in the color camera.
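That calibration mapping boils down to standard pinhole camera math: back-project each depth pixel into a 3D point using the depth camera's intrinsics, apply the depth-to-color extrinsic transform, then project the point with the color camera's intrinsics. Here is a minimal sketch of a single pixel's round trip; every intrinsic and extrinsic number below is an invented placeholder, not real OAK calibration data (and the auto-focus issue mentioned above means the color intrinsics actually shift with focus).

```python
def deproject(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z metres into a 3D point."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def project(x, y, z, fx, fy, cx, cy):
    """Project a 3D point into pixel coordinates."""
    return (fx * x / z + cx, fy * y / z + cy)

# Illustrative intrinsics (placeholders, not real OAK calibration):
depth_K = dict(fx=880.0, fy=880.0, cx=640.0, cy=360.0)      # 1280x720 mono
color_K = dict(fx=2950.0, fy=2950.0, cx=2028.0, cy=1520.0)  # 4056x3040 color

# Back-project a depth pixel, apply an assumed depth-to-color extrinsic
# (here just a small sideways translation), and re-project into the
# color image's pixel coordinates.
x, y, z = deproject(700, 400, 1.5, **depth_K)
x += 0.0375  # assumed offset between the depth centre and the color camera
u, v = project(x, y, z, **color_K)
print(round(u), round(v))  # lands somewhere in the 4056x3040 color frame
```

Doing this for every pixel (plus lens distortion correction) is exactly what the promised EEPROM calibration data would automate.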

The left and right depth camera views can each be output at full frame rate (optionally H.264 or H.265 encoded if you want that rather than raw pixels).

The OAKs are based on the Myriad X, Intel's Movidius Myriad X Vision Processing Unit.  You may recall that we discussed the Intel Movidius system before in our previous look at embedded deployment of deep learning neural net systems.  Brandon makes an interesting point in the interview article that the USB stick solution we looked at previously only uses 1/7 of the Movidius's compute capability, and that this is why he looked into developing a different, more efficient solution.

The design goals for OAK include:
- Real-time neural inference (covering nearly every type of network)
- Stereo depth
- Feature tracking including Motion Estimation and platform-motion estimation
- Object tracking
- H.265/H.264 encoding (for recording incidents and/or generally just recording)
- JPEG encoding
- 16 efficient vector processors for combining these capabilities together and running accelerated CV functions (think of them as an embedded vision-specific GPU)

And now you get a sense of why this Kickstarter campaign was funded so quickly.  Truly a very smart camera.

