Depth Sensing

Most of the apps we have built up to now are using video as input. Video is two-dimensional, it has a width and a height.

Depth sensors are special types of cameras that add a third dimension to the mix; they can see width, height, and depth. Depth in this context means the distance away from the camera where a captured pixel is situated.

It is important to note the difference between full 3D and depth. Just like a conventional 2D camera, a depth sensor can only capture what it “sees”. If an object is obstructing another one behind it, the camera will only be able to measure the depth of the one closest to it. We can only have one depth measurement per pixel.

*Metal Pin Art Pin Point Impression 3D Frame Toy*

MegaFaces: Kinetic Facade Shows Giant 3D 'Selfies' from iart on Vimeo.

Technologies

There are a few different technologies used for depth sensing, each with their own pros and cons depending on the application.

Most depth cameras include both color and depth sensors, providiing a pair of images per frame. These are then combined to form a colored point cloud as the result.

Coded (Structured) Light

Coded (or structured) light works by projecting a known pattern onto the world, then analyzing the deformations of the pattern to determine the shapes of the surfaces it is projected on.

Coded light technology and the Intel® RealSense™ Depth Camera SR305

Pros:
- Precise.
- High resolution, usually as high as the camera resolution it is using.
Cons:
- Can be slow as it needs multiple patterns projected then captured to generate a single frame.
- Subject to interference from outside light, so usually better for indoor sensing.

Time of Flight

Like the name implies, time of flight works by measuring the time it takes for a beam of light to do a round-trip to the sensor. This involves complex calculations using the speed of light, and the shorter the travel time, the nearer the object.

Pros:
- Compact. (This is the technology used in many current generation phones)
- Fast, and ideal for real-time processing.
Cons:
- Mid-level accuracy.
- Low resolution as it uses a custom sensor.
- Subject to interference from other devices.

Stereo Vision

Stereo vision works like human depth perception, where two cameras (or eyes) are placed side by side looking in the same direction. Differences in the two captured images, called disparity, is used to determine how far from the sensors these different pixels are.

Pros:
- Tends to be cheap, as any off the shelf cameras can be used.
- Image representation is intuitive to humans.
- No interference.
Cons:
- Low accuracy.
- Complex implementation requiring feature extraction and matching.

Microsoft Kinect

Microsoft was the first company to bring depth sensors to the mass market with the Kinect for Xbox 360 in 2010. This was originally designed as a game controller, but gained a lot of interest from robotics engineers and creative coders, who hacked the device to use for custom Desktop applications.

Microsoft originally was against the practice, even threatening legal action against the hackers, but eventually changed course and embraced the community building around it.

ofxKinect 3D draw 001 from Memo Akten on Vimeo.

Interactive Puppet Prototype with Xbox Kinect from Theo Watson on Vimeo.

Kinect for Xbox 360

Even though it was discontinued in 2013, this device is still used because of its long range and ease of use on all major OSes.

Technology: Coded light
Range: 1.2-3.5m / 3.9-11.5ft
Color resolution: 640x480px
Depth resolution: 640x480px
FOV: 57°x43°

Extra features:

Microphone
Motorized base to adjust device angle

OF Support

ofxKinect
- Windows, Mac, Linux
- README for setup instructions
- Based on libfreenect

Starfield from Lab212 on Vimeo.

Kinect V2 (aka Kinect for Xbox One)

Released in 2012, the second version of the Kinect is said to have 3x the fidelity of its predecessor and a 60% wider field of view.

While still originally labeled as an Xbox controller, Microsoft shipped a Windows SDK with the device, providing access to advanced features to Desktop applications.

This device was discontinued in 2017 but is also still widely used for long-term installations.

Technology: Time-of-flight
Range: 0.5-4.5m / 1.6-11.5ft
Color resolution: 1920x1080px
Depth resolution: 512x424px
FOV: 70°x60°

Extra features using Microsoft SDK:

Body tracking (up to 6 people)
Facial expression recognition
Hand gesture recognition
Heart rate tracking
Speech recognition

OF Support

ofxKinectV2
- Windows, Mac, Linux
- Based on libfreenect2
- Note that this does not support any Microsoft SDK features like Body Tracking
ofxKinectForWindows2
- Windows only!
- Requires Microsoft SDK

Parade - Dancing Shadow Sculptures from Dpt. on Vimeo.

Kinect for Azure

Released in 2019, the Kinect for Azure is based on technologies of the previous Kinect and the HoloLens. This is the first Kinect device marketed to developers at launch, and the first device to have an open-source SDK with official support for non-Windows platforms.

Technology: Time-of-flight
Range: 0.25-5.5m / 0.8-11.5ft (mode dependent)
Color resolution: Up to 3840x2160px
Depth resolution: Up to 1024x1024px
FOV: Up to 120°x120°

Extra features using Microsoft SDK:

Orientation sensors
360° microphone array
Body tracking (requires NVIDIA GPU)

OF Support

ofxAzureKinect
- Windows, Linux
- Requires Azure Kinect Sensor SDK

Intel RealSense

The Intel RealSense started off as a compact depth camera aimed at video conferencing, gesture-based interaction, and 3D scanning. The 200 series included many devices, both standalone and embedded in tablet computers and laptops.

The quality was not up-to-par with other depth cameras, and the original RealSense was rarely used for interactive installations.

In 2018, Intel released the 400 series devices, which were a major improvement on the previous generation. Low cost, small form-factor and portability make these devices a viable choice for many applications.

D415 / D435 / D455 / D457

Technology: Stereo IR
Range: 0.10-10m / 0.3-32ft (depends on conditions)
Color resolution: 1920x1080px
Depth resolution: 1280x720px
FOV: 65°x40° (D415), 87°x58° (D435, D455, D457)

Extra features:

USB powered
Orientation sensors (on some models)

OF Support

ofxLibRealSense2
- Mac (Windows, Linux in theory)
ofxRealSense2
- Windows (Mac, Linux in theory)

Other Options

Stereolabs ZED

Stereolabs are the newest players on this list, and are worth mentioning because of their high quality ZED 2 sensors.

Technology: Stereo Color (range is virtually unlimited)
Waterproof / dustproof options makes them great for outdoor use
SDK uses machine learning models to provide a robust depth map and body tracking features
Works on Windows and Linux but requires an NVIDIA GPU

Polymorphic: An Interactive Installation from Sensorium on Vimeo.

Orbbec Femto

Orbbec released the Femto Series to fill in the gap left by Microsoft discontinuing the Kinect for Azure. The Femto cameras can be used either with their own Orbbec SDK or with the Microsoft Azure Kinect SDK (meaning they can be a drop-in replacement in a project that uses Azure Kinects!). The Femto Mega can also be powered and stream data over Ethernet, which greatly simplifies cable runs over long distances.

The Vivid Unknown at BAM excerpt

Leap Motion

The Leap Motion Controller is a depth sensor that focuses on hand and finger tracking. It can be used for both Desktop and VR applications.

Technology: Stereo IR

Orbbec Astra

Orbbec also released the Astra Series as a response to the Microsoft Kinect. The goal was to create an open, cross-platform SDK which included body tracking with OpenNI.

Technology: Strucutured Light

USB Connections

We will often find ourselves wanting to connect many sensors to a single computer, or wanting to position our sensors far from the computer. This can be achieved using USB hubs and USB extenders, but one thing to remember is that not all USBs are created equal, and like most things in life, you get what you pay for.

In all cases, the most important thing we can do is test our setup with all the hardware connected to make sure everything is working as expected.

Bandwidth

USB bandwidth (amount of data over time) should be planned carefully:

Only enable feeds that are necessary to the application (e.g. Disable the RGB color stream if we are only interested in the depth data).
Use a lower image resolution if it provides enough information (e.g. Test a lower resolution stream to see if it provides enough precision).
If using multiple devices, connect them to different USB channels when possible (e.g. Connect one on the front and the other on the back).

Hubs

USB hubs can help, but have limitations:

Powered USB hubs will have higher bandwidth as they can use more power.
External hubs that connect to a PC using a USB connection will create a bottleneck (since all the data still needs to go through a single bus).
Internal PCIe hubs with dedicated channels per port will work the best.

Recommendation:

StarTech PCI Express Adapter Card

Cables

USB cables will deteriorate the signal over distance.

Use cables that are close in length to what is needed. Cables that are too long will weaken the signal.
Thicker and insulated cables tend to reduce interference.
Active (powered) cables can boost the data signal, especially over long distances.

Recommendations:

Depth Sensing

Technologies #

Coded (Structured) Light #

Time of Flight #

Stereo Vision #

Microsoft Kinect #

Kinect for Xbox 360 #

Kinect V2 (aka Kinect for Xbox One) #

Kinect for Azure #

Intel RealSense #

D415 / D435 / D455 / D457 #

Other Options #

Stereolabs ZED #

Orbbec Femto #

Leap Motion #

Orbbec Astra #

USB Connections #

Bandwidth #

Hubs #

Cables #

Technologies

Coded (Structured) Light

Time of Flight

Stereo Vision

Microsoft Kinect

Kinect for Xbox 360

Kinect V2 (aka Kinect for Xbox One)

Kinect for Azure

Intel RealSense

D415 / D435 / D455 / D457

Other Options

Stereolabs ZED

Orbbec Femto

Leap Motion

Orbbec Astra

USB Connections

Bandwidth

Hubs

Cables