Time-of-Flight Camera

Using the Speed of Light to Build 3D Images

November 2017

When you think of cameras, you probably think of devices that capture two-dimensional images. A lens collects light from a cone-shaped region and projects it on an image sensor, resulting in a rectangular image. An image is then a record of the light intensity captured at a point in time from a two-dimensional field.

Well, actually that field has three dimensions. The conventional camera captures only a two-dimensional projection of 3D space. Wouldn’t it be useful to capture depth information as well – to record how far away each point in the image is from the camera? This would enable inspections and measurements not possible with conventional 2D cameras.

Scientists and engineers have been working on this problem for decades. The most common 3D solution in machine vision is currently laser triangulation. A laser line is projected on an object from a point off the camera’s axis. As imaged by a 2D camera, the line projected on an object will shift as the distance between camera and object changes. So long as the angle between camera axis and laser axis remains constant, the system can be calibrated to output precise data in world coordinates.
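The relationship between line shift and distance can be sketched in a few lines of Python. This is a simplified illustration, not a calibration procedure: it assumes the camera looks straight down at the object with a constant image scale (no perspective effects), and that the laser is tilted at a known angle from the camera axis. The function name and parameters are hypothetical.

```python
import math

def height_from_line_shift(shift_px, mm_per_px, laser_angle_deg):
    """Estimate a surface height change from the shift of a laser line.

    Assumptions (illustrative only):
      - the camera axis is perpendicular to the reference surface,
      - the image scale (mm_per_px) is constant across the field,
      - the laser axis is tilted laser_angle_deg away from the camera axis.
    """
    if not 0 < laser_angle_deg < 90:
        raise ValueError("laser angle must be between 0 and 90 degrees")
    # Convert the observed shift from pixels to millimeters on the object.
    shift_mm = shift_px * mm_per_px
    # A height change h moves the line sideways by h * tan(angle),
    # so the height is the sideways shift divided by tan(angle).
    return shift_mm / math.tan(math.radians(laser_angle_deg))

# Example: a 10-pixel shift at 0.1 mm/pixel with a 45-degree laser
# corresponds to a height change of about 1 mm.
height_mm = height_from_line_shift(10, 0.1, 45)
```

Under these assumptions, a larger angle between laser and camera makes the line shift more per unit of height, which improves depth resolution but also makes occlusion more likely.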

Although successful in many installations, laser triangulation has some disadvantages. The system only captures depth data along the laser line. To capture the complete depth profile of an object, the object must move relative to the imaging system so that it may be "scanned" over a period of time. Furthermore, there are often issues with occlusion, where the shape of the object prevents the camera from imaging the laser line in some regions. For example, a laser line projected to the bottom of a deep well may be hidden from the camera by the well's walls. For this reason, laser triangulation works best on relatively flat objects that do not have rapid changes in their depth profile.

There are variations on laser triangulation. For example, a grid, an array of dots, or even a random but known pattern can be projected instead of a line. Although these approaches eliminate the need for a scanning motion, they can still have problems with occlusion.

Another approach to building 3D images is stereo vision. Modeled on human depth perception, this technology is still relatively uncommon in machine vision. In stereo vision, two or even three 2D cameras are used to image the same area. Features within the images are located. These features, as viewed by different cameras, then need to be matched. For example, an object corner viewed by one camera must be matched with the same physical corner viewed by a second camera. The relative change in feature locations, as viewed from each camera’s unique perspective, is used to calculate distance. Careful calibration can yield precise results in world coordinates.
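The step from matched features to distance can be illustrated with the standard relation for a rectified two-camera setup, where depth equals focal length times baseline divided by disparity. The sketch below assumes ideal pinhole cameras with parallel axes and a feature that has already been matched; the function and parameter names are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Return the distance (in meters) to a matched feature.

    Assumptions (illustrative only):
      - two identical pinhole cameras with parallel optical axes,
      - focal_px is the focal length expressed in pixels,
      - baseline_m is the separation between the cameras in meters,
      - disparity_px is the horizontal shift of the feature between
        the two images, in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    # Nearby objects shift more between the two views than distant ones,
    # so depth is inversely proportional to disparity.
    return focal_px * baseline_m / disparity_px

# Example: 800-pixel focal length, 10 cm baseline, 20-pixel disparity
# gives a depth of about 4 m.
depth_m = depth_from_disparity(800, 0.1, 20)
```

Because depth varies inversely with disparity, a one-pixel matching error matters far more for distant objects than for near ones.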

The disadvantages of stereo vision are related to the extraction and matching of features. Homogeneous surfaces with few features will yield little depth information. On the other hand, objects with too many features can result in confusion when matching the features captured by different cameras, leading to spurious results. Think sandpaper. Feature extraction and matching also require significant computational horsepower.

Is there another way to build 3D images? Yes, there is! Imagine pulsing a light for a very short duration, then measuring how long it takes the light to bounce off an object and return to a camera. The time it takes light to travel this path is of course proportional to the distance. This technology is called "time-of-flight," or "ToF." Once calibrated, such systems easily output data in world coordinates.
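The pulse-and-measure idea reduces to one line of arithmetic: the light covers the camera-to-object distance twice, so the distance is the speed of light times the round-trip time, divided by two. The following is a minimal sketch of that conversion only, with hypothetical names; it does not represent how any particular camera measures or processes the timing.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum

def distance_from_round_trip(round_trip_s):
    """Convert a measured round-trip time (seconds) to distance (meters)."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0

def depth_map_from_times(round_trip_times_s):
    """Apply the conversion to every pixel of a 2D grid of timings."""
    return [[distance_from_round_trip(t) for t in row]
            for row in round_trip_times_s]

# Example: a pulse that returns after 20 nanoseconds came from an
# object about 3 m away.
distance_m = distance_from_round_trip(20e-9)
```

Applying the same conversion to every pixel is what turns a single capture into a depth map.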

ToF systems have a few key advantages. Unlike laser triangulation, ToF captures a depth map in a single image with no need for motion. Unlike stereo vision, the necessary image processing is not computationally intensive. Unlike both alternatives, ToF does not depend on physical separation between multiple components. ToF systems can therefore be compact and cost-effective.

ToF cameras can also capture both 2D and 3D images simultaneously. So, for example, one capture event could enable reading a barcode printed on a carton, as well as determining that carton’s dimensions and position.

The main disadvantage of ToF cameras is their accuracy. Light covers about 30 cm in a nanosecond, so a 1 cm change in distance shifts the round-trip time by only about 67 picoseconds. In practice, ToF cameras are limited to +/- 1 cm accuracy. Also, highly reflective surfaces may cause light to bounce several times before returning to the camera, forcing the camera to discard the associated data points.
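To see why the speed of light makes fine depth accuracy difficult, it helps to convert a depth tolerance into the timing resolution it implies. The sketch below is plain arithmetic with hypothetical names; it says nothing about the electronics a real camera uses to reach that resolution.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum

def timing_resolution_for_depth(depth_tolerance_m):
    """Return the round-trip time difference (seconds) that corresponds
    to a given change in distance (meters)."""
    if depth_tolerance_m < 0:
        raise ValueError("depth tolerance cannot be negative")
    # A change in distance alters the out-and-back path by twice as much.
    return 2.0 * depth_tolerance_m / SPEED_OF_LIGHT_M_PER_S

# Example: resolving 1 cm of depth means resolving a time difference
# on the order of tens of picoseconds.
dt_s = timing_resolution_for_depth(0.01)
dt_ps = dt_s * 1e12  # convert seconds to picoseconds
```

Halving the depth tolerance halves the time difference that must be resolved, which is why millimeter-level accuracy is much harder to achieve with this approach than centimeter-level accuracy.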

i4 Solutions will have an operating Basler ToF camera at the 2017 MinnPack show. Stop by booth 715 to see it.