Stereo is not the only technique that a robot can use to determine the 3D structure of its world. This is a very familiar device; it's the Microsoft Kinect Sensor. This is sometimes called an RGBD camera; that is, it's a sensor that returns, at each pixel, the red value, the green value, the blue value and the depth. So, we talked earlier about color planes in an image. The Kinect Sensor returns an image with 4 planes: R, G, B and D. The Kinect contains 2 quite separate subsystems.
The first is a fairly conventional color or RGB camera. The second component is a laser pattern projector. So, this is actually emitting a pattern of infrared light onto the scene, and then there is an infrared camera which is observing that pattern and, from the observation of the pattern, it can determine 3-dimensional structure.
What does this pattern look like? It looks pretty speckly. It's pretty close to a random pattern of dots.
What the Kinect Sensor does is exploit a very classical principle called structured light. What I want to do here is to illustrate the principles of a classical structured light sensor.
The gray box here represents the Kinect Sensor and we have an object at some distance away from the sensor. We have a camera, which is represented here by a pinhole camera model with an image plane behind it, and we have a laser projector which emits a fan of rays.
Those rays intersect the object and form spots of light on the object. The camera observes those spots of light and forms an image which is a projection of those spots on the object.
Let's just make a temporary copy of where those spots are on the image plane. Now, we're going to consider that the object has moved further away from the camera and we repeat the process. The laser projector emits the same fan of laser beams. Spots are formed on the object. The pinhole camera observes that and we see a projection on the image plane. If we add back the previous image plane projection which we saved before, we can see that the pattern of dots has moved. There is a vertical displacement, or a disparity, that depends on the object distance.
So, this is the fundamental principle of structured light. Now, we can add some additional measurements here. We have a baseline, which is the distance between the centre of the laser projector and the centre of the camera.
The camera's got a focal length and the object is at a distance z. Now, we can write a relationship between all of these quantities: the disparity is inversely proportional to the object distance z.
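The relationship itself is on the slide and not in the transcript. As a minimal sketch, assuming the standard triangulation form d = f b / z (disparity d in pixels, focal length f in pixels, baseline b in metres), depth can be recovered from a measured disparity as below. The function name and the numbers are hypothetical and chosen only for illustration; they are not the Kinect's actual calibration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Return object distance z in metres, assuming d = f * b / z.

    This is the classical triangulation relationship, taken here as an
    assumption; the Kinect's own processing differs in detail.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    focal_length_px = 580.0   # focal length expressed in pixels
    baseline_m = 0.075        # projector-to-camera distance in metres

    for disparity_px in (87.0, 43.5, 21.75):
        z = depth_from_disparity(disparity_px, focal_length_px, baseline_m)
        print(f"disparity {disparity_px:6.2f} px -> depth {z:.2f} m")
```

Each time the disparity halves, the computed depth doubles, which is the inverse proportionality described above.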
So, this is not exactly how the Kinect works but it's a very similar principle. On the Kinect Sensor itself, the baseline then is the distance between the centre of the laser pattern projector and the centre of the infrared camera.
Another principle is what's called the time of flight camera. What we can see here is a camera lens in the middle and, on either side of that, there are 2 arrays of quite high power light-emitting diodes. These diodes don't project a speckle pattern onto the scene. They provide general overall illumination of the scene.
The principle works something like this. The light that's being emitted is modulated at some quite high frequency and is reflected off the object and picked up by the camera.
So, the modulation frequency of the light might be something like 10 or 20 megahertz. If we plot the intensity of the light against time, the outgoing light in red and the incoming light in blue, we see that the incoming light is slightly delayed with respect to the outgoing light.
There is a phase shift. So, there's a delay of T. The distance is proportional to the phase difference between the outgoing and the incoming light. The relationship between distance and the time delay T is given here, where C is the speed of light. The speed of light is a very big number, which means that the time shift is very small. Typically, it's of the order of nanoseconds.
This process is replicated at every single pixel in the scene. So, we need a very special sort of imaging sensor: not just a pixel that is responsive to the intensity of the incoming light, but a pixel that is responsive to phase shift.
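The formula referred to is on the slide and not in the transcript. As a rough sketch, assuming the usual round-trip relationship (the light travels to the object and back, so distance = C T / 2) and that the delay T corresponds to a phase shift of 2π f T at modulation frequency f, the calculation for one pixel might look like this. The function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def distance_from_delay(delay_s):
    """Distance in metres for a round-trip delay T, assuming d = C * T / 2."""
    return C * delay_s / 2.0


def distance_from_phase(phase_rad, mod_freq_hz):
    """Distance in metres from a measured phase shift.

    Assumes the phase shift is 2 * pi * f * T, so T = phase / (2 * pi * f).
    """
    delay_s = phase_rad / (2.0 * math.pi * mod_freq_hz)
    return distance_from_delay(delay_s)


if __name__ == "__main__":
    # A delay of the order of nanoseconds, as mentioned in the lecture.
    print(f"{distance_from_delay(10e-9):.3f} m")
    # A half-cycle phase shift at 20 MHz modulation.
    print(f"{distance_from_phase(math.pi, 20e6):.3f} m")
```

A round-trip delay of 10 nanoseconds corresponds to roughly 1.5 metres, which shows why the sensor has to resolve such small time shifts.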
So, these cameras require very special sensing chips and typically don't have particularly high resolution, maybe 64 by 64 pixels or 128 by 128. A generic disadvantage of this kind of sensor, and also of the Kinect Sensor, is that they rely on projecting light onto the scene. This means that they don't work particularly well outdoors, where very bright sunlight completely overwhelms the light that the sensor is trying to project onto the scene.
They do, however, work very very well indoors and they work very very well at night where a traditional camera will return nothing.
Many technologies have been developed to determine the 3D structure of the world. RGBD sensors such as the Kinect use structured light, projecting a pattern of light onto the scene and observing how it is distorted. Time of flight sensors measure the time it takes for light to travel from the camera to the object and back again.