LESSON
Principles of Stereo Vision
Transcript
Let's talk some more about binocular disparity. As I mentioned before, it's a technique that's very widespread among animals, and it's also used in robotics.
A good way to try and understand the principle of binocular disparity is to look at this very old-fashioned stereo photograph. It's a pair of photographs of the same scene, but each image is taken from a slightly different viewpoint, and these correspond to the viewpoints of our left eye and our right eye.
The viewpoints are separated by that 8 to 10 centimeters, and they give you very slightly different views of the world. Initial inspection of this stereo photograph suggests that there's not much difference. But if we zoom in on a couple of areas here, we see that in the left hand image there's quite a gap, and there's in fact no gap in the right hand image.
If we look on the other side of the picture, we look at this bench. It's quite close to the edge of the image in the left picture, and there is a bit of a gap between the edge and the bench in the right hand image. It's these very subtle differences in the images taken from these 2 viewpoints that give us this very vivid 3-dimensional perception of the world that's around us.
The fundamental principle of stereo disparity is that the image depends on the camera position, and we can demonstrate that pretty easily. If I translate my head from right to left, everything in the world appears to move to the right. But importantly, things that are close to me, like this robot here, appear to move a lot in the image as I make that shift, whereas things that are far away, like the bookcase, appear to move much less.
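The head-movement demonstration can be sketched numerically. This is a minimal illustration only, assuming an idealized pinhole camera that translates sideways without rotating; the function names, the unit focal length, and the distances chosen for the robot and the bookcase are all hypothetical.

```python
def image_x(point_x, point_z, camera_x, focal_length=1.0):
    """Horizontal image coordinate of a point seen by an ideal pinhole camera.

    The camera sits at horizontal position camera_x and looks along the Z axis.
    """
    return focal_length * (point_x - camera_x) / point_z


def image_shift(point_z, camera_move, focal_length=1.0):
    """How far a point at distance point_z moves in the image when the
    camera translates sideways by camera_move (negative = to the left)."""
    before = image_x(0.0, point_z, 0.0, focal_length)
    after = image_x(0.0, point_z, camera_move, focal_length)
    return after - before


# Assumed distances: a nearby robot at 0.5 m, a bookcase at 5 m.
# The camera (head) moves 0.1 m to the left.
near_shift = image_shift(point_z=0.5, camera_move=-0.1)
far_shift = image_shift(point_z=5.0, camera_move=-0.1)

print("near object shift:", near_shift)
print("far object shift: ", far_shift)
```

Both shifts come out positive, meaning the scene appears to move to the right as the camera moves left, and the nearby point shifts ten times as far as the distant one because the shift scales with 1/Z.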
People have been interested in capturing stereo photographs for a very, very long time. This is a lovely picture I found of a commercial stereo film camera from the 1950s. A more modern version is this digital stereo camera, of a type that's quite widely used for robotic systems.
Let's now try and understand what precisely are the differences between the left hand image and the right hand image of the same scene. This is a pair of high resolution digital images of a pile of rocks.
I'm going to draw a line from the left hand edge of the image to a fairly distinctive blotch on one of the rocks, and I'm going to draw a line of the same length in the right hand image. The yellow line has gone too far to the right and, in fact, we need to go back some distance in order to get to this distinctive mark on the rock.
Let's pick another distinctive feature at the back of the scene here. So, I'm going to draw a yellow line from the left hand edge to the edge of one of the large blocks at the back. I draw the same line in the right hand image and again, we see that it has overshot the corresponding point, and I need to go back some distance in order to get to that corresponding point.
We notice 2 important things. First of all, we notice that points in the right hand image are shifted somewhat to the left, and this shift, this horizontal displacement, is referred to as disparity. We also notice that the shift is less for points that are further away from the camera.
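The "draw the same line, then go back some distance" procedure can be mimicked on a single image row. The lecture does not say how the corresponding point is found automatically, so the sketch below swaps in one simple possibility, a sum-of-absolute-differences search along the row. The function name, window size, search range, and the toy pixel rows are all assumptions made for illustration.

```python
def find_disparity(left_row, right_row, x_left, half_window=2, max_disparity=10):
    """Return how many pixels to the left the patch around x_left in left_row
    appears in right_row, using a sum-of-absolute-differences comparison."""
    patch = left_row[x_left - half_window : x_left + half_window + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        x_right = x_left - d  # step back towards the left edge
        if x_right - half_window < 0:
            break  # the window would fall off the image
        candidate = right_row[x_right - half_window : x_right + half_window + 1]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy rows: a distinctive bright "blotch" centred at column 12 in the left
# image and at column 8 in the right image.
left_row = [0] * 20
left_row[11:14] = [5, 9, 5]
right_row = [0] * 20
right_row[7:10] = [5, 9, 5]

disparity = find_disparity(left_row, right_row, x_left=12)
print("disparity in pixels:", disparity)
```

Here the disparity is simply the column of the feature in the left image minus its column in the right image. Real images need more care, for example with repeated texture or with regions visible to only one camera.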
So, here we have a stereo camera and it's observing an object, a ball, that is at distance Z away from the camera. We refer to the baseline of the stereo camera, which is the distance between the centers of the lenses, and F is the focal length of the lens. We assume that both lenses have the same focal length.
Now, there is a very simple relationship between these quantities, and that is that the disparity is proportional to the focal length times the baseline, divided by the distance. So, if we know the focal length, we know the baseline, and we know the disparity in the image, then we can estimate Z. This is the fundamental underlying principle of robotic stereo vision, which is sometimes called computational stereo, and implicitly it's what happens in our own visual system.
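As a rough sketch of that relationship: for two identical, parallel pinhole cameras, with the disparity and the focal length both measured in pixels, the relation is commonly written as disparity = F × baseline / Z, which rearranges to Z = F × baseline / disparity. The function name and the numbers below are hypothetical, chosen only to show the arithmetic.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Estimate distance Z (in meters) from disparity, assuming an ideal
    parallel stereo pair: Z = focal_length * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to estimate a finite distance")
    return focal_length_px * baseline_m / disparity_px


# Assumed camera: focal length of 800 pixels, baseline of 0.125 m.
focal_length_px = 800
baseline_m = 0.125

z_large_disparity = depth_from_disparity(40, focal_length_px, baseline_m)
z_small_disparity = depth_from_disparity(10, focal_length_px, baseline_m)

print("disparity 40 px -> Z =", z_large_disparity, "m")
print("disparity 10 px -> Z =", z_small_disparity, "m")
```

The smaller disparity gives the larger distance, which matches the earlier observation that the shift is less for points further from the camera.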
One very powerful trick used by humans is binocular vision. The images from each eye are quite similar, but there is a small horizontal shift, a disparity, between them and that shift is a function of the object distance.
Skill level
High school mathematics
This content assumes an understanding of high school-level mathematics, e.g. trigonometry, algebra, calculus, physics (optics) and some knowledge/experience of programming (any language).