In the last couple of lectures, we've looked at the process of image formation, that is, how we transform the 3D world into a two-dimensional image or representation of that world. The actual three-dimensional structure of the world is really, really important for animals as well as for robots. So in the case of animals, if you imagine that you are a predatory animal, it's really important to know how far away the prey is. If you're a prey animal, it's really important to know how far away the predator is. If you're a robot, you might want to know how far away the object is that you want to reach and grab. If you're a mobile robot, you want to know which parts of the space in front of you have got objects sticking up above the ground that prevent you from driving there, and which parts of the world are free from those kinds of obstacles.
So we want to have the three-dimensional structure of the world, but we've only got a two-dimensional projection of it. How do we solve this problem? When human beings look at a two-dimensional image or a photograph, we imagine the three-dimensional structure, and that comes from our experience of 3D worlds, the experience that we've gained through our whole lives. And I'm going to talk a little bit about some of the mechanisms by which human beings fill in this missing dimension. In particular, I'm going to talk about binocular stereo, and that's how we take information from two cameras, like the two eyes that we have in our heads, and recreate the three-dimensional world from that. And this is a very common technique in robotics called "Stereo Vision".
An image is a two-dimensional projection of a three-dimensional world. The big problem with this projection is that big distant objects appear the same size as small close objects. For people and robots, it’s important to distinguish these different situations. Let’s look at how humans and robots can determine the scale of objects and estimate the 3D structure of the world based on 2D images.
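To make the size ambiguity concrete, here is a minimal sketch that assumes an ideal pinhole camera, where an object of height H at distance Z projects to an image height of f * H / Z. The function name, the 8 mm focal length, and the object sizes are all hypothetical values chosen for illustration; they are not from the lecture.

```python
import math


def projected_height(object_height_m, distance_m, focal_length_m=0.008):
    """Image-plane height (metres) under an assumed ideal pinhole model: h = f * H / Z."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return focal_length_m * object_height_m / distance_m


# A big distant object: 2 m tall, 10 m away (illustrative values).
big_far = projected_height(2.0, 10.0)

# A small close object: 0.2 m tall, 1 m away (illustrative values).
small_near = projected_height(0.2, 1.0)

# Both have the same height-to-distance ratio, so they project to the same size.
same_size = math.isclose(big_far, small_near)
print(big_far, small_near, same_size)
```

Only the ratio H / Z survives the projection, so a single image cannot tell these two situations apart. That is the missing dimension that extra information, such as a second camera, has to supply.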