Human 3D Perception


We might take some inspiration from how human beings solve this problem; we're very good at understanding and operating in the three-dimensional world. In fact, human beings use a large number of different techniques and tricks, sometimes called visual cues, to understand three-dimensional structure.

Nine of them have been identified in this particular paper.  Different cues operate in different distance regimes.  There are some that are effective within arm's reach, some that are effective up to tens of meters, and some that are effective right out to the horizon, a distance of many kilometers.

Let's have a look at some of these visual cues.  The simplest of all is occlusion, and this information comes from which object is in front of another object.  We can tell here that the cat is in front of the brick gatepost because the cat blocks out or obscures some of that gatepost, and the flowers are in front of the cat because they block out some of the cat.

Another important cue is height in the visual field.  Looking at this we can see a tree and we know a tree's a pretty big thing.  But in this particular image the tree does not look that big, so therefore we figure that the tree must be a long way away. 

We also have relative size.  Here we have a number of trees. In our brain there's probably some sort of logic that says that these are trees, trees are all going to be the same size, and therefore this tree, which looks bigger than the other tree, must be closer to us.
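The logic behind relative size can be sketched with a simple pinhole camera model, in which the height of an object in the image is proportional to its true height divided by its distance. This is only an illustrative sketch, not a model of what the brain actually computes; the function name, the focal length of 800 pixels and the tree height of 10 meters are all assumptions made up for the example.

```python
def distance_from_apparent_size(true_height_m, image_height_px, focal_length_px):
    """Estimate distance with the pinhole model: Z = f * H / h.

    true_height_m   -- assumed real height H of the object, in meters
    image_height_px -- height h of the object as it appears in the image, in pixels
    focal_length_px -- focal length f of the (hypothetical) camera, in pixels
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_length_px * true_height_m / image_height_px


# Hypothetical numbers: two trees assumed to be the same height (10 m),
# seen by a camera with an assumed focal length of 800 pixels.
ASSUMED_TREE_HEIGHT_M = 10.0
ASSUMED_FOCAL_LENGTH_PX = 800.0

big_in_image = distance_from_apparent_size(ASSUMED_TREE_HEIGHT_M, 400.0, ASSUMED_FOCAL_LENGTH_PX)
small_in_image = distance_from_apparent_size(ASSUMED_TREE_HEIGHT_M, 100.0, ASSUMED_FOCAL_LENGTH_PX)

print(f"tree that looks 400 px tall: {big_in_image:.1f} m away")
print(f"tree that looks 100 px tall: {small_in_image:.1f} m away")
```

The estimate is only as good as the assumption that the trees really are the same size, which is why this cue can be fooled.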

In these eye catching pictures, our brain is momentarily confused.  It takes a moment for us to figure out what are the small things that are close to the camera and what are the large things that are far away from the camera.  

Here's a picture of a large stone in the desert.  Just looking at it, it's rather hard for us to gauge in any sort of absolute way how big this stone is.  We can break the illusion by introducing something whose size we know, a pair of fingers: we can pick this stone up and we can compare it to its more famous big brother, the largest stone in the world, called Uluru, in the center of Australia.

Here we see a very, very powerful illusion.  Nothing about this room is as it seems.  For a start, the floor is not flat.  Now follow me in and see what it looks like from the inside.  The clock is not actually round, and if we look at those square floor tiles we see that they are not actually square.  And the corners of the room are not right angles.  The whole geometry of the room has been contrived so that from this one particular viewpoint it looks like an ordinary room.

Another technique we use to estimate distance is called "Texture Density" and a good example of it is seen here in this gravel path.  Up close we can see the individual pieces of gravel and at increasing distance the pieces of gravel get smaller and smaller in the image.  Up close the texture is quite coarse and as we go further away the texture becomes finer and finer, so in our eyes and our brain, we're looking at the size of a texture and making some assumptions about how that relates to distance. 

A technique that we use over very, very large distances is called aerial perspective.  This exploits the fact that objects that are at large distances, of the order of kilometers, become less distinct and more hazy.

A really important cue is binocular disparity, and we're going to talk much more about it in this lecture.  This is the information that's derived from two views of the scene from two different viewpoints, our left eye and our right eye, and it's a technique that is very, very widely used in animals.

Our eye contains a lens, and there's a fairly autonomous part of our brain that is responsible for keeping that lens focused.  If we're looking at something that's up close, this part of our brain is going to adjust the shape of the lens so that we can focus on that object.  If the object is further away, then the shape of the lens is going to be changed to focus on that.

We can take information from the muscle that changes the shape of the lens and use that as a proxy for the distance of the object.  There is another fairly autonomous part of our visual system, which is responsible for controlling the direction in which our eyes point.  When an object is close to our face, our eyes converge on that particular object.  If we take information about the direction that our eyes are pointing, we can triangulate to the object and that is a way of inferring its distance.
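The triangulation from eye direction can be sketched as follows, assuming the simplest possible geometry: the object is straight ahead and both eyes turn inward by the same amount. The distance then follows from the angle between the two lines of sight and the separation of the eyes. The function names and the eye separation of 0.065 m are assumptions for illustration, not values from the lecture.

```python
import math

# Assumed typical adult distance between the two eyes, in meters (not from the lecture).
ASSUMED_EYE_SEPARATION_M = 0.065


def distance_from_vergence(vergence_angle_rad, eye_separation_m=ASSUMED_EYE_SEPARATION_M):
    """Triangulate distance to an object straight ahead.

    vergence_angle_rad is the angle between the two lines of sight.
    Each eye turns inward by half of it, so tan(angle / 2) = (b / 2) / Z.
    """
    half_angle = vergence_angle_rad / 2.0
    if not 0.0 < half_angle < math.pi / 2.0:
        raise ValueError("vergence angle must be between 0 and pi radians")
    return (eye_separation_m / 2.0) / math.tan(half_angle)


def vergence_for_distance(distance_m, eye_separation_m=ASSUMED_EYE_SEPARATION_M):
    """Inverse of the above: the vergence angle needed to fixate at distance_m."""
    return 2.0 * math.atan((eye_separation_m / 2.0) / distance_m)


for d in (0.25, 1.0, 10.0):
    angle = vergence_for_distance(d)
    recovered = distance_from_vergence(angle)
    print(f"{d:5.2f} m -> vergence {math.degrees(angle):6.3f} deg -> recovered {recovered:.2f} m")
```

Under these assumptions the angle becomes very small once the object is more than a few meters away, so small errors in the measured angle turn into large errors in distance. That is consistent with this being a cue for nearby objects.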

The final technique that we use is called "Motion Perspective": as I'm moving through the world, objects that are close to me appear to be moving quite quickly, whereas objects that are far away appear to move much more slowly.
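A rough sketch of why this happens: for an observer moving in a straight line, a stationary object that is directly off to the side at distance Z sweeps across the field of view at an angular rate of approximately v / Z, where v is the observer's speed. The function name, the walking speed and the distances below are assumptions chosen for the example.

```python
def angular_speed_rad_per_s(observer_speed_mps, distance_m):
    """Apparent angular speed of a stationary object directly abeam of the observer.

    Uses omega = v / Z, which holds at the instant the object is
    perpendicular to the direction of travel.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return observer_speed_mps / distance_m


ASSUMED_WALKING_SPEED_MPS = 1.5  # hypothetical walking pace

near_object = angular_speed_rad_per_s(ASSUMED_WALKING_SPEED_MPS, 2.0)     # e.g. a fence post
far_object = angular_speed_rad_per_s(ASSUMED_WALKING_SPEED_MPS, 3000.0)   # e.g. a distant hill

print(f"object 2 m away:    {near_object:.4f} rad/s")
print(f"object 3000 m away: {far_object:.4f} rad/s")
```

If the observer's own speed is known, the relationship can be turned around to give distance as v divided by the observed angular rate.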



In order to determine the size and distance of objects in the scene our brain uses a number of highly evolved tricks. Let’s look at some of these.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes only general knowledge.
