The Geometry of Perspective


Let’s talk about the geometry of how images are formed. All images start with light coming from some source of illumination, perhaps the sun, perhaps indoor lights in the ceiling of a building. That light reflects off objects in the world, and some of that reflected light ends up in our eyes. There are two types of reflection: the mirror-like reflection we call specular reflection, and the reflection from a diffuse or matte surface, which is called Lambertian reflection.
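
The two types of reflection can be sketched in a few lines of Python. This is only an illustration under standard textbook assumptions that the lesson itself does not state: a mirror sends an incoming ray off at an equal angle on the other side of the surface normal, and a Lambertian surface has a brightness proportional to the cosine of the angle between the normal and the direction to the light, independent of where the viewer stands. The function names are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(vi / length for vi in v)


def mirror_direction(incoming, normal):
    """Specular case: reflect the incoming ray direction about the normal."""
    d = normalize(incoming)
    n = normalize(normal)
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def lambertian_brightness(to_light, normal):
    """Lambertian case: relative brightness from the cosine law.

    The result does not depend on the viewing direction, and it is
    zero when the light is behind the surface.
    """
    return max(0.0, dot(normalize(to_light), normalize(normal)))


# A ray travelling down and to the right hits a horizontal surface
# and leaves travelling up and to the right.
print(mirror_direction((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))

# Light directly overhead gives full brightness on a matte surface.
print(lambertian_brightness((0.0, 1.0, 0.0), (0.0, 1.0, 0.0)))  # 1.0
```
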

Most surfaces are a mixture of specular reflection and Lambertian reflection. But irrespective of the sort of reflection that’s occurring, we have light from a light source reflecting off objects and some of those light rays are going to enter our eyes. Let’s look at this in a graphical representation.
Once again, we have our observer of the scene. We have a three-dimensional object in the world and we’re going to consider what happens at just a few points on that particular object. So the first thing we need is a source of light.

We turn on the sun, and what’s going to happen now is that light from the sun is going to be reflected at a number of these points, and some of those light rays are going to enter the eyes of the observer.

When we look at a three-dimensional statue like this, either through a camera or with our own eyes, we experience a very vivid and crisp image of that object. But if I simply hold up a piece of paper here, where it is receiving light that’s reflected off the statue, then no matter how hard or how closely I look at this paper, no image is being formed on it. We need to organize the light rays that are leaving the statue in order to get a crisp image.
So let’s look at this problem graphically. We have an image plane, the piece of paper that I just held up. If I consider any particular point on that image plane, the light falling on it could have come from any number of points on the three-dimensional object. The light from all of those points is being mixed up, so if I just hold up a piece of paper, no coherent image will be formed on it.

What we need to do is to order the light rays in some way and the simplest way to think about doing this is to put an opaque plane in front of the image plane and to drill a small hole in it. This is a configuration that’s referred to as a pinhole camera.

What this does is order the rays that are leaving the points on the object, and it causes an inverted (that is, upside-down) image to be formed on the image plane. A pinhole camera is sometimes called a camera obscura, and it’s been known since ancient times.
This picture shows a building with a small hole drilled in its wall, and an image of the bright sun outside is being cast on the interior wall of the building.

Here are some examples of pinhole camera images that I’ve downloaded from the web. People have observed these images when they’ve been inside a darkened room with a bright, sunny scene outside, and a hole in the blind or window covering has caused an image of the outside world to be cast on the wall.

Now recall from the previous example that this image is inverted. In the left-hand image, the person has likely turned their camera upside-down, and you can see an electric toothbrush hanging from the top. In the right-hand image, you can see that the pinhole image is in fact inverted.

Here is an example of a very, very large pinhole camera. This was an amazing project that happened in a disused aircraft hangar back in 2008. On the left, we can see a bunch of people standing in front of the image plane, where the image from the pinhole camera is being cast. On the right-hand side, you can see them endeavoring to capture that image. They made a very, very large piece of film, a very large negative, and they’re going to put it up against the wall, expose it and capture an image using this pinhole camera.
So, a quick refresher on what happens with a pinhole camera. Rays of light leave various points on the object. They all pass through the pinhole and cast an upside-down image on what we call the image plane.
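
The contrast between a bare sheet of paper and a pinhole camera can be sketched with a toy one-dimensional model. It is a deliberate idealization, not something taken from the lesson: every point on the object is assumed to contribute equally to every point on the bare paper (ignoring distance and angle), and the pinhole is assumed to pass exactly one ray per object point. The function names are hypothetical.

```python
def paper_without_pinhole(scene_brightness, n_paper_points):
    """Bare paper: every paper point collects light from every object point,
    so each one sees the same mixture."""
    mixed = sum(scene_brightness) / len(scene_brightness)
    return [mixed] * n_paper_points


def paper_with_pinhole(scene_brightness):
    """Pinhole: each image point receives a single ray from a single object
    point. The rays cross at the hole, so top and bottom are swapped."""
    return list(reversed(scene_brightness))


# Brightness of five points on the object, listed from top to bottom.
statue = [0.0, 1.0, 2.0, 3.0, 4.0]

print(paper_without_pinhole(statue, 5))  # [2.0, 2.0, 2.0, 2.0, 2.0]
print(paper_with_pinhole(statue))        # [4.0, 3.0, 2.0, 1.0, 0.0]
```

The first result is a uniform blur with no structure, while the second preserves the pattern of the object but upside-down.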

The geometry of this pinhole camera is actually very simple to describe. We’ll consider that the object is at a distance Z away from the plane of the pinhole and the image plane is a distance F (F for focal distance) away from the plane of the pinhole.

Then if the height of the object is Y and the height of the image is y, these form two similar triangles, so it’s very easy to write the relationship between them: the ratio of y to F equals the ratio of Y to Z. We can do the same thing for the horizontal plane. We introduce the symbol X for the distance of the object out of the page, and x for the corresponding distance along the wall in the image plane.

We can again write the equations that come from looking at two similar triangles. So this image formation process maps a point in the world with coordinates X, Y and Z to a point on the image plane whose coordinates are x and y. We can rearrange those equations in this fashion.
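
The equations shown on screen are not reproduced in this transcript. Assuming the symbols defined above (object at X, Y, Z, image point at x, y, image plane at distance F from the pinhole) and treating all quantities as magnitudes, the similar-triangle relations and their rearranged form would read:

```latex
\frac{x}{F} = \frac{X}{Z}, \qquad \frac{y}{F} = \frac{Y}{Z}
\quad\Longrightarrow\quad
x = \frac{F\,X}{Z}, \qquad y = \frac{F\,Y}{Z}
```

If signed coordinates are used instead, with the image plane behind the pinhole, the inversion of the image appears as a minus sign on the right-hand sides.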

So we now have x in terms of the real-world coordinates X and Z, and similarly y in terms of Y and Z. This is what we call a projection: it projects a three-dimensional quantity X, Y, Z into a two-dimensional quantity x, y. It is a mapping from three dimensions to two dimensions.
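
As a minimal sketch, this mapping can be written as a small Python function. It assumes the similar-triangle form x = F X / Z and y = F Y / Z, with all lengths in the same units and the inversion of the image ignored; the function name `project_point` is hypothetical and not part of the lesson.

```python
def project_point(X, Y, Z, F):
    """Map a world point (X, Y, Z) to image-plane coordinates (x, y).

    Z is the distance from the plane of the pinhole to the point, and
    F is the distance from the pinhole to the image plane.
    """
    if Z <= 0:
        raise ValueError("the point must be in front of the pinhole (Z > 0)")
    x = F * X / Z
    y = F * Y / Z
    return x, y


# A point 2 units to the side, 1 unit up and 4 units away,
# imaged with the image plane 0.5 units behind the pinhole.
x, y = project_point(X=2.0, Y=1.0, Z=4.0, F=0.5)
print(x, y)  # 0.25 0.125
```

Three numbers go in and only two come out, which is why the depth Z cannot be read back from x and y alone.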

This is referred to as perspective projection, and these equations are its mathematical basis.
A consequence of perspective projection is that there is no unique inverse. If I have an image of the object in the real world then there are an infinite number of possible objects that could cause that image. It could be a small object that’s close to me or it could be a large object that’s further away from me.
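
The ambiguity is easy to check numerically. The sketch below assumes the relation y = F Y / Z for the image height; the numbers and function names are made up purely for illustration.

```python
import math


def image_height(Y, Z, F):
    """Height of the image of an object of height Y at distance Z."""
    return F * Y / Z


def possible_object_heights(y, F, depths):
    """Object height that would produce image height y at each candidate depth."""
    return [y * Z / F for Z in depths]


F = 0.5
small_near = image_height(Y=1.0, Z=2.0, F=F)
large_far = image_height(Y=8.0, Z=16.0, F=F)

# Both objects produce the same image height.
print(math.isclose(small_near, large_far))  # True

# Every depth has its own consistent object height.
print(possible_object_heights(small_near, F, [2.0, 4.0, 16.0]))  # [1.0, 2.0, 8.0]
```

Any depth can be paired with an object height that explains the same image, so the image alone cannot tell these cases apart.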

One of the dimensions has been fundamentally lost. From a two-dimensional image, we cannot recover this third dimension. Now, in our brains, we use a lot of tricks to try and recover that third dimension.

We know something about the structure of the world. We know something about the size of objects. So if we see what looks like a small person we think “No, no, no, it’s probably a large person who is further away.”

So recovering the third dimension from the image alone is fundamentally impossible, but there is other information that we can bring to bear to estimate it.



Let’s look at how light rays reflected from an object can form an image. We use the simple geometry of a pinhole camera to describe how points in a three-dimensional scene are projected on to a two-dimensional image plane.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of high school-level mathematics, e.g. trigonometry, algebra, calculus, physics (optics) and some knowledge/experience of programming (any language).
