Planar Homography


Now we will consider the case where a camera is looking at a set of points that all lie on a plane.  The plane has its own coordinate system, which we denote as coordinate frame zero.  Clearly every point that lies on this plane has a Z coordinate of zero, which is shown down here.  The coordinate capital Z multiplies all of the elements in the third column of our camera matrix, but because it's zero we can effectively remove that column from the matrix, and we can remove the corresponding row from the world coordinate vector.  What we're left with is a three by three matrix, and we'll refer to this three by three matrix as a "Planar Homography".
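
Written out, the step of dropping the third column looks like the following. The symbol names are assumptions made for illustration and are not taken from the lecture slides: C is the three by four camera matrix with elements c_ij, (X, Y) is the point on the plane, and (u, v) is its image.

```latex
\begin{pmatrix} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{pmatrix}
=
\begin{pmatrix}
c_{11} & c_{12} & c_{13} & c_{14} \\
c_{21} & c_{22} & c_{23} & c_{24} \\
c_{31} & c_{32} & c_{33} & c_{34}
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix}
=
\underbrace{\begin{pmatrix}
c_{11} & c_{12} & c_{14} \\
c_{21} & c_{22} & c_{24} \\
c_{31} & c_{32} & c_{34}
\end{pmatrix}}_{H}
\begin{pmatrix} X \\ Y \\ 1 \end{pmatrix},
\qquad
u = \frac{\tilde{u}}{\tilde{w}}, \quad v = \frac{\tilde{v}}{\tilde{w}}
```

Because Z is zero, the third column of C never contributes to the result, which is why it can be dropped along with the Z entry of the coordinate vector.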

Just as for the camera matrix, there is an arbitrary scale factor, and once again we can normalize the homography matrix by choosing one particular element that we're going to set to the value of one.  This three by three matrix has one element that we've set to one, so there are eight unique numbers remaining in the homography matrix.  And we can estimate the homography matrix if we have four world points and the corresponding positions of those points on the image plane of our camera.  Each pair of corresponding points provides two equations, one for each image coordinate, so four points give the eight equations we need.

To explain the concept of corresponding points, imagine that I've got two planes: one is perhaps the image plane of the camera, and the other might be a physical plane in the world that the camera is looking at.  Alternatively, the first could be a view of a plane in the world and the second could be another view of the same plane, where we've moved the camera between the two views.

Now we've got four points in each of these planes, which I'm going to denote by the subscripts one through four, and I'm going to arrange the coordinates of those points into the columns of a matrix.  But what's really important here is the ordering of these columns.  We have to ensure what's called correspondence.  P1 and Q1 must correspond to the same point in the world, and likewise for P2 and Q2, P3 and Q3, and P4 and Q4.  Each point P and the corresponding point Q must refer to the same point in the world.
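
One way to estimate a homography from four correspondences can be sketched in plain Python. This is a minimal illustration and not the toolbox routine used in the lecture: it assumes the bottom-right element of H is the one set to one, it stores points as lists of (x, y) pairs rather than as matrix columns, and the function names and the example coordinates are made up.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def estimate_homography(P, Q):
    """P and Q each hold four (x, y) pairs; P[i] must correspond to Q[i]."""
    A, b = [], []
    for (x, y), (u, v) in zip(P, Q):
        # Each correspondence contributes two linear equations in the
        # eight unknown elements of H (the ninth is fixed to one).
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map one (x, y) point through H, dividing out the scale factor."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Made-up example: corners of a unit square and where they appear in an image.
P = [(0, 0), (1, 0), (1, 1), (0, 1)]
Q = [(10, 20), (110, 30), (100, 140), (5, 120)]
H = estimate_homography(P, Q)
for p, q in zip(P, Q):
    print(p, "->", apply_homography(H, p), "should match", q)
```

If the order of the points in Q is shuffled relative to P, the solver still returns a matrix, but it describes a different and usually meaningless mapping, which is why correspondence matters.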

Let's look at a practical example of how we can use this technique to perform something called "Perspective Rectification".  Now this is a picture that I took of the Notre Dame Cathedral in Paris. 

It's a very tall cathedral, so I'm on the ground in front, looking up and taking a picture.  And clearly because my camera is tilted upwards I've got a very distorted view of the front of the cathedral.  But I know some things about cathedrals and particularly I know that the front of the cathedral is most likely to be a plane. 

So I pick four points on the front of the cathedral that I believe all lie in a single plane, and I label them P1 through to P4.  I know that in a non-distorted image those points will form a rectangle in the image plane, not a trapezoid.  I can compute the image plane coordinates Q1, Q2, Q3 and Q4 so that they form a rectangle in the image.  So now I have two sets of corresponding points: the points P1 through P4 and the points Q1 through Q4, and from them I can compute a homography.

So if I build up a matrix P that contains as columns the points P1 through P4, and a matrix Q whose columns are the points Q1 through Q4, then I can compute a homography.  And it's shown here and very simple to do in MATLAB.  Now that I have this homography matrix H, I can use it to transform any point P in my original image to the corresponding point Q in a second image.
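
Transforming points with H can be sketched as follows. This is an illustration only: the matrix values are made up rather than taken from the cathedral example, and `apply_homography` is a hypothetical helper name, not a toolbox function.

```python
def apply_homography(H, points):
    """Map a list of (x, y) points through the 3x3 matrix H."""
    mapped = []
    for x, y in points:
        # The point in homogeneous form is (x, y, 1); multiply it by H.
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        # Divide by the third element to get back to ordinary coordinates.
        mapped.append((u / w, v / w))
    return mapped


# Made-up homography with a small perspective term in the bottom row.
H = [[1.0, 0.2, 5.0],
     [0.0, 1.1, -3.0],
     [0.001, 0.0, 1.0]]

# Multiplying every element by the same number gives the same mapping,
# which is the arbitrary scale factor of the homography.
H_scaled = [[2 * e for e in row] for row in H]

points = [(100.0, 50.0), (0.0, 0.0)]
q = apply_homography(H, points)
q_scaled = apply_homography(H_scaled, points)
print(q)
print(q_scaled)
```

The division by the third element is what makes the mapping a perspective one: without it the transformation would be purely affine and could not turn a trapezoid into a rectangle.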

And this is what the second image looks like.  We see that the cathedral has been straightened up.  We can see that the vertical edges of the cathedral are in fact vertical lines in the image.  It's important to remember that there's a very strong assumption made in this process and that is that all of the points in the image lie on a plane. 

Certainly many of the points in this image lie on the frontal plane of the cathedral, but not all do.  If we look at points around here, which are on the edges of the bell towers, they do not lie on the frontal plane and the transformation won't be correct for them.  It will introduce a distortion in that part of the image.  You can't get anything for free, but we've certainly improved the geometric correctness of the bulk of the cathedral.  Given that I've computed the matrix H using MATLAB, it's a very simple matter to apply the homography to every single point in the image.  And we perform that by a process known as "Image Warping".

To do image warping, we consider every single pixel in the output image, and the output image in this case is the geometrically correct, rectified image of the cathedral.

To illustrate this I’m going to choose just one particular point in the output image and it’s the pixel at coordinate 600,100.  Now if I know that pixel coordinate, I want to try and work out what's the corresponding pixel coordinate in the input image. 

The homography is a mapping from the original image to the new image, so in order to map this coordinate I need to use the inverse of the homography and that gives me the coordinate of the corresponding point in the input image and it's got a coordinate of 757 and 51.  The way image warping works then is we go and find the pixel at coordinate 757, 51 and we take that pixel value and we insert it into the new image at coordinate 600, 100.  So for every single pixel in the output image, we work out where it comes from in the input image. 

You can see here that the coordinates in the input image are fractional and that requires a technique called image interpolation to find what is the actual pixel value at this particular fractional coordinate. In a nutshell, that's the process of image warping.
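
The warping loop described above can be sketched in plain Python. The sketch makes several assumptions that are not from the lecture: the image is a greyscale list of rows, coordinates are zero-based, bilinear interpolation is the interpolation method, and output pixels that map outside the input image are set to zero. The function names and the tiny test image are made up.

```python
import math


def apply_homography(H, point):
    """Map one (x, y) point through the 3x3 matrix H."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def bilinear(img, x, y):
    """Sample img at fractional column x and row y."""
    rows, cols = len(img), len(img[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return 0.0  # the point falls outside the input image
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    # Blend the four surrounding pixels by their distance to (x, y).
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def warp(img, H_inv, out_cols, out_rows):
    """Build the output image by looking up every output pixel in img."""
    out = []
    for v in range(out_rows):
        row = []
        for u in range(out_cols):
            # H_inv maps an output coordinate back to the input image.
            x, y = apply_homography(H_inv, (u, v))
            row.append(bilinear(img, x, y))
        out.append(row)
    return out


# Made-up example: a 2x2 image enlarged by a factor of two.
img = [[0.0, 10.0],
       [20.0, 30.0]]
H_inv = [[0.5, 0.0, 0.0],
         [0.0, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
out = warp(img, H_inv, 3, 3)
for row in out:
    print(row)
```

Looping over the output pixels rather than the input pixels guarantees that every output pixel receives a value, so the result has no holes.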

Another application of image warping is this often used effect now in swimming telecasts, where we take the flag and the name of the competitors and we overlay them on the lanes of the swimming pool.  It's actually quite an easy trick to do and it involves these homographies. 

Now imagine that I could swim well, well enough to get into a swimming tournament, so there's my flag and there's my name.  Now I've got this image that I created, just using ordinary computer graphics, that's the easy bit.  Now I want to lay that image into my lane in the swimming pool.  All I need to do that is to find the four corresponding points, so the four corners of this rectangle that holds the image that I want to overlay and the four points in the swimming pool where I'd like it to be laid.

Once I have that information I can warp that original image into this very distorted image which I could then insert into or overlay onto the original image of the swimming pool. 

For those of you who are doing the project associated with this course, the homography is going to be very, very useful.  You've probably already built a two dimensional robot that sits on a worksheet and can move its end effector to any particular XY coordinate on the robot worksheet.

Now imagine that we take a picture of that robot worksheet.  I have an image of the robot worksheet.  The homography lets me create a mapping between a coordinate in the image of the worksheet, which has got a coordinate of U,V in the image plane, and a physical coordinate, X,Y, on the robot's worksheet.  I can map from an image plane coordinate to a robot worksheet coordinate, or I can map from a robot worksheet coordinate back to a camera image coordinate.
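
Mapping in both directions can be sketched like this. The matrix values are invented for illustration, with the image in pixels and the worksheet in millimetres as assumed units, and the function names are hypothetical. In the project, H would come from four measured correspondences.

```python
def invert_3x3(H):
    """Invert a 3x3 matrix using the adjugate and the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def apply_homography(H, point):
    """Map one (x, y) point through the 3x3 matrix H."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Made-up homography from image pixels (u, v) to worksheet millimetres (X, Y).
H_image_to_sheet = [[0.5, 0.0, -10.0],
                    [0.0, 0.5, -20.0],
                    [0.0002, 0.0, 1.0]]
H_sheet_to_image = invert_3x3(H_image_to_sheet)

pixel = (100.0, 200.0)
sheet = apply_homography(H_image_to_sheet, pixel)       # image -> worksheet
pixel_again = apply_homography(H_sheet_to_image, sheet)  # worksheet -> image
print(pixel, "->", sheet, "->", pixel_again)
```

The same matrix serves both directions: H takes an image coordinate to the worksheet, and its inverse takes a worksheet coordinate back to the image.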

Now homographies are going to be very, very helpful for you in completing the project.

Just to summarize the capability in the toolbox for computing and using homographies.  Given two sets of corresponding points P and Q, we can compute a three by three homography matrix.  The columns of P and Q represent points.  Now P might be image coordinates of known points in an image, and Q might be coordinates of points on the robot's physical worksheet.

Alternatively P could be a set of image coordinates in one image and Q could be a set of image coordinates in another image.  Given that I have the three by three homography matrix H, I can then map a set of points P, in the first plane, to a set of points Q, in the second plane.


We can derive a linear relationship between the coordinates of a point on an arbitrary plane in the scene and the coordinates of that point in the image. This is the planar homography and it has a number of everyday uses which might surprise you.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of high school level mathematics; for example, trigonometry, algebra, calculus, physics (optics) and experience with MATLAB command line and programming, for example workspace, variables, arrays, types, functions and classes.
