Image warping


Now we’re going to look at a very general technique called image warping.

Image warping can be used to perform a large number of operations on an image. We’re going to consider some very simple examples first, but they can all be combined to produce quite complex image transformations.

We did touch on image warping in the last lecture when we were discussing homographies and perspective rectification. Let’s consider first the case of image scaling.

Here we have an input image of the Mona Lisa, and I want to shrink her by a factor of a half in each direction. If the output image is the same size as the input image, then the shrunken Mona Lisa sits in the top left-hand corner. You’ll note that a large number of pixels in the output image are coloured red. That’s because for these pixels we have no idea what their value should be. Halving both the width and the height has shrunk the Mona Lisa to a quarter of her previous area, but we’ve placed her in an output image the same size as the input, so there are a whole lot of pixels whose value we are unable to estimate.
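To see where the red region comes from, here is a minimal sketch in plain Python (the function name `valid_fraction` and the 100 × 100 image size are illustrative assumptions) that counts how many output pixels map back to a valid input pixel when the image is shrunk by half into a frame of the same size. It works backwards from each output pixel, which is the approach this section develops later.

```python
def valid_fraction(width, height, scale):
    """Fraction of output pixels that map back inside the input image.

    Assumes a pure scaling u' = scale * u, v' = scale * v about the
    top-left corner, with an output frame the same size as the input.
    """
    valid = 0
    for vp in range(height):
        for up in range(width):
            # Map the output pixel back to the input (inverse of the scaling).
            u, v = up / scale, vp / scale
            if 0 <= u <= width - 1 and 0 <= v <= height - 1:
                valid += 1
    return valid / (width * height)


# Hypothetical 100 x 100 image shrunk by half in each direction.
print(valid_fraction(100, 100, 0.5))  # 0.25, so 75% of the frame is unset
```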

This is a topic we’ll come back to at the end of this section.

Now I want to combine scaling and shifting. So we take our original input image and we’re going to shrink her. We’re going to put her into the middle of the output image so she’s been reduced in size and then offset so that the top left corner of the Mona Lisa picture is not in the top left corner of the output image.

Now let’s consider rotation. In order to rotate the image, I first need to decide the point about which to rotate. We’ll rotate about a point just underneath her chin, by something like 45 degrees.

When we place this rotated image into a frame the same size as the input image, we see a couple of interesting things. Firstly, there are some corners of the output image which are still coloured red.

That’s where there are no valid pixels. We also see that the corners of the Mona Lisa which stick outside the image are going to have to be chopped off. So by rotating the Mona Lisa, we lose some corners of the original image and there are some corners of the output image which are unset.
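Rotation about a chosen centre can be sketched in plain Python as follows. The function `rotate_about`, the 100 × 100 image and rotating about the image centre (rather than the point under her chin) are assumptions for illustration. The loop maps each corner of the input image forward and reports whether it lands inside a same-size output frame, which is why the corners get chopped off.

```python
import math


def rotate_about(u, v, uc, vc, theta):
    """Rotate the point (u, v) by theta radians about the centre (uc, vc).

    Returns (u', v'). In image coordinates, where v increases downwards,
    a positive theta appears on screen as a clockwise rotation.
    """
    du, dv = u - uc, v - vc
    up = uc + math.cos(theta) * du - math.sin(theta) * dv
    vp = vc + math.sin(theta) * du + math.cos(theta) * dv
    return up, vp


# Hypothetical 100 x 100 image rotated by 45 degrees about its centre.
width, height = 100, 100
for corner in [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]:
    up, vp = rotate_about(*corner, 50, 50, math.radians(45))
    inside = 0 <= up <= width - 1 and 0 <= vp <= height - 1
    print(corner, "->", (round(up, 1), round(vp, 1)), "inside" if inside else "clipped")
```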

So then we can combine shifting, scaling and rotation into one overall transformation from the input image to the output image. To formalize this, we define the coordinate of a pixel in the input image using the notation U and V, as we’ve done all through this course, and the coordinate of the corresponding point in the output image, which I’ll denote by U-prime and V-prime. We can then write a very general function: U-prime as a function of the two input coordinates, and V-prime as a function of the two input coordinates.
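The two general functions can be written compactly as on the first line below. As an illustration (the symbols s, θ, t_U and t_V are introduced here, not taken from the lecture), one common way to combine a uniform scale s, a rotation θ about the origin and a shift (t_U, t_V) is the similarity transform on the second line:

```latex
u' = f_U(u, v), \qquad v' = f_V(u, v)

u' = s\,(u\cos\theta - v\sin\theta) + t_U, \qquad
v' = s\,(u\sin\theta + v\cos\theta) + t_V
```

To rotate about some other point, such as the one under her chin, subtract that point from (u, v) before rotating and add it back afterwards.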

So with these two general functions, fU and fV, we can describe scaling, shifting, rotation, various combinations of those and many other kinds of very interesting distortions, such as funhouse mirror effects.
Let’s look at a concrete example of scaling and shifting, and how we actually go about doing it. We define a pixel in the input image, with coordinates U, V, right in the middle of the Mona Lisa’s eye, and the corresponding point in the output image, with coordinates U-prime and V-prime. In the output image the Mona Lisa is one quarter the size of the input image, and she is offset by 100 pixels horizontally and 200 pixels vertically. So we can write a very simple relationship between U-prime and U, and between V-prime and V.
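Assuming that "one quarter the size" means a scale factor of 1/4 along each axis (which agrees with the out-of-bounds example later in this section, where output pixel 500, 300 maps back to input coordinate 1600, 400), the relationship would be:

```latex
u' = \frac{u}{4} + 100, \qquad v' = \frac{v}{4} + 200
```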

Now we consider every pixel in the output image, every possible value of U-prime and V-prime, and for each one we compute the corresponding value of U and V in the input image. In other words, we look at every pixel in the output image and work out where it comes from in the input image. To do this we in fact have to invert the general equations we wrote earlier, fU and fV. Here we have inverted them, so that U and V are now expressed in terms of the output pixel coordinates, U-prime and V-prime.
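The output-driven loop described above (often called backward or inverse mapping) can be sketched in plain Python. This is a minimal illustration, not the course’s implementation: the names `warp` and `shrink_and_shift` are hypothetical, the image is a list of rows indexed `img[v][u]`, the lookup is nearest-neighbour for simplicity, and `shrink_and_shift` assumes the forward mapping u' = u/4 + 100, v' = v/4 + 200.

```python
import math


def warp(src, out_width, out_height, inverse_map):
    """Backward (inverse) warp of a greyscale image stored as a list of rows.

    For every output pixel (u', v') we ask inverse_map where it came from in
    the input, then take the value of the nearest input pixel. Output pixels
    that map outside the input are set to NaN.
    """
    in_height, in_width = len(src), len(src[0])
    out = [[math.nan] * out_width for _ in range(out_height)]
    for vp in range(out_height):
        for up in range(out_width):
            u, v = inverse_map(up, vp)
            # Nearest-neighbour lookup: round to the closest integer pixel.
            ui, vi = math.floor(u + 0.5), math.floor(v + 0.5)
            if 0 <= ui < in_width and 0 <= vi < in_height:
                out[vp][up] = src[vi][ui]
    return out


def shrink_and_shift(up, vp):
    """Inverse of the assumed mapping u' = u/4 + 100, v' = v/4 + 200."""
    return 4 * (up - 100), 4 * (vp - 200)


# Hypothetical 8 x 8 test image, shrunk by half into an 8 x 8 frame.
small = [[10 * r + c for c in range(8)] for r in range(8)]
result = warp(small, 8, 8, lambda up, vp: (2 * up, 2 * vp))
print(result[1][2])                # 24, taken from input pixel (u, v) = (4, 2)
print(shrink_and_shift(500, 300))  # (1600, 400)
```

Because the loop runs over output pixels, every output pixel is visited exactly once and receives either a value or NaN; mapping input pixels forward instead can leave gaps when an image is enlarged.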
Let’s use an example to make this very tangible. Let’s choose for example the output pixel at coordinate 303 and 269. Given that, it’s very easy to work out the coordinate of the pixel in the input image.

Interestingly, the pixel coordinate in the input image is fractional. We’re looking for the pixel at 50.75, 17.25, and this is a bit problematic because pixels have integer coordinates; the values of U and V we’ve looked at so far have always been integers. So how do we go about finding the value of this fractional pixel in the input image?

Let’s consider a little patch of the input image, shown here zoomed in. The pixel we’re looking for lies somewhere here in our grid of pixel values. There are a few strategies we can use to determine the value of the image at this non-integer coordinate, shown here by the yellow dot.
The first thing we can do is take the value of the closest pixel, a technique known as nearest-neighbour interpolation. The yellow dot lies within the bounds of the pixel shown here, which has a value of 115, so we just say the pixel value is 115. But the dot is in the bottom corner of that pixel, so you’d think that the 117 next door and the 123 beneath it might raise the value there. A more sophisticated approach, then, is to take a weighted average of the four neighbouring pixels, where each pixel’s weight is larger the closer its centre is to the yellow dot. This technique is referred to as bilinear interpolation. It can also be thought of as fitting a surface to these four points, where the pixel values represent the height of the surface; we then determine the height of that surface at the U, V coordinate of the yellow dot. Strictly, four points don’t generally lie on a single plane, so the surface is bilinear: it varies linearly along each axis and passes through all four pixel values.
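Here is a minimal sketch of the two strategies in plain Python, assuming an image stored as a list of rows indexed `img[v][u]` with pixel centres at integer coordinates. The values 115, 117 and 123 come from the zoomed-in patch; the fourth value, 125, is made up so the example has four neighbours, and the offset (0.25, 0.25) is an assumed position for the yellow dot.

```python
import math


def nearest(img, u, v):
    """Nearest-neighbour interpolation: the value of the closest pixel."""
    return img[math.floor(v + 0.5)][math.floor(u + 0.5)]


def bilinear(img, u, v):
    """Bilinear interpolation at the fractional coordinate (u, v).

    Assumes floor(u) + 1 and floor(v) + 1 are valid indices, so that all
    four neighbouring pixels exist.
    """
    u0, v0 = math.floor(u), math.floor(v)
    a, b = u - u0, v - v0  # fractional offsets in [0, 1)
    # Each neighbour's weight grows as the point gets closer to it.
    return ((1 - a) * (1 - b) * img[v0][u0]
            + a * (1 - b) * img[v0][u0 + 1]
            + (1 - a) * b * img[v0 + 1][u0]
            + a * b * img[v0 + 1][u0 + 1])


# 115 (top left), 117 (right) and 123 (below) are from the patch; 125 is made up.
patch = [[115, 117],
         [123, 125]]
print(nearest(patch, 0.25, 0.25))   # 115
print(bilinear(patch, 0.25, 0.25))  # 117.5
```

The weights (1 - a)(1 - b), a(1 - b), (1 - a)b and ab always sum to one, so the result is a true weighted average; here the neighbours pull the value up from 115 to 117.5.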
Let’s consider now what happens if I choose a pixel down here. This output pixel, this particular value of U-prime and V-prime, lies outside the Mona Lisa in the output image. What’s going on here? If we take this output pixel coordinate, 500, 300, and map it to the input image, we find that it lies way outside the input image. Some pretty simple logic tells us that an input image coordinate like 1600, 400 lies outside the bounds of the input image.

Therefore, we don’t have a value to put into the output image at that point. If this were a double-precision output image, we might use the special floating-point value known as not a number, or NaN, to represent pixels whose value we are unable to determine.

If it was an integer output image, we might reserve a particular integer value, perhaps zero, perhaps 255, to represent the pixels whose value we are unable to determine.
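The bounds check and the choice of fill value can be sketched in plain Python. The names `in_bounds` and `sample_or_fill` are hypothetical, the 700 × 1000 input size is invented for the example, and the lookup is nearest-neighbour for simplicity.

```python
import math


def in_bounds(u, v, width, height):
    """True if (u, v) lies within an image with pixel centres 0..width-1, 0..height-1."""
    return 0 <= u <= width - 1 and 0 <= v <= height - 1


def sample_or_fill(img, u, v, fill):
    """Nearest-neighbour value at (u, v), or `fill` if (u, v) is outside img.

    `fill` would typically be math.nan for a floating-point image, or a
    reserved integer such as 0 or 255 for an integer image.
    """
    height, width = len(img), len(img[0])
    if not in_bounds(u, v, width, height):
        return fill  # no input pixel exists here, so the value is unknown
    return img[math.floor(v + 0.5)][math.floor(u + 0.5)]


# The lecture's out-of-bounds input coordinate; the 700 x 1000 size is made up.
print(in_bounds(1600, 400, 700, 1000))  # False

tiny = [[10, 20], [30, 40]]
print(sample_or_fill(tiny, 5.0, 0.0, math.nan))  # nan
print(sample_or_fill(tiny, 5.0, 0.0, 255))       # 255
```

NaN has the advantage that it cannot be confused with a genuine pixel value, whereas a reserved integer such as 0 or 255 may also occur naturally in the image.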

Image warping allows us to shrink (or expand) an image by any scale factor, as well as to translate and rotate it. Let’s look at how image warping works.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of high school level mathematics (for example trigonometry, algebra and calculus) and physics (optics), and experience with the MATLAB command line and programming (for example workspace, variables, arrays, types, functions and classes).
