Scale invariant corner features (SIFT)


If I have two views of the same scene but there is a very large change in the viewpoint, that is, a very large change in either the position or orientation of my camera, then the pattern of pixels that surrounds each of these interest points will be different. For instance, suppose the pattern of pixels around a corner feature in one image looks like this. If my second view looks more obliquely at that particular part of the world, then that will introduce some perspective distortion to that part of the image, and those two windows will not match particularly well.

Similarly, if my camera moves a long way away from the scene, then the image will change in scale and the simple image matching techniques that we used will also fail. And if I rotate the camera, again the image similarity measures will fail. So what we need is some way to match the region around an interest point that is invariant to scale and to rotation.
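To make the failure under rotation concrete, here is a minimal sketch in plain Python. It assumes a sum-of-squared-differences (SSD) similarity measure and a made-up 3x3 patch; the helper names `ssd` and `rotate90` are illustrative and not part of any toolbox.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-sized patches (0 means identical)."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def rotate90(patch):
    """Rotate a square patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


# Hypothetical corner-like patch: bright in the top-left, dark elsewhere.
patch = [
    [9, 9, 0],
    [9, 0, 0],
    [0, 0, 0],
]

# The same piece of the world as seen after rotating the camera by 90 degrees.
rotated = rotate90(patch)

same_view_score = ssd(patch, patch)       # identical windows
rotated_view_score = ssd(patch, rotated)  # same corner, rotated view

print(same_view_score, rotated_view_score)
```

The first score is zero, while the second is large even though both windows show the same corner. That is the sense in which a plain window comparison is not invariant to rotation.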

Now, this is a huge area of research and there are some wonderful algorithms available today that provide this functionality. Perhaps one of the best known is the SIFT detector by David Lowe, published back in 2004. It provides output something like this. Here is the image of the building that we looked at before, and overlaid on that are a number of what are referred to as SIFT features. The centre of each circle represents an interest point, a point that’s distinct enough to find in another image.

The size of the circle sitting on top of each of those interest points indicates something about the scale of the feature. All of the pixels that are in the circle around the feature are used to describe that feature. So we can see that some features are very small; they contain a lot of dense texture spread over a small number of pixels. Other features are very large; they encompass the whole corner of a building. These features also encode something about the orientation of the feature in the image, and that’s indicated by the radial line.

We will have a quick look at the scale-invariant feature transform using the same image we used when we were looking at the Harris corner feature.

And here it is. Now, to compute the SIFT features I use the toolbox function isift. I pass in the image and very similar arguments to those we used for the Harris corner function, and I am going to select 200 corner features. It just takes a minute or two to compute, and now in the workspace we have a vector of 200 SIFT point features.

Let's have a look at one of these features. Let's have a look at the first one in the vector. It's got a coordinate of 289.9 and 805.4, and it has a number of additional attributes. Like the Harris corner, it's got a strength, which indicates how distinctive that feature is. But unlike the Harris corner feature, it also has an orientation, a theta value, and also a scale value.

Let's have a look at this particular feature in the image. We do that by calling the plot method on it. We are going to plot it in yellow, and I am going to exaggerate its scale so it's a bit easier to see. I will exaggerate the scale by a factor of 16, and I am going to display it in clock format, which I will explain in just a moment.

There we see the feature. It has drawn a circle around it, and the circle indicates something about the scale of the feature. The line, which is the hand of the clock (that's what the clock option is about), says something about the orientation of the particular feature.
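The geometry of that clock display can be sketched as follows. This is only an illustration of the idea, not the toolbox's plotting code: it assumes the circle radius is the feature scale multiplied by the exaggeration factor, and that the hand points along the angle theta measured from the horizontal image axis. The `Feature` class, the `clock_glyph` helper, and the scale and theta values are hypothetical; only the coordinate and the factor of 16 come from the example above.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    u: float      # horizontal image coordinate (pixels)
    v: float      # vertical image coordinate (pixels)
    scale: float  # feature scale (assumed here to be in pixels)
    theta: float  # feature orientation (radians)


def clock_glyph(f, exaggeration=1.0):
    """Return the circle radius and the tip of the radial 'clock hand' line."""
    radius = f.scale * exaggeration
    tip = (f.u + radius * math.cos(f.theta),
           f.v + radius * math.sin(f.theta))
    return radius, tip


# Coordinate from the example above; scale and theta are made-up values.
f = Feature(u=289.9, v=805.4, scale=2.0, theta=math.pi / 2)
radius, tip = clock_glyph(f, exaggeration=16)
print(radius, tip)
```

The circle is centred on the interest point and the hand runs from the centre to `tip`, so one glyph shows position, scale, and orientation together.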

Now we can display the SIFT features for the whole image. To do that, I am going to redisplay the original scene, drawn a little darker, and then I am going to plot all of the features in white. We exaggerate the scale a bit less this time, and I am going to display them as clocks. And there we see it.

Each feature has a coordinate, a point within the image. For each feature we draw a circle around it and the size of the circle indicates something about the scale of the feature. A small circle indicates a pattern of pixels that's quite distinct across that very small spatial scale within the image.

So it might be a single leaf, it might be a small corner of a balcony or a window in the image. A large circle corresponds to a large scale feature.

So that's something that is distinctive at a much larger spatial scale. It might be the general idea of a bright building surrounded by some dark trees along the bottom edge. There is a lot of information in the position of the feature and also in its scale.

Each circle has got a single radial line, and that corresponds to the orientation of the feature. It is saying something about how that pattern of pixels is oriented within the image.

Imagine now I move my camera somewhat dramatically.

Perhaps I rotated it by 90 degrees or perhaps I moved much further away from the building. The great advantage of the SIFT feature is that it will allow me to match features between these two very different views.

Although the orientation of the pattern of pixels in the scene will be different, and although the scale of the pattern of pixels will be different, the SIFT algorithm is able to see through that. It will find the same interest point in each of the two very, very different views, but it will report that the orientation has changed or that the scale of the feature has changed.
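As a small illustration of what "reporting the change" could mean, the sketch below compares the scale and orientation of one feature as found in two views. The function names and the numbers are hypothetical; the only assumption is that each feature carries a scale and an orientation angle in radians.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def relative_change(scale1, theta1, scale2, theta2):
    """Scale ratio and rotation of a feature between view 1 and view 2."""
    return scale2 / scale1, wrap_angle(theta2 - theta1)


# Hypothetical values: in the second view the camera is further away
# (the feature appears half the size) and is rotated by 90 degrees.
ratio, rotation = relative_change(scale1=2.0, theta1=0.2,
                                  scale2=1.0, theta2=0.2 + math.pi / 2)
print(ratio, math.degrees(rotation))
```

A ratio below one says the feature appears smaller in the second view, and the wrapped angle says how far the pattern of pixels has turned in the image.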

In addition to having a position, an orientation and a scale, the SIFT feature also has a descriptor, and we can have a look at the descriptor for feature number one.

And we can see here in the workspace the new variable D, and it is a 128-element vector. The SIFT algorithm describes the pattern of pixels within the circle, what we call the support region, by a 128-element vector. So that's quite a comprehensive description of that particular pattern of pixels.

If we have two very, very diverse views of the same scene, then even though the orientation and the scale might be very, very different, the descriptors associated with the interest points in the two scenes will be very, very similar. We can evaluate the similarity of points between these two scenes simply by comparing their two descriptor vectors. Typically that's done in the Euclidean sense, by the square root of the sum of the squares of the differences between the two descriptor vectors.
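The comparison just described can be sketched in a few lines of plain Python. The vectors here have four elements rather than 128 to keep the example readable, and the values and the `best_match` helper are made up for illustration.

```python
import math


def euclidean_distance(d1, d2):
    """Square root of the sum of squared differences between two descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))


def best_match(query, candidates):
    """Index and distance of the candidate descriptor closest to the query."""
    distances = [euclidean_distance(query, c) for c in candidates]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


# Hypothetical descriptor from view 1 and three descriptors from view 2.
query = [1.0, 0.0, 2.0, 2.0]
candidates = [
    [5.0, 5.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 3.0, 0.0, 0.0],
]

index, distance = best_match(query, candidates)
print(index, distance)
```

A small distance means the two support regions look alike, so the candidate with the smallest distance is taken as the most likely match for the query feature.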

Now, this has been a very cursory introduction to the idea of SIFT features. They are very, very important in computer vision. Unfortunately, in this introductory course it's not possible to say very much more about them. There is a lot of literature, and you will find mention of them in many computer vision textbooks.

When matching points between scenes with large differences in viewpoint, we need to account for varying image size and rotation. SIFT features are a powerful way to achieve this.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of high school level mathematics; for example, trigonometry, algebra, calculus, physics (optics) and experience with MATLAB command line and programming, for example workspace, variables, arrays, types, functions and classes.
