Here two views of the same building. I took picture. I moved a little bit and then I took another picture. So we can see that the view point is quite different. We can see that the chimney stacked up here is different and we can see there’s far less foliage in one side of the picture.
So let's say that I'm interested in a point here, the corner of the balcony and in this image I can find its pixel coordinate. I want to find that same balcony in the other image. That's very easy for you and me to find that same point and I can click on it very very easily and I do that and I find here is its pixel coordinate. So two different views, the same point in the world, the same corner of the balcony when it's got to very different pixel coordinates because I moved the camera.
Being able to solve this correspondence problem is fundamental to many robotic vision algorithms. And we're not going to cover those algorithms in this course, but I think it's interesting just to outline the principles of point detectors and determining correspondence. So how might we go about solving this correspondence problem?
If I look at the grey scale value at this particular point on the corner of the balcony, I find that it's got a grey scale value of 51. That's interesting. So I perhaps that I go and look for all the pixels in the other image that have got the value of 51. One of them might be on the corner of the balcony or one of them might be, but there are a lot of other pixels within that second image that have got a value of 51. In fact there's just over 6,000 of them and they are indicated here as white dots against our black background. So clearly this is not a very useful or reliable technique.
Another difficulty is that although the pixel had a value of 51 when I took the first picture, but when I took the second picture, that pixel has got grey scale value of 47. So looking for a pixel with a value of 51 is forlorn because the pixel I'm interested in has actually got a different grey scale value even though it's the same point in the image. The reason for that is that the exposure setting on the camera is perhaps slightly different. The second image there is more blue sky, exposure of the camera is different and therefore all the pixel values have changed a little bit. So looking at the grey scale values alone is not enough.
Another point that we need to take into consideration is that it's not possible to always find a corresponding point. Let's now take a pixel that's in a sky. Now there is a whole lot of other pixels in the sky that look exactly the same as that sky pixel in the first image. So we can see that there is some pixels for which there is no corresponding point.
We can learn something from this example about trying to match a pixel in the sky from one frame to the other. But let's zoom in on this particular example. Each of the grey rectangles represents a single pixel. And we're going to consider a central 3x3 region of pixels. All these pixels have got exactly the same value. That's why they are shown with the identical shade of grey. I am going to chop out this central 3x3 region and I'm going to compare it to a whole bunch of neighbouring 3x3 regions and you can see that they are the same.
Wherever I put this moving red rectangles, it looks exactly the same as the original 3x3 window. So this central window that I chopped out is similar to eight other possible locations that I can put my 3x3 window. It's not unique and this is the problem with the sky. All the pixels are the same color. Wherever I put the 3x3 window it looks the same as the central 3x3 window.
Let's consider an example now along an edge. So we have some light grey pixels and some dark grey pixels. I'm going to do exactly the same thing. I'm going to chop out the middle region and I'm going to compare it then to a bunch of neighbouring regions and that matches there doesn't match there, doesn't match there, doesn't match there, matches well here and doesn't match here. So this central window matches at two other locations. So it is a bit more unique.
Consider now this pattern. A single dark pixel against a background of brighter pixels. I'm going to chop out the central region and once again I'm going to move the 3x3 window around. And we see that it doesn't match anywhere else. It is unique. It only matches to that central location. So we say that this particular 3x3 pattern of pixels is locally unique. It's distinct. It's means that it's going to be a pattern that we could perhaps recognize in a different image.
Let's try to formalize this. We're going to look at our central 3x3 window. We're going to look at an arbitrary 3x3 window which is displaced from that central window in the horizontal direction by delta U and in the vertical direction by delta V. So we can describe the similarity of these two windows. Capital I is our input image. The reference 3x3 window is centred at the coordinate U,V. And to do that we are going to index I and J over a region, so I and J. In this simple example are going to vary between minus 1 and plus 1.
So this is a similarity measure. It's going to tell us how similar the central window is to the shifted window. We can compute the similarity for an arbitrary valuable shift. Similarity's got the value of zero if the two regions are identical and has a large positive value if the regions are not similar. Now we'll introduce a measure, capital C which we call cornerness.
It's related to the uniqueness of a particular patch of pixels. And it is the minimum value of the similarity as we shift that 3x3 window around the reference window. If the reference window is similar to the window in all the shifted locations, then the minimum value of the similarity is going to be a very low value. We're going to have a low value of cornerness; a low value of uniqueness. So this is a point that's not very distinct. If however we find that when we shift the window, it's very very dissimilar to the reference window, then the similarity values are all going to be very high. The minimum of all these very high values is going to be a higher value. So we're going to have a high value of cornerness or a high value of uniqueness.
So this is the underlined principle of a very simple corner detector. A detector known as the Moravec corner detector and is really the ancestor of all sorts of modern corner detectors that have followed in it's footsteps.
Now we can generalize this approach. We got our similarity measure from before. But now what we're going to do is to introduce a Gaussian weighting matrix. It’s here W. And what that's going to do is to increase the weight with respect to windows that are close to the reference window and de-weight test windows that are further away from the reference window.
We're going worked through a little bit of maths here, we can rewrite this expression in terms of a vector of delta U and delta V, which is the window shift, multiplied by a square matrix A. This matrix A is referred to as the structure tensor, 2x2 matrix and it's comprised of the image gradiants. So we talk about how to compute image gradiants before. IU is the horizontal gradient. IV is the vertical gradient. And G is a Gaussian smoothing corner.
The advantage of this structure tensor is that it's based on image gradients. So any difference between images due to camera exposure change, the sun coming out from behind the cloud. Is eliminated by the intensity gradient operator.
The image tensor is the 2x2 matrix and so it's got two eigenvalues. The eigenvalues in this matrix tell us a lot about that particular part of the image. If both the eigenvalues are small, then it's saying that that particular part of the image is almost constant. If the two eigenvalues are both small, then that part of the image is almost constant. It's like the sky that we were looking at. All of the pixels in that neighbourhood have got a very similar value.
If either of the eigenvalues are large but not both then this corresponds to a point which is lined along an edge in the image. It doesn't tell us about the orientation of the edge, it's just says that this region of the image contains an edge. Now if both the eigenvalues are large, then this corresponds to a peak. It says that we have a local maxima or minima in pixel intensities with respect to the surrounds and so this is a very distinctive point that we're able to recognize in another image.
There were a couple of algorithms based on this principle which determine whether a point in the image is interesting and this is referred to often as corner detectives but that's more appropriately referred to as interest operators. A very famous one is the Shi-Tomasi detector and what it does is simply for every pixel compute the structure tensor and take the minimum of the two eigenvalues. If both eigenvalues are large, then the Shi-Tomasi detector has a large value.
Another very famous one is the Harris corner detector and it is similar in principle. It effectively determines pixels that have got two large eigenvalues but it does it by computing the determinant of the matrix and the trace of the matrix which are two things that are very very easy to compute and takes the scaled difference between those and returns the value that is high if that particular part of the image is interesting.
Here is an example of the output of the Harris corner operator. Pixels that have got a positive value are shown in blue. Pixels that have got a negative value are shown in red and pixels that are around zero are shown as white. The corresponding region of the input image is shown over here on the right.
We see that the Harris detector gives a strong positive value when there is a distinctive local region of the image. Something that we may be able to find in another image. It gives a strong negative response to an edge in the input image. So if we were looking for points to try and locate in another image, we would look for those regions where the Harris corner value had the strongest positive value.
Now that we found the coordinates of some interesting points in both the left image and the right image, we now need to determine the correspondence, which interesting point in the first image corresponds to which interesting point in the second image. And, we have talked before how the pixel values themselves are not sufficiently unique.
Note the notation that I’m using here, a leading digit, the 1 or the 2, indicates which particular frame we are talking about. So, this indicates the first image and this one indicates the second image. To get around the problem of the non-uniqueness of individual pixel values across images, we are going to look at W by W window of pixels centred around each corner point.
These are group of pixels with W squared values in it is likely to be much more unique than a single pixel centred at the interest point. What we’re going to do then is to compute this window of pixels around the interest points in the first image and around all the interest points in the second image.
We talked previously about image similarity measures and we are going to use them again here to compare these windows centred on the interest points. Let’s recap very briefly on the utility of these features, in particular the Harris corner feature that we have just been describing. It leads to a very concise description of an image. Instead of an image with millions of pixels in it, we now have hundreds of features. The feature is a distinct point within the scene. It’s something we can find from one frame to the next, just useful if that camera is moving. We only have hundreds of these features now instead of millions of pixels. So, the Harris algorithm tells us the coordinates of each of these distinct features within the image.
Now, we need to describe those features and we do that with a W by W window of pixels around each of those features and we use image similarity measures to compare them across frames. Another way of thinking about this is we have eliminated a lot of the irrelevant or not useful information from the image and concentrated just on the very high value information within the image.
This technique of finding corner features and matching them across frames is really useful for real time tracking as we just showed.
For a camera moving through the environment we frequently wish to track particular world points from one frame to the next. We’ll do a quick introduction to the very large field of feature detection and matching using Harris corner features.
This content assumes an understanding of high school level mathematics; for example, trigonometry, algebra, calculus, physics (optics) and experience with MATLAB command line and programming, for example workspace, variables, arrays, types, functions and classes.