MASTERCLASS

Feature extraction

Transcript

Let’s look at this simple example.

When you and I look at that we can see an image of a shark. But as far as a computer vision system is concerned, it is just a matrix full of numbers. In fact this 500 by 500 pixel image contains a quarter of a million numbers. But human beings looking at this see a distinct object. So the challenge is: how do we find the object in this scene?

All we want to be able to do is find a region within this scene. We’re going to define a region as a group of pixels that are connected to each other (that is, they are next to each other) that have all got the same colour. So if we could do that, if for all of the white pixels in this scene we could somehow clump them together, then we would be able to find this coherent region; this object. And then we could begin the process of describing what it looks like and where it is within the image.

Let’s put this into practice. We will load an image, the shark image, put it into the workspace variable im, and display it. These are all very familiar commands to us by now. And we’ll have a look at some pixel values in here. The black ones have got a value of 0, and the white ones have got a value of 1. We call this a binary image or a logical image. The black pixels, with a value of nought, could be considered by MATLAB as false. The white pixels, with a value of 1, could be considered by MATLAB as true. This image has got only two possible pixel values.

Now the first thing that we’re going to do is to make a list of all of the white pixels within the image, and we’re going to do that using MATLAB’s built-in ‘find’ function. What we’re going to do is find all of the pixels that have a value greater than 0; those will be the white ones. For every such pixel, that is, every element of the matrix that it finds, it will return the row coordinate and the column coordinate into the two vectors v and u.
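To make that concrete, here is a minimal sketch of what find is doing for us, written in plain Python rather than MATLAB so that it runs on its own. The small hand-made image and the helper name find_white_pixels are assumptions for illustration, not part of the lesson. Note too that Python counts rows and columns from 0 where MATLAB counts from 1, and that this sketch scans row by row, whereas MATLAB's find works down the columns, so the order of the list differs.

```python
# Tiny hand-made binary image standing in for the shark image (an assumption).
# 0 is a black (background) pixel, 1 is a white (object) pixel.
im = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def find_white_pixels(image):
    """Return two lists (v, u): the row and column of every pixel greater than 0."""
    v, u = [], []
    for row_index, row in enumerate(image):
        for col_index, value in enumerate(row):
            if value > 0:
                v.append(row_index)   # v is the vertical (row) coordinate
                u.append(col_index)   # u is the horizontal (column) coordinate
    return v, u


v, u = find_white_pixels(im)
print(len(v))                  # how many white pixels were found
print(list(zip(v, u))[:3])     # the first few (row, column) pairs
```

The two lists always have the same length, one entry per white pixel, which is the same shape of result that the lesson gets back in v and u.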

So we see that those vectors have got 7827 rows in them, so there are 7827 pixels in this image that are greater than 0. So let’s have a look at what these coordinates are. I’m going to display them and a very long list of numbers rolls past. So let’s have a look at some of these numbers. What it’s saying is, for instance, a pixel at coordinate 243, 170 is a white pixel that belongs to the shark. The pixel at coordinate 244, 169 is a white pixel that belongs to the shark. We have a long list of the coordinates of the white pixels. What can we do with these?

Well, one thing we can do is find the smallest value of the u coordinate in that set of white pixels. And this will be the largest u coordinate; it’s 245. So these two numbers tell us the smallest and the largest u coordinate of that set of white pixels. We can do the same in the vertical direction: find the smallest vertical coordinate and the largest vertical coordinate. So these four numbers essentially bound that shark-shaped object.

So let’s plot a box on the scene, and it’s a box whose top left corner is u min and v min, and its bottom right corner is u max and v max, and we’ll plot it in the colour green. What we’ve done now is to place what we call a bounding box around that set of white pixels.
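The four numbers that define the box can be sketched in a few lines of plain Python. This is an illustration only: the toy image is an assumption, the coordinates are 0-based rather than MATLAB's 1-based, and nothing is drawn, so the toolbox plotting step is left out.

```python
# Tiny hand-made binary image standing in for the shark image (an assumption).
im = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# (v, u) = (row, column) of every white pixel.
white = [
    (v, u)
    for v, row in enumerate(im)
    for u, value in enumerate(row)
    if value > 0
]

u_values = [u for _, u in white]
v_values = [v for v, _ in white]

# Extremes in the horizontal (u) and vertical (v) directions.
umin, umax = min(u_values), max(u_values)
vmin, vmax = min(v_values), max(v_values)

print("top left corner:    ", (umin, vmin))
print("bottom right corner:", (umax, vmax))
```

Every white pixel lies on or inside the rectangle described by those two corners, which is all that a bounding box promises.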

So we’re able now to say something about this set of pixels. We’re able to say something about where these pixels are within the image. Another thing that we can do is to say where the centre of this group of pixels is. A simple way of doing this is to think of the centre as the average of the minimum and maximum coordinates. So in the horizontal direction it’s that number. In the vertical direction it will be this number.

Now what we can do is plot a point at that particular coordinate using the plot_point function: pass in the u coordinate, the v coordinate … and it expects a column vector, and we’ll put an asterisk at that location. And now we have drawn a point, which in some sense is the centre of that object. Actually, it’s the centre of the box and is roughly the centre of that set of white pixels. We’ll be able to do better than that shortly.
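The centre-of-the-box calculation is just two averages. Here is a minimal Python sketch of it; the helper name box_centre and the example bounds are made up for illustration and are not the values from the shark image.

```python
def box_centre(umin, umax, vmin, vmax):
    """Centre of a bounding box: the midpoint of its extent in each direction."""
    u_centre = (umin + umax) / 2
    v_centre = (vmin + vmax) / 2
    return u_centre, v_centre


# Hypothetical bounds of a small region (not taken from the lesson).
uc, vc = box_centre(umin=1, umax=6, vmin=1, vmax=4)
print(uc, vc)  # prints: 3.5 2.5
```

Because it depends only on the four extreme coordinates, this centre ignores how the white pixels are distributed inside the box.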

A technique that’s going to be really helpful for us in this endeavour is to compute what are called moments of the image. The moments are given by this equation, and there are two parameters to the moment, p and q. So we refer to the ‘p-qth moment’ of an image. The moment is simply the sum, over all of the pixels in the image, of the u coordinate of that pixel to the power of p, times the v coordinate to the power of q, multiplied by the value of the pixel at that particular coordinate. At the moment we’re dealing with binary images, so the pixel value is either 0 or 1.
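Written out from that description, the equation would look something like the following. The symbol I(u, v) for the value of the pixel at column u and row v is notation assumed here; the slide in the lesson may write it differently.

```latex
m_{pq} = \sum_{(u,v) \in \mathbf{I}} u^{p} \, v^{q} \, I(u, v)
```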

A particularly interesting moment is when p equals q equals 0. We call this the zeroth moment of the image, and because p and q are zero, the coordinate terms are both equal to 1 and we simply remove them from the summation. The moment 0,0 is simply the sum of the pixel values. Now if pixels are either 0 for background or 1 for the object, then the sum of the pixel values is going to be the area of the region. It’s going to be the number of bright pixels within the shark that we were just looking at.
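We can check that claim on a toy example. This is a plain Python sketch with a hand-made image (an assumption, not the shark image): summing the pixel values and counting the white pixels should give the same number.

```python
# Tiny hand-made binary image standing in for the shark image (an assumption).
im = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Zeroth moment: the sum of every pixel value in the image.
m00 = sum(sum(row) for row in im)

# Area measured directly: count the pixels that are greater than 0.
white_count = sum(1 for row in im for value in row if value > 0)

print(m00, white_count)  # prints: 14 14
```

The two agree only because the pixel values are 0 or 1; for a greyscale image the zeroth moment would be a brightness-weighted sum rather than a pixel count.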

Another set of very useful moments are the first moments. These are the moment 1,0 and the moment 0,1, and their equations are shown here. We can think of these as sums of the u and v coordinates, weighted by the values of the pixels. So if we take the ratio of the moment 1,0 to the moment 0,0 (that’s the first moment divided by the area), and likewise the ratio of the moment 0,1 to the moment 0,0, this gives us the coordinates of the centroid of the object, and this is really useful. Now we’ve found the geometric centre of this shark-shaped object. We’re beginning to be able to describe it.
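In symbols, and assuming the notation I(u, v) for the value of the pixel at column u and row v, and (u_c, v_c) for the centroid (both labels chosen here rather than taken from the slide), the first moments and the centroid can be written as:

```latex
m_{10} = \sum_{(u,v) \in \mathbf{I}} u \, I(u, v), \qquad
m_{01} = \sum_{(u,v) \in \mathbf{I}} v \, I(u, v)

u_c = \frac{m_{10}}{m_{00}}, \qquad v_c = \frac{m_{01}}{m_{00}}
```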

Let’s continue the session that we had earlier. First, we’ll compute the zeroth moment of the scene using the mpq function. We’ll pass in the image, and we want the 0,0 moment, the zeroth moment. The zeroth moment is the area, or the total number of 1 pixels in the image, and it’s the value that we obtained previously. We compute the moment 1,0, and we do that again using the mpq function; pass in the image, 1 and 0. So we have this rather large number as a result. Compute the moment 0,1 … mpq … image … 0 and 1. Again we have a very large number. Now what we’ll do is compute the centroid of the region. So the u coordinate of the centroid, which we’ll call ucen, is equal to the moment 1,0 divided by the moment 0,0. And the v coordinate of the centroid is the moment 0,1 divided by the zeroth moment, and that is the vertical coordinate of the centroid.
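The same sequence of steps can be sketched in plain Python. The function moment_pq below is a hypothetical stand-in written for this example; it is not the toolbox's mpq, and the toy image is an assumption as well. Coordinates are 0-based here, so the centroid is shifted by one in each direction compared with what MATLAB's 1-based indexing would give for the same pixels.

```python
# Tiny hand-made binary image standing in for the shark image (an assumption).
im = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def moment_pq(image, p, q):
    """Sum of u**p * v**q * pixel value over all pixels (u = column, v = row)."""
    return sum(
        (u ** p) * (v ** q) * value
        for v, row in enumerate(image)
        for u, value in enumerate(row)
    )


m00 = moment_pq(im, 0, 0)   # area: 14
m10 = moment_pq(im, 1, 0)   # sum of the u coordinates of the white pixels: 47
m01 = moment_pq(im, 0, 1)   # sum of the v coordinates of the white pixels: 32

# Centroid: first moments divided by the area.
ucen = m10 / m00
vcen = m01 / m00
print(ucen, vcen)
```

For this toy region the bounding box runs from u = 1 to 6 and v = 1 to 4, so the centre of the box is (3.5, 2.5), while the centroid comes out a little to the left of and above it because more of the white pixels sit on that side.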

Let’s plot that on the image; we use the plot_point function. Pass in the u coordinate and the vertical coordinate, turn it into a column vector, and we’ll indicate this one with a circular marker. And here we have the centroid of all of the white pixels within the shark-shaped object. You can see that it’s very, very close to the centre of the box, and that says something about the fact that this shark object is fairly symmetrical.

Code

There is no code in this lesson.

If we look at a binary image we can easily see distinct regions, that is, sets of connected pixels that are the same colour as their neighbours. We call these blobs, and they’re an important way of achieving an object-level rather than pixel-level view of the scene. We can describe these blobs by their area, centroid position, bounding box and moments.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of high school level mathematics; for example, trigonometry, algebra, calculus, physics (optics) and experience with MATLAB command line and programming, for example workspace, variables, arrays, types, functions and classes.
