MASTERCLASS

Further topics in image processing

Lessons

Transcript

Let's talk about how we find line features. In this example here we've taken a picture of a church and have overlaid a number of lines onto that image. These lines lay along the obvious boundaries in the scene.

The boundary between the sky and the roof, roof and the wall, edges and windows and so on. How do we go about doing this? The first step is we take our original input image, we convert it to a gray scale image and then we perform an age extraction operation.

We talked about age operators in a previous lecture. It involved essentially a convolutional process to determine image gradients and then we apply some threshold to that.

This image looks very impressive in a way we can clearly see that all the obvious lines in the image are highlighted here. But remember that this is just an image. It contains a whole lot of pixels. Some pixels are black and some pixels are white. There is actually no concept of line here.

Our eyes are seeing lines within the scene and our eyes and our brain are very, very good at doing this but there is actually no notion of lines here. It is simply a bunch of white pixels and black pixels and the white pixels highlight where there is significant gradient within the scene.

So how do we find the long line segments here? How do we group these white pixels into lines which we can describe by a simple equation. y=mx+b. The way we do this is with an image processing technique called the Hough transform and it's very, very genius and I want to try and explain it as simply as I can.

Let's imagine that we have one edge point. So this might be a pixel that's along the roof of the church that we were just looking at, a single pixel which lays on an edge.

Now there are infinite number of lines which can go through any point in space. This line for instance goes through the point, this line does not go through, this line does go through, this one doesn't, this one does, this one does, this one does and this one does.

Now we could consider that the lines vote for the point. This line doesn't go through the point so it's not allowed to vote for the point. It gets zero votes. Then this line gets zero votes as well; it doesn't go through the point. But this line gets a vote and so does this one, this one, this one and this one.

These lines voted for the point and the red lines have not voted for the point.

Now, lets consider we have another point which lays along the edge perhaps these two black points now lay along the roof of the church that we were just looking at. Once again there are an infinite number of lines which could go through that point. Could be this line, or this line, or this line and once again, we are going to allow the lines to vote for a point.

So this line gets one vote because it goes through a point, this line gets a vote as does this one and this one but this line also goes through that first point.

That particular line now has two votes. But we see that there are a lot of lines in this scene and I have already drawn up a small number. There are a infinite number of lines that go through the points, infinite number of lines that don't go through the points.

We can see that some of the lines that go through a point get one vote but one particular line, the line that goes through two points gets two votes. So we've used a simple voting system to try and determine the best line that goes through two points and we can extend this to a large number of points and that's what we would do when we process an image like the church. Normally when we represent lines, we use an equation like this. This is the very standard Cartesian representation of a line. U is our horizontal coordinate, V is our vertical coordinate, M is the gradient and C is the intercept. A problem with this particular representation of a line is for vertical lines. In the case where the line is vertical, it must be equal to infinity and that's slightly problematic.

There is another way we can describe the equation of a line and we do it in this fashion, it's often called the polar representation of a line that we have a U and V coordinates but the line is described by two parameters now row and theta. Theta is the angle that the line makes with the horizontal axis and row is the perpendicular distance between the origin and a point on the line. Two parameters row and theta can describe a line in a more versatile way than the traditional gradient and intercept formulation.

One problem is when we are talking about all those infinite number of lines that could go through a point is its going to be difficult for an infinite number of lines to vote. What we need to do is to quantize the values of row and theta. Instead of considering an infinite number of lines we consider a finite number of lines.

It works like this, it's a voting scheme now we have a coordinate system, horizontal axis is the line parameter of theta, the vertical axis is the line parameter of row. And we placed a grid on this. Every single cell represents a possible line in an image. A particular value of row and a particular value of theta and that represents a line in the image. This is a very course example in a real world example this grid might have hundreds of elements in both directions. The important thing to keep in mind is that each cell represents a particular line within the imagine.

For any point in the input image, that lays along an edge, that's a pixel coordinate U and V the parameters of the lines that go through that point must satisfy this particular equation.

This is now an equation that relates row to theta. For every possible value of theta, given that I know U and V, I can computer a value of row. And so all these lines get to have a vote. They get to vote one for that particular line. For this one point, it puts these votes into what we call the voting array sometimes called the half accumulator.

So for one point we have made votes for a large number of lines. Now let's consider what happens when we introduce a second point. The second point is voting for all of the possible lines that pass through it and it 's introduced another curve into the voting array. We can see that there are a number of possible lines that got one vote and we can see that there is one line here which has got two votes. This is the underlying principle for the Hough transform.

Code

There is no code in this lesson.

Many scenes, particularly of man-made environments, have very dominant lines due to the edges of objects. The Hough transform is a common technique for finding dominant lines, and we ill examine how it works and apply it to a real image.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of high school level mathematics; for example, trigonometry, algebra, calculus, physics (optics) and experience with MATLAB command line and programming, for example workspace, variables, arrays, types, functions and classes.

More information...

Rate this lesson

Average

Check your understanding

Leave a comment

Previous lesson Next lesson