MASTERCLASS

Vision and motion

Transcript

What I have here is a regular grid of dots. We are going to look at how those dots move in the image plane of a camera as the camera moves.

Here we have the camera looking at that grid of dots; and I am going to move the camera towards the left — that's the negative X direction — and we can see that the dots all appear to move in the positive X direction in the image plane. Come back to the middle, and now I move the camera in the positive Y direction. We see the dots all appear to move upward in the frame; that's in the negative Y direction.

If I move the camera downwards towards the plane of the dots, we see that they all tend to be expanding outwards from the centre of the image plane. And finally, if I rotate the camera around its Z axis — what's called the optic axis — we see that the dots all appear to rotate around the centre of the image plane.

Consider that we have a camera looking at an object, in this case a ball.

We attach a coordinate frame to the camera in a way that we should be quite familiar with now. We call that coordinate frame C. Now we can define the position of the ball with respect to coordinate frame C.

Now we can consider that the camera is able to move. It's moving in three-dimensional space, so it can potentially have a velocity in the X, Y and Z directions, and it can rotate about the X, Y and Z axes.

We can project the three-dimensional position of the ball onto the image plane, and we obtain the image-plane coordinate (u, v).
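To make the numbers in the MATLAB examples below concrete, here is the pinhole projection written out. This is a sketch assuming square pixels of side ρ, focal length f and principal point (u0, v0), which matches the default camera used in a moment:

$$ u = u_0 + \frac{f}{\rho}\,\frac{X}{Z}, \qquad v = v_0 + \frac{f}{\rho}\,\frac{Y}{Z} $$

For the default camera (f = 8 mm, ρ = 10 µm, u0 = v0 = 512) the scale factor f/ρ is 800 pixels, which is where the values 672 and -160 below come from.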

Let's demonstrate some of these ideas in MATLAB.

We are going to create a camera object, and we've encountered this before. It's a pinhole camera with default parameters, and these are shown here. I am going to create a point in the world with an X coordinate of one metre, a Y coordinate of one metre and a Z coordinate of five metres. Transpose that so that it is a column vector. Now I can project that three-dimensional world point onto the image plane of my camera, and it is at this coordinate: 672 for u, 672 for v.
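Here is a sketch of those commands, assuming the Machine Vision Toolbox for MATLAB and its CentralCamera class, which is what this lesson series uses:

    % Pinhole camera with default parameters: f = 8 mm, 10 um pixels,
    % 1024 x 1024 image, principal point at (512, 512).
    cam = CentralCamera('default');

    P = [1 1 5]';        % world point: X = 1 m, Y = 1 m, Z = 5 m, as a column vector
    p = cam.project(P)   % projects to (672, 672) pixels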

Now I am going to project the point again, but this time I am going to shift the camera.

I can do that with the 'Tcam' option — the transform of the camera — and I am going to move the camera 0.1 metres in the X direction. The 'transl' function we've encountered before creates a homogeneous transformation that represents a pure translation, with no rotation: 0.1 metres in the X direction, 0 metres in the Y direction, 0 metres in the Z direction.

Close that last bracket off, and here is the point.

I've moved the camera a small amount in the positive X direction, and we can see that the u coordinate has reduced. The point has moved toward the left in the image.
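As a sketch, the shifted projection looks like this (using the same assumed toolbox):

    % Project the same point with the camera displaced 0.1 m in the X direction.
    pd = cam.project(P, 'Tcam', transl(0.1, 0, 0))
    % u falls from 672 to about 656; the point moves left in the image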

I can compute the sensitivity of this motion by taking the difference between the shifted point and its original position, and dividing by the amount that I moved the camera. This sensitivity indicates that for every metre I move the camera in the positive X direction, the projection will move -160 pixels in the u direction on the image plane.
We can repeat this process for other kinds of camera motion.
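The sensitivity calculation, sketched with the same assumed toolbox functions:

    % Pixels of image motion per metre of camera motion in the X direction.
    p0 = cam.project(P);
    dp = (cam.project(P, 'Tcam', transl(0.1, 0, 0)) - p0) / 0.1
    % dp is approximately [-160; 0]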

Here is our projection function of the world point, and I am going to now translate the camera in the Z direction. Subtract the original position and divide by 0.1, and here we see that if I move the camera 1 metre in the Z direction, this particular point will move 32 pixels in the u direction and 32 pixels in the v direction. I can replace that translation with a rotation. Let's rotate the camera, say, 0.1 radians around the X axis and repeat. In this particular case, for every radian of rotation, the projection of the point will move by forty-something pixels in the u direction and around 850 pixels in the v direction — a significant amount of motion.
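Sketches of those two cases, continuing from above:

    % Translation along the optic axis (Z).
    dpz = (cam.project(P, 'Tcam', transl(0, 0, 0.1)) - p0) / 0.1
    % approximately [32; 32] pixels per metre

    % Rotation of 0.1 rad about the camera's X axis.
    dpr = (cam.project(P, 'Tcam', trotx(0.1)) - p0) / 0.1
    % roughly [41; 852] pixels per radian; most of the motion is in v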

What we've seen is the relationship between how I move the camera and how the projection varies on the image plane.
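As an aside, the toolbox can capture this whole relationship in a single matrix, the image Jacobian, via its visjac_p method; a sketch, with the point's depth (5 m) passed as the second argument:

    % 2 x 6 image Jacobian for the point projected at (672, 672), depth 5 m.
    % Each column is the pixel velocity (u_dot, v_dot) due to unit camera
    % velocity in one degree of freedom: vx, vy, vz, wx, wy, wz.
    J = cam.visjac_p([672 672]', 5)
    % J(1,1) is -160, matching the numerical estimate above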

A more intuitive way to visualise what's going on here is to display the flow field. The flow field is defined for this particular camera, and I pass in the spatial velocity of the camera: a six-vector, which here has unit velocity in the X direction. The flow field looks like this. It's saying that if the camera is moving in the positive X direction, then every point in the image will be shifted towards the left. If I change the camera velocity to be unit velocity in the Y direction, then the flow field looks like this. If my camera is moving in the positive Y direction — which, given the way I defined my camera coordinate frame, means the camera is moving downwards in the world — then every point in the image appears to move upwards.
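A sketch of those flow-field commands, again assuming the toolbox's CentralCamera class:

    % Flow field for a spatial velocity [vx vy vz wx wy wz].
    cam.flowfield([1 0 0 0 0 0]);   % camera translating in X: points shift left
    cam.flowfield([0 1 0 0 0 0]);   % camera translating in Y: points move up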
 
Now let's change the velocity to unit velocity in the Z direction. Imagine the camera is moving towards the scene. Now the flow field looks like this: we have this very distinctive radiating pattern coming from the centre of the image plane.

The final example that I want to show you has zero translational motion and unit rotational motion around the Z axis. I'm rotating the camera, and in this case every point in the scene appears to rotate around the centre of the image plane.
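And sketches of the last two cases:

    cam.flowfield([0 0 1 0 0 0]);   % translation along the optic axis: radial pattern
    cam.flowfield([0 0 0 0 0 1]);   % rotation about the optic axis: circular flow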
 
Let's consider that we have a regular array of points and the camera is moving in the positive X direction. It's got a positive X velocity component. What we observe is that all the points in the image move uniformly toward the left. Now let's consider a more interesting example where the camera is moving forward. And now what we see is this interesting radiating effect, a bit like the Star Trek warp speed effect.

We see that points towards the edge of the image are moving faster than points in the middle of the image. The centre of the image is what we call the focus of expansion: it's the point from which all of the pixels appear to radiate. Consider now that we rotate the camera in a positive direction around the Z axis. If I'm holding the camera, that corresponds to me rotating the camera in a clockwise direction. The pattern of pixel velocities is now quite different: we see that each pixel's velocity is tangential to a circle centred on the middle of the image.

So far, we've considered a fairly contrived example: we've imagined the camera moving towards a uniform array of dots. The real world is not like that, but we can compute this optical flow phenomenon for real image sequences. Here we see some imagery captured from an unmanned aerial vehicle, a flying robot if you like. The robot is flying at pretty much constant altitude, and moving in very much a straight line.

Each of these green arrows indicates the direction that a point in the image is moving from one frame to the next. The pixel motion has been greatly exaggerated; from one frame to the next a point only moves a few pixels. Although the average pixel velocity is downwards within the image, we can also see that there is some side-to-side motion, and this would indicate that the robot is not flying an exactly straight-line path; perhaps it is being buffeted by wind gusts, which are causing its heading angle to change a little bit.
 
Another example, this one more three-dimensional, is taken from a camera on a car moving along a road; there are a lot of parked cars at the side and a lot of trees. What you'll notice in this particular case is that objects that are far away cause relatively little optical flow, whereas objects that are close by cause a large amount of optical flow.

We can see this radiating optical flow pattern again. We can see that the vectors all appear to radiate from a point in front of the car, which we call the focus of expansion. That's the point in the world that the car is heading towards. Another thing we will notice occasionally is that the optical flow vectors bounce up and down crazily, and that's because the car was driving over a speed bump. So the car is pitching up and then pitching down. That causes an additional motion component, which is superimposed on top of the optical flow due to just pure forward motion.
 
This is a very, very powerful illusion. I am inside a rotating drum and my eye can detect the motion of this drum. My eye and my brain are computing what we call optical flow, and the optical flow caused by that rotating drum is the same sort of optical flow that I get if I rotate my head this way, or I rotate my head that way. This particular illusion makes me feel a little bit uneasy because the information that I get from my eyes — the optical flow — tells me that my head is moving from side to side, but the gyroscopic sensors in my ears tell me that that's not happening. So there is a disconnect between what my ears are telling me about my attitude and what my eyes are telling me about my motion, and that leads to the sensation of sea sickness or slight nausea, which is why I am hanging on very tightly to these rails.

Code

There is no code in this lesson.

When a camera moves in the world, points in the image move in a very specific way. The image plane or pixel velocity is a function of the camera’s motion and the position of the points in the world. This is known as optical flow. Let’s explore the link between camera and image motion.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes an understanding of undergraduate-level mathematics; for example, linear algebra (matrices, vectors, complex numbers), vector calculus, and MATLAB programming.


Discussion

  1. Zain says:

    Is it possible to invert the projection matrix to obtain real-world coordinates from image coordinates, especially when the point is non-planar?

    1. Peter Corke says:

      Unfortunately it's not that simple. For a start, the matrix is not square. But the real problem is that all points on a line through the focal point are mapped to a single point on the image plane. There is a many-to-one mapping from world points to the image plane, so the inverse is at best a line, not a world point.
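      As a sketch of this idea, assuming the Machine Vision Toolbox's CentralCamera class (its ray method back-projects a pixel to a 3D ray rather than a unique point):

          % The best we can recover from a single image coordinate is a ray.
          cam = CentralCamera('default');
          r = cam.ray([672 672]')   % Ray3D through the camera centre and the pixel
          % Every world point on this ray, e.g. [1 1 5]' or [2 2 10]',
          % projects to the same image coordinate (672, 672).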

  2. Aria says:

    The last part of the video made me feel dizzy. I have to use paper to cover all the rotating parts, so I can concentrate on what the professor said. Haha.
    Great video, thank you, Professor Corke!
