Summary of Vision and Motion


Let's summarise some of the main points from this lecture. 

We talked about how a robot knows where objects are in its environment. A robot doesn't actually measure the position of its tool tip; the position is inferred from joint angle measurements and a kinematic model. The robot knows the Cartesian coordinates of where the tool tip needs to go in the world, and it's the job of the motion controller to take the tool tip there.

A number of things can go wrong with this process: there could be errors in the sensors, there could be errors in the kinematic model, and we may not know the base position of the robot or the location of the circuit board very accurately. All of these issues have consequences for the way we construct a modern industrial robot, and they generally lead to increased cost.
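To make the inference step concrete, here is a minimal sketch for a hypothetical two-link planar arm. The link lengths, joint angles and the 0.1 degree encoder offset are illustrative assumptions, not values from the lecture. The tool tip position is computed from the joint angles through the kinematic model, so a small sensor error shifts where the robot believes its tool tip is.

```python
import math

def tool_tip_position(q1, q2, a1=1.0, a2=1.0):
    """Forward kinematics of a hypothetical 2-link planar arm.

    q1, q2: joint angles in radians.
    a1, a2: link lengths in metres (assumed values, for illustration only).
    """
    x = a1 * math.cos(q1) + a2 * math.cos(q1 + q2)
    y = a1 * math.sin(q1) + a2 * math.sin(q1 + q2)
    return x, y

# Where the tool tip really is, given the true joint angles.
q1, q2 = math.radians(30.0), math.radians(45.0)
true_tip = tool_tip_position(q1, q2)

# Where the robot believes it is if the first joint sensor reads 0.1 degrees high
# (an assumed error, chosen only to show the effect).
encoder_offset = math.radians(0.1)
believed_tip = tool_tip_position(q1 + encoder_offset, q2)

tip_error = math.hypot(believed_tip[0] - true_tip[0], believed_tip[1] - true_tip[1])
print(f"tool tip error: {tip_error * 1000:.2f} mm")
```

Even this tiny joint error moves the inferred tool tip by a few millimetres at a reach of about two metres, which is one reason accurate sensors and stiff, well-calibrated mechanisms add cost.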

We can contrast that with the way human beings solve problems such as threading a needle: I use my eyes to steer the end of the thread through the hole in the needle.

If we consider this problem a bit more schematically, where we're trying to align the ends of two objects, we can consider the task in three-dimensional Cartesian space, and we can also look at the task with a camera, as a two-dimensional representation of the same task being achieved.

We talked about how a single image provides insufficient information to reliably perform this task in three-dimensional space. But we can perform the task simultaneously in two camera views, aligning the red and blue objects in each view; if they are aligned in both views, then in the real three-dimensional world the red and blue objects will also be aligned.

We talked about what happens when we move a camera. The camera has six degrees of freedom; its velocity can be described in terms of three translational velocity components and three rotational velocity components. If we imagine that the camera is looking at a regular array of points a constant distance away, then each of the camera velocity components results in a fairly distinctive pattern of motion of those points on the image plane. We looked at these patterns of motion, which are referred to as optical flow, for velocity in the X direction, velocity in the Z direction and rotational velocity around the Z axis.

The patterns are fairly distinct, but some of them are similar. In particular, the pattern due to velocity in the X direction is quite similar to the pattern induced by rotation around the Y axis, and the degree of similarity depends on the focal length of the lens. For a large focal length, the ambiguity is quite pronounced; as the focal length gets smaller, the ambiguity is much less evident. By symmetry, the same ambiguity exists between vY and omega-X.
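The effect of focal length on this ambiguity can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the standard image Jacobian of a perspective camera (the matrix this summary turns to next), with pixel coordinates measured from the principal point and the focal length expressed in pixels. The grid size, depth and the two focal lengths are made-up values. It computes the optical flow of a regular grid of points for pure X translation and for pure rotation about Y, then compares the two flow fields with a cosine similarity.

```python
import math

def image_jacobian(u, v, Z, f):
    """2x6 image Jacobian for one point (assumed convention).

    u, v: pixel coordinates relative to the principal point.
    Z: depth of the point; f: focal length in pixels.
    Columns correspond to (vx, vy, vz, wx, wy, wz).
    """
    return [
        [-f / Z, 0.0, u / Z, u * v / f, -(f * f + u * u) / f, v],
        [0.0, -f / Z, v / Z, (f * f + v * v) / f, -u * v / f, -u],
    ]

def flow_field(cam_vel, f, Z=1.0, half_width=500, step=250):
    """Stacked image-plane velocities of a regular grid of points at constant depth."""
    flow = []
    for u in range(-half_width, half_width + 1, step):
        for v in range(-half_width, half_width + 1, step):
            for row in image_jacobian(u, v, Z, f):
                flow.append(sum(j * c for j, c in zip(row, cam_vel)))
    return flow

def cosine_similarity(a, b):
    """1.0 means the two flow patterns are identical up to a positive scale factor."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vx_only = [1, 0, 0, 0, 0, 0]  # unit translation along X
wy_only = [0, 0, 0, 0, 1, 0]  # unit rotation about Y

# Illustrative focal lengths in pixels: a long lens and a short (wide-angle) lens.
sim_long = cosine_similarity(flow_field(vx_only, f=4000), flow_field(wy_only, f=4000))
sim_short = cosine_similarity(flow_field(vx_only, f=400), flow_field(wy_only, f=400))
print(f"long lens similarity:  {sim_long:.4f}")
print(f"short lens similarity: {sim_short:.4f}")
```

With the long lens the two flow fields are almost indistinguishable, while with the short lens the rotation pattern bends noticeably towards the image edges and the similarity drops.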

The relationship between the camera velocity, which is described by a six-vector, and the velocity of a point on the image plane is described by the image Jacobian matrix.
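Written out for a single point, the relationship takes the following form in one commonly used convention. The notation is an assumption here, since this summary does not state the formula: (u, v) are pixel coordinates measured from the principal point, f is the focal length in pixels, Z is the depth of the point, and the camera velocity is (v_x, v_y, v_z, ω_x, ω_y, ω_z). Note that v without a subscript is a pixel coordinate, not a velocity.

```latex
\begin{pmatrix} \dot{u} \\ \dot{v} \end{pmatrix}
=
\begin{pmatrix}
-\dfrac{f}{Z} & 0 & \dfrac{u}{Z} & \dfrac{uv}{f} & -\dfrac{f^{2}+u^{2}}{f} & v \\[2ex]
0 & -\dfrac{f}{Z} & \dfrac{v}{Z} & \dfrac{f^{2}+v^{2}}{f} & -\dfrac{uv}{f} & -u
\end{pmatrix}
\begin{pmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}
```

Only the three translational columns contain Z, so the depth of the point affects the flow caused by translation but not the flow caused by rotation.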

The image Jacobian matrix tells us, for a particular camera motion, how points will move on the image plane. We can turn this around and say instead that we want a particular velocity on the image plane in order to move a shape from an initial view to a desired view. Given that image plane velocity, we can invert the relationship and determine the camera velocity that we need in order to achieve our desired view of the object.
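A minimal sketch of this inversion follows, under several assumptions that are not from the lecture: the same image Jacobian convention as is standard for a perspective camera (pixel coordinates from the principal point, focal length in pixels), three points at a known common depth so that the stacked Jacobian is a square 6x6 matrix, and a simple proportional gain to turn the image-plane error into a desired image-plane velocity. The point coordinates, depth, focal length and gain are made-up values, and the small linear solver is included only to keep the example free of external libraries.

```python
def image_jacobian(u, v, Z, f):
    """2x6 image Jacobian for one point; columns are (vx, vy, vz, wx, wy, wz)."""
    return [
        [-f / Z, 0.0, u / Z, u * v / f, -(f * f + u * u) / f, v],
        [0.0, -f / Z, v / Z, (f * f + v * v) / f, -u * v / f, -u],
    ]

def stacked_jacobian(points, Z, f):
    """Stack the 2x6 Jacobians of several points; three points give a 6x6 matrix."""
    J = []
    for u, v in points:
        J.extend(image_jacobian(u, v, Z, f))
    return J

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("stacked Jacobian is singular for these points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def camera_velocity(points, desired, Z, f, gain=0.5):
    """Camera velocity that drives the points towards their desired image positions."""
    p_dot = []
    for (u, v), (ud, vd) in zip(points, desired):
        p_dot.extend([gain * (ud - u), gain * (vd - v)])  # desired image-plane velocity
    return solve_linear(stacked_jacobian(points, Z, f), p_dot)

# Illustrative values only: current and desired pixel positions of three points.
points = [(100.0, 50.0), (-80.0, 120.0), (30.0, -90.0)]
desired = [(110.0, 50.0), (-70.0, 120.0), (40.0, -90.0)]
nu = camera_velocity(points, desired, Z=2.0, f=800.0)
print("camera velocity (vx, vy, vz, wx, wy, wz):", nu)
```

With more than three points the stacked matrix is no longer square, and a least-squares solution (for example via a pseudo-inverse) would take the place of the direct solve used here.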

The image Jacobian relates pixel velocity to camera velocity, and the image Jacobian itself is a function of several parameters: the coordinates (u, v) of the point on the image plane, the distance Z of the point from the camera in three-dimensional space, and the focal length. We can rearrange this equation in many different ways; for example, if we know the pixel velocity, Z, u and v, then we can determine the camera velocity. There are many tricks that we can play with this image Jacobian matrix.

Let’s recap the important points from the topics we have covered in our discussion of optical flow and visual servoing.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes high school level mathematics and requires an understanding of undergraduate-level mathematics; for example, linear algebra (matrices and vectors), complex numbers, vector calculus and MATLAB programming.
