Specifying 3D tasks in terms of images


Specifying tasks in the image plane

The task that we want the robot to do is to bring the end of this rod to the tip of this other rod, and that’s a relatively easy thing for me to do. I’m using both my eyes to guide my hand in order to achieve that particular motion. However, if I’m only using one eye there is a potential for some ambiguity. Using just a single eye, it’s actually quite difficult for me to determine the distance along the axis away from my eye. In order to achieve this positioning task in three dimensions, I actually need to use two camera views and the cameras need to be at different viewpoints.
Here is a simple graphical representation of the problem that we just looked at. We have the blue object which is fixed, and we have the red object which is moving, and we want to bring the tip of the red object to the tip of the blue object. And this is clearly a task that occurs in three-dimensional space. Now, consider that we have a camera observing the task and we see here the camera’s view. We see the blue object and the red object. So when the task is accomplished, we see what it looks like in the camera’s eye view, and we see that in this two-dimensional projection of the task occurring that the tip of the red object is brought to the tip of the blue object. So the task is being achieved in the image as well as being achieved in three-dimensional space.

It’s not quite that simple.
Consider now that I perform the task incorrectly. I bring the red object to this position. Now, the task has not been achieved: the tip of the red object has not met the tip of the blue object in three-dimensional space. But in the image plane, the task looks like it has been achieved. We can say that in this horizontal direction, we can observe the motion of the red object. But in this direction away from the camera, we are unable to observe the motion of the object.

So we have an observable degree of freedom and we have an un-observable degree of freedom.
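
The ambiguity can be illustrated with a minimal sketch, assuming an ideal pinhole camera at the origin looking along the z-axis with unit focal length. The function name `project` and the coordinates are illustrative assumptions, not values from the lesson.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z), given in the camera frame, to (u, v)."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

# Tip of the fixed blue object (hypothetical coordinates in the camera frame).
blue_tip = (0.5, 0.25, 2.0)

# Tip of the red object placed on the same ray from the camera centre,
# but twice as far away. In 3D the two tips are far apart.
red_tip = tuple(2.0 * c for c in blue_tip)

print(project(blue_tip))  # image coordinates of the blue tip
print(project(red_tip))   # image coordinates of the red tip
```

Both points land on the same image coordinate, so motion along the ray away from the camera is the un-observable degree of freedom.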

Another way we can think about this problem is that with the camera, we are making a measurement. It’s a measurement (u, v): the coordinate of the red object in the image plane. So we’ve got two measurements. We are measuring two numbers, u and v. Then our task occurs in three-dimensional space: our task has got three degrees of freedom. So two measurements are insufficient to achieve a task with three degrees of freedom.
Now, let’s consider that we perform this with two cameras having a look at the task that’s being accomplished. These cameras have got different viewpoints. So if we perform a task, we can now see how it looks in each camera view. This time, the task is being achieved correctly. Now, we are making four measurements about what’s going on. We are measuring u and v in one camera. We are measuring u and v with another camera. Now, we’ve got four measurements, and that is sufficient to achieve a task with three degrees of freedom.

So in order to achieve a task, the number of measurements must be greater than or equal to the number of degrees of freedom.
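
The counting argument can be sketched in code. This is a minimal illustration, assuming two ideal pinhole cameras with unit focal length and parallel optical axes, where the second camera is offset along the x-axis. The names `project` and `stacked_error`, the camera offsets, and the coordinates are all hypothetical.

```python
def project(point, camera_x=0.0, focal_length=1.0):
    """Project a 3D point into a pinhole camera located at (camera_x, 0, 0)."""
    x, y, z = point
    return (focal_length * (x - camera_x) / z, focal_length * y / z)

def stacked_error(tool, target, camera_xs=(0.0, 1.0)):
    """Stack the (u, v) error between tool and target from every camera."""
    error = []
    for cx in camera_xs:
        u_tool, v_tool = project(tool, cx)
        u_tgt, v_tgt = project(target, cx)
        error += [u_tool - u_tgt, v_tool - v_tgt]
    return error

blue_tip = (0.5, 0.25, 2.0)
wrong_red_tip = (1.0, 0.5, 4.0)  # same ray as blue_tip in the first camera

print(stacked_error(wrong_red_tip, blue_tip, camera_xs=(0.0,)))  # one camera: 2 measurements
print(stacked_error(wrong_red_tip, blue_tip))                    # two cameras: 4 measurements
print(stacked_error(blue_tip, blue_tip))                         # task truly achieved
```

With one camera the error is zero even though the task has failed. With two cameras at different viewpoints, the stacked error has a non-zero entry until the tips really coincide in three-dimensional space.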
A long time ago, we actually built a real robot system to do this. Now, the task was to insert a tool, which is shown here, into a hole in the roof of an underground mining tunnel. There is a camera on each side of the robot’s tool tip and it’s observing the hole in the roof; and you can see that shown here as the green region. And what the control system is doing is to bring the tip of the tool to the centre of the green region simultaneously in both camera views. And if it does that, then it means that implicitly the tool has been put into the hole in the roof of the mining tunnel.
Let’s recap on the difference between vision-based control and conventional robot control.

In conventional robot control, the observation of the pose of the robot’s end effector is indirect. We use sensors that measure the angles of the joints, and we use a kinematic model to compute where the end of the robot is. So it is just inferred.
We also need to be told the position of the object in Cartesian space and then the task of the controller is to bring the inferred position of the robot’s end effector to the position of the object which we know in advance.
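
As a minimal sketch of what "inferred" means here, consider a hypothetical planar arm with two revolute joints. The link lengths, joint angles, and object position below are made-up values for illustration, and `forward_kinematics` is an assumed name rather than anything from the lesson.

```python
import math

def forward_kinematics(theta1, theta2, link1=1.0, link2=1.0):
    """Kinematic model of a two-link planar arm: joint angles -> tool tip (x, y)."""
    x = link1 * math.cos(theta1) + link2 * math.cos(theta1 + theta2)
    y = link1 * math.sin(theta1) + link2 * math.sin(theta1 + theta2)
    return (x, y)

# Joint angles as reported by the joint sensors (hypothetical values).
joint_angles = (math.pi / 2, -math.pi / 2)

# The tool position is never observed; it is computed from the model.
inferred_tip = forward_kinematics(*joint_angles)

# The object position must be supplied in advance in Cartesian coordinates.
known_object = (1.0, 1.0)

cartesian_error = (known_object[0] - inferred_tip[0],
                   known_object[1] - inferred_tip[1])
print(inferred_tip, cartesian_error)
```

Any error in the link lengths, the joint sensors, or the supplied object position goes unnoticed by this kind of controller, because nothing observes the tool and the object directly.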
Vision-based control by contrast is direct observation of the tool, and of the object, and of the error between those two things. In vision-based control, we try and drive that error to zero. We don’t actually know the pose of the object. We don’t actually know the pose of the robot’s tool. We simply observe the error and drive that to zero. This is a technique that’s commonly called visual servoing.
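
The idea of driving an observed error to zero can be sketched as a simple proportional loop in the image plane. This is only an illustration under a strong assumption: that the robot can move the tool so that its image feature shifts by exactly the commanded amount. The names `servo_step` and `servo`, the gain, and the pixel coordinates are hypothetical.

```python
import math

def servo_step(feature, target, gain=0.5):
    """One control step: observe the image-plane error and reduce it by a fraction."""
    error = (feature[0] - target[0], feature[1] - target[1])
    # Assumption: the commanded image-plane motion is achieved exactly.
    new_feature = (feature[0] - gain * error[0], feature[1] - gain * error[1])
    return new_feature, error

def servo(feature, target, gain=0.5, steps=20):
    """Run the loop and record the error magnitude observed at each step."""
    norms = []
    for _ in range(steps):
        feature, error = servo_step(feature, target, gain)
        norms.append(math.hypot(*error))
    return feature, norms

tool_in_image = (0.8, 0.6)       # where the tool tip appears in the image
object_in_image = (0.25, 0.125)  # where the object appears in the image

final_feature, error_norms = servo(tool_in_image, object_in_image)
print(error_norms[0], error_norms[-1])
```

Note that the loop never uses the pose of the tool or of the object, only the error between them as seen in the image. With a gain of 0.5 the observed error halves at every step.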
There are many potential uses of visual servoing. For instance, in a manufacturing operation, we may need to pick up objects that are randomly placed on a conveyor belt. Visual servoing could guide the tip of a robot to pick up these objects as they are moving past. It could also be used to grab objects which are swinging on an overhead transfer line, for example. It could be used for a mating task, perhaps to put a fueling nozzle into a car, a spacecraft, or an aircraft.

Consider a problem like fruit picking. We don’t know the x, y, and z location of every apple in the orchard. Perhaps, we don’t even know the location of the robot within the orchard. But using this visual control strategy, a robot can reach out and reduce the error between its tool tip and an apple to zero. In which case, then it has achieved a grasp of that fruit and can remove it from the tree.

We could consider the problem of landing an aircraft on a runway. We observe the runway. We can see some very obvious visual features there: some nice white lines, some nice white rectangles, and we could use those to control the aircraft so that its height above the runway becomes zero while being aligned with that runway.

It could be used for keeping an underwater robot at a fixed location despite an ocean current trying to push it away from where it wants to be. It could be used to guide an underwater robot following an underwater pipeline. It could be used for tasks like juggling and so on. The applications of vision-based control are almost limitless.

It is common to think about an assembly task being specified in terms of coordinates in the 3D world. An alternative approach, known as visual servoing, is to consider the task in terms of the relative position of objects in one or more views of the task.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes high school level mathematics and requires an understanding of undergraduate-level mathematics; for example, linear algebra (matrices, vectors), complex numbers, vector calculus, and MATLAB programming.
