Vision-based Robot Control
Modern robots are very precise machines. They can be used, for example, to place chips onto circuit boards with great precision and at very high speed. But how do they actually do that?
The position of the robot’s end effector is never directly measured. What we do instead is measure the various joint angles and we use a forward kinematic model to determine the position of the robot’s end effector.
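To make this concrete, here is a minimal sketch (my illustration, not from the lecture) of forward kinematics for a hypothetical two-link planar arm: given the measured joint angles and the link lengths from the kinematic model, we compute the Cartesian position of the end effector.

```python
import math

def forward_kinematics(q1, q2, a1=1.0, a2=1.0):
    """Tool-tip (x, y) of a planar two-link arm.

    q1, q2 are joint angles in radians; a1, a2 are link lengths
    (illustrative values, not from any real robot).
    """
    x = a1 * math.cos(q1) + a2 * math.cos(q1 + q2)
    y = a1 * math.sin(q1) + a2 * math.sin(q1 + q2)
    return x, y

# With both joints at zero the arm is stretched along the x-axis.
print(forward_kinematics(0.0, 0.0))  # (2.0, 0.0)
```

Note that the tool-tip position is only as good as the joint measurements and link lengths fed into this model, which is exactly the weakness discussed next.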
We assume that we know the position, in Cartesian coordinates, of where the robot’s tool tip needs to go, and then it’s a simple matter of designing a controller that takes the tool tip from where it is to where it needs to be. This is the fundamental principle underpinning all modern robots today. They are incredibly precise: they have very accurate sensors, they have very good kinematic models, and they are able to move the tool tip to where it needs to go in the workspace.
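That “simple matter of designing a controller” can be sketched as resolved-rate motion control (my illustrative example, with made-up link lengths, not a description of any particular industrial controller): at each step, compute the Cartesian error of the tool tip and map it through the inverse of the arm’s Jacobian to get joint increments.

```python
import math

def fk(q, a=(1.0, 1.0)):
    # Forward kinematics of an illustrative two-link planar arm.
    x = a[0] * math.cos(q[0]) + a[1] * math.cos(q[0] + q[1])
    y = a[0] * math.sin(q[0]) + a[1] * math.sin(q[0] + q[1])
    return (x, y)

def jacobian(q, a=(1.0, 1.0)):
    # Analytic 2x2 Jacobian of the same arm.
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-a[0] * s1 - a[1] * s12, -a[1] * s12],
            [ a[0] * c1 + a[1] * c12,  a[1] * c12]]

def solve_2x2(J, e):
    # Invert the 2x2 Jacobian and apply it to the Cartesian error.
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return ((J[1][1] * e[0] - J[0][1] * e[1]) / det,
            (-J[1][0] * e[0] + J[0][0] * e[1]) / det)

def move_to(goal, q=(0.5, 0.5), gain=0.5, steps=200):
    # Proportional Cartesian control: joint step = gain * J^-1 * error.
    for _ in range(steps):
        x, y = fk(q)
        e = (goal[0] - x, goal[1] - y)
        dq = solve_2x2(jacobian(q), e)
        q = (q[0] + gain * dq[0], q[1] + gain * dq[1])
    return q

q = move_to((1.0, 1.0))
print(fk(q))  # converges close to the goal (1.0, 1.0)
```

The key point is that the controller’s notion of “where the tool tip is” comes entirely from `fk(q)`, so any model or sensor error translates directly into tool-tip error.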
Let’s consider now what can go wrong with this approach to robot tool positioning.
Firstly, because the position of the tool is computed from joint sensors and kinematic models, if there are errors in any of the sensors, an error in the kinematic model, or perhaps an error in where we think the base of the robot is, then the position of the robot tool tip will be in error.
Perhaps there is an error in the location of the chip relative to the board, or perhaps there is an error in the position of the board with respect to the robot’s coordinate frame.
Finally, there could be an error in the motion control system: it is not able to drive the robot tool tip to where it needs to go.
Nevertheless, over the long history of manufacturing robots these problems have largely been ironed out.
There are consequences to this traditional approach to robot positioning. For a start, we need very accurate sensors on each of the joints. The robot links need to be very accurately machined so that they reflect the kinematic model embedded in the software.
The robot links also need to be very stiff so that as the robot changes its configuration the links don’t bend and deform under the influence of gravity. To make them stiff we have to use a lot of metal, which makes them heavy. In order for the robot then to move quickly we need to have much more powerful motors.
All of these issues drive up the cost of the modern precise industrial robot.
So let’s assume we have a robot and that it is really precise. How do we actually put it to use?
For the robot to do useful things the object that it’s working with — its work piece — must also be precisely positioned. If the robot is able to go to position x,y,z within a small fraction of a millimetre, then the object that it’s working with also needs to be positioned with that same level of accuracy. When a robot is installed in a factory a large amount of engineering is involved in creating what we call jigs and fixtures. These are pieces of equipment that hold the robot’s work piece at a very precise location in the robot’s workspace. So when the robot uses its precision to reach to a particular coordinate the thing that it needs to work with is there.
The next issue is that the robot needs to know the position of the object within its workspace, and we need to measure that very accurately. In practice, the work piece is held at a very consistent location in the workspace and we manually adjust the robot’s tool tip position to achieve the task. In this particular picture the operator is doing this. He is holding what’s called a teach pendant: a large box covered in buttons, and by pushing these buttons he is able to drive the robot’s tool tip to the required location within the workspace.
An alternative to this approach, which relies totally on the inherent precision of the robot, is to have the robot use its sensors — for instance cameras, which mimic the function of our eyes — to guide it in performing the task. And that’s exactly what we do: if I want to pick up this stapler I use my eyes to guide my hand so that the distance between my fingertips and the stapler becomes zero. Similarly, if I want to perform a task like putting a key into a lock, I again use my eyes to guide my hand so that the tip of the key enters the barrel of the lock.
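The guiding principle here, moving so the visually measured error goes to zero, can be sketched as a simple proportional visual-servo loop. This is an illustrative toy simulation of mine, not the lecture’s actual controller: the loop never knows where the target is in Cartesian space, it only acts on the observed pixel error.

```python
def visual_servo(target_px, camera_px=(0.0, 0.0), gain=0.3, steps=50):
    """Drive the camera pointing so the target appears at the image centre.

    Illustrative model: the observed image error is simply the target's
    pixel position minus the camera's current pointing, in pixels.
    """
    for _ in range(steps):
        err = (target_px[0] - camera_px[0], target_px[1] - camera_px[1])
        # Proportional control: step by a fraction of the observed error.
        camera_px = (camera_px[0] + gain * err[0],
                     camera_px[1] + gain * err[1])
    return camera_px

print(visual_servo((120.0, -40.0)))  # approaches (120.0, -40.0)
```

The same closed-loop structure underlies the examples that follow: measure an error in the image, move to reduce it, repeat.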
Here’s an example of this in practice. This is some work from my own PhD research way back in 1993. Here the robot arm is programmed to move upwards and downwards at constant velocity, but the horizontal motion of the robot is controlled by the shape of the line that it is observing. So in this sense the robot is reacting to what is in the world. So I can put any shaped line in front of the robot, and the robot arm would move and trace a path along that line.
In this more complex example, the robot is carrying a camera, and it’s trying to keep an object in the centre of its field of view. Now the object is on a turntable so it’s moving in quite a complex way. In fact it’s not moving at a constant velocity, it is accelerating, and here we see the robot capturing the object, locking onto it, and holding it in the centre of its field of view.
The same technique was then applied to tracking a ping-pong ball, thrown in from one side of the scene; this is shown in slow motion. Eventually the robot notices the ping-pong ball, locks onto it, and starts to move the camera so as to keep the ball closer to the centre of the scene. So it’s tracking the ping-pong ball as it moves through space. And here it is in real time: the ping-pong ball moves very quickly.
Here is some more old work from when I was looking at controlling a mobile robot using the sense of vision. This particular robot has an omni-directional camera on top, and in the omni-directional camera view we can find the edges of the roadway. There is a very distinct boundary between the bright concrete road and the green grass and brown dirt on either side of the road. What we want, then, is to keep the robot moving midway between the two boundaries of the road that we are able to observe visually.
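The steering rule can be sketched very simply (illustrative code with invented numbers, not the original system): observe the distance to each road edge, and steer in proportion to their difference, so the robot settles onto the centreline.

```python
def drive_centreline(lateral=1.0, road_width=4.0, gain=0.4, steps=100):
    """Simulate centreline-seeking steering.

    lateral: the robot's offset from the road centre in metres
    (positive = toward the left edge); values are made up.
    """
    for _ in range(steps):
        dist_left = road_width / 2 - lateral   # observed gap to left edge
        dist_right = road_width / 2 + lateral  # observed gap to right edge
        # Steer toward whichever edge is farther away.
        lateral += gain * (dist_left - dist_right) / 2
    return lateral

print(drive_centreline())  # lateral offset shrinks towards 0
```

No road map or global coordinate frame is involved; the two visually observed edge distances are the only inputs.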
Here is another example involving a mobile robot. In this case the robot is using its visual information — again from an omni-directional camera — to position itself with a fixed pose offset with respect to the traffic cone. So if I picked the traffic cone up and moved it to another place, the robot would move to maintain that fixed pose with respect to the traffic cone. So the robot is not navigating with respect to a coordinate frame attached to this location; it is navigating with respect to an object or a feature in its environment.
This is a vehicle, a forklift truck, and it’s got a hook on the front, and it’s trying to put that hook through the handle of the crucible. There is a camera on the forklift truck and it is looking down and using that to identify where the handle is, and controlling the position — the path of the vehicle — as it approaches the crucible. In this particular case we had to add some markers to the crucible to help the robot vehicle achieve the reliable pickup.
What you see now is a time-lapse movie. We ran this experiment over a whole day. Over the course of the day the lighting conditions changed: clouds came and went; shadows came and went. It showed the overall robustness and reliability of this vision-based strategy. The robot didn’t have any knowledge of where the crucible was; it just moved to reduce the area between the tip of the hook and the handle.
Here’s a very similar technique, in this case applied underwater. So this underwater robot has got a stereo camera pair and it’s maintaining a constant height above the seabed.
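Stereo vision gives range from disparity: with focal length f (in pixels) and camera baseline b (in metres), the distance to the seabed is roughly Z = f·b/d for a disparity of d pixels. A minimal sketch (hypothetical camera parameters, not those of the actual vehicle) of regulating that altitude proportionally:

```python
def altitude_from_disparity(d_px, f_px=800.0, baseline_m=0.1):
    # Stereo range equation Z = f * b / d (pinhole model, rectified pair).
    return f_px * baseline_m / d_px

def hold_altitude(z=3.0, target=2.0, gain=0.5, steps=60,
                  f_px=800.0, baseline_m=0.1):
    """Simulate a vehicle holding a target altitude above the seabed.

    The vehicle only 'sees' disparity; it climbs or descends based on
    the altitude estimated from that disparity.
    """
    for _ in range(steps):
        d = f_px * baseline_m / z            # simulated measured disparity
        z_est = altitude_from_disparity(d, f_px, baseline_m)
        z += gain * (target - z_est)         # proportional depth command
    return z

print(hold_altitude())  # settles near the 2.0 m target altitude
```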
Building a highly accurate robot is not trivial, yet we can perform fine positioning tasks like threading a needle using hand-eye coordination. For a robot, we call this visual servoing.
This content requires an understanding of undergraduate-level mathematics: for example, linear algebra (matrices and vectors), complex numbers, vector calculus, and MATLAB programming.