Summary of Robotic Vision


We have covered a lot of topics. We have talked about how the sense of vision evolved. We have talked about the details of human vision. We have looked at some of the variety of animal vision. We have talked about how vision is a process where the eye is actively steered around the scene in order to maximise our understanding of that particular scene.

So now it is appropriate to talk about robots and vision, and to look at how robots have used the sense of vision to perform tasks, in much the same way that humans and animals use vision to perform tasks.

The robot on the left is a very famous robot called Shakey, developed at SRI International in the 1960s. It has a television camera on the top, which it uses for navigation. The robot on the right is one that I built in the early 2000s, and it has got quite a number of cameras on board, and they are a little bit interesting to look at. In the front is a stereo camera pair and it is a bit like our own two eyes. It allows the robot to determine the three dimensional structure of the world in front of it. On the top, it has a very shiny mirror object and that is part of what we call a panoramic camera assembly. There is a camera looking up at that mirror and that allows the robot to see 360 degrees: it can see forwards, backwards and sideways. It’s sometimes called an omni-cam or a panoramic camera.

The last camera on this robot is a wide-angle camera on the side. It has a fisheye lens, so it has an almost hemispherical field of view. And that can look at what is happening to the side of the robot.

The current Mars Rover, the Curiosity Rover, has got a large number of cameras on board that it uses for a variety of functions.

Now the business of teaching machines to see is the field of computer vision, and that is an activity that has been going on since perhaps the 1960s. Some important early work was carried out by Larry Roberts at MIT as part of his PhD project. Larry went on to do other very important things and was very instrumental in creating the internet. So in his thesis work what he was doing was taking a picture of an object, and this is a very simple wooden block object, with a TV camera and trying to work out what shape it was. So he went through a number of vision processing steps and these are the sorts of things we will cover in following lectures.

One of the first steps was to find the edges of this object. Once he had found the edges of the object he would try to fit line segments to those edges, and once he knew those line segments, and because he knew something about the way images are formed, he could then say something about the three dimensional shape of this particular object. So what we have here is a system that can take an image and process it and come up with a three dimensional model of that shape. So it is a very simple form of object recognition. The computer took an image and made a decision about what sort of object it was looking at. That’s sufficient information then for a robot to make a move, go and pick that object up and perhaps manipulate it in some way.
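To make the edge-finding step a little more concrete, here is a minimal sketch in Python. It uses the Roberts cross, a simple 2x2 gradient operator that is commonly attributed to Roberts' thesis work. This is an illustration of the general idea only, not a reconstruction of the original program; the test image, the threshold and the function names are all assumptions made for the example.

```python
# Sketch: find edge pixels with the Roberts cross operator.
# The image is a plain list of rows of grey levels (hypothetical data).


def roberts_magnitude(image):
    """Return the gradient magnitude for each pixel that has a lower-right neighbour."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            # Differences along the two diagonals of the 2x2 neighbourhood
            gx = image[r][c] - image[r + 1][c + 1]
            gy = image[r][c + 1] - image[r + 1][c]
            magnitude[r][c] = (gx * gx + gy * gy) ** 0.5
    return magnitude


def edge_pixels(image, threshold):
    """Return (row, column) pairs whose gradient magnitude exceeds the threshold."""
    magnitude = roberts_magnitude(image)
    return [(r, c)
            for r, row in enumerate(magnitude)
            for c, m in enumerate(row)
            if m > threshold]


# Hypothetical test image: dark on the left (0), bright on the right (100)
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = edge_pixels(image, threshold=50)
```

The edge pixels line up in a single column where dark meets bright, which is the kind of data a later step could fit a line segment to.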

So why is vision a good sensor for a robot to have? There are a few reasons why I believe vision is an important and very practical sensor for a robot.

Firstly, the cameras themselves are now very cheap, and the reason for this is that cameras are built into everything: built into cell phones, built into laptops, and so on. So the actual sensor, the equivalent of a retina, is now a device that perhaps costs less than a dollar. Lenses are smaller, cameras are small and cheap.

The other reason that is really important is that computation is now really cheap; we have very powerful computer chips and lots and lots of memory, and so this enables us to run algorithms to process the data that comes out of the sensor chip.

So this combination of very effective, high resolution, colour, cheap sensors with abundant computation is the foundation on which we can build robot vision systems.

Here is a really interesting graph from Ray Kurzweil, who talks a lot about the way computational power has changed over time. This is a logarithmic vertical scale, and he is plotting a number of data points that represent the number of calculations per second you can buy for a thousand dollars over time. And we can see that this is an exponential plot on a logarithmic vertical scale. So what it is showing is that computation is increasing, and increasing ever more rapidly, with time. And this is fundamentally Ray Kurzweil's thesis.

So if we extrapolate this into the future, we can see that we are about here, and we have computational power of effectively one mouse brain.

By the early 2020s we should have, for a thousand dollars, the computational power of a human brain. And by 2050, when many of you will still be alive, perhaps at the ends of your working careers, for a thousand dollars you will be able to buy computing power equivalent to all the human brains on the planet. That is a pretty amazing prediction and so clearly very, very exciting times are ahead.

So what practical things have roboticists been doing with the sense of vision?

We hear a lot in recent times about self-driving cars, Google cars and so on, but the actual history of self-driving cars goes back a long way. There was a very significant research program in Europe in the 1980s called Prometheus, and a lot of very fundamental work was done by a scientist called Ernst Dickmanns. He automated this van. It had a number of cameras in the front looking outwards, and it was able to drive along autobahns at high speed. A number of significant landmark achievements were made by this particular van and some of its immediate descendants.

Another landmark achievement came from some researchers at Carnegie Mellon University, who automated a car primarily using the sense of vision, and drove it across America. The journey took a few days, around four thousand kilometres, and there were relatively few human interventions.

This video shows a humanoid robot catching a ball. Here we see it again in slow motion. Now the robot has a pair of cameras in its head, which allows it to estimate the distance to the ball. It uses that information to model how the ball moves through space, in order to plan the motion that the arm should take to intercept the ball. So the robot’s head has a number of sensors: a pair of cameras, as I mentioned, but also some tilt sensors, so it can work out where the head is pointing in space. Now you can see the positions of the ball as seen by its left and right eye as a function of time, and here you can see an animation of the robot’s hand moving to intercept the path of the ball.
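As a rough illustration of what "modelling how the ball moves" could mean, the sketch below assumes the simplest possible model: the ball is in free flight under gravity with no air drag, and two 3D positions of the ball have already been measured a short time apart. The talk does not say which model this robot actually uses, so the function names, the coordinate frame (z pointing up) and the numbers are all assumptions.

```python
# Sketch: predict where a thrown ball will be, assuming drag-free ballistic flight.
GRAVITY = (0.0, 0.0, -9.81)  # m/s^2, assuming the z axis points up


def estimate_velocity(p0, p1, dt):
    """Velocity at the time of p0, from two positions measured dt seconds apart.

    Rearranges p1 = p0 + v0*dt + 0.5*g*dt^2 for v0, one axis at a time.
    """
    return tuple((b - a) / dt - 0.5 * g * dt
                 for a, b, g in zip(p0, p1, GRAVITY))


def predict_position(p0, v0, t):
    """Position t seconds after the ball was at p0 with velocity v0."""
    return tuple(p + v * t + 0.5 * g * t * t
                 for p, v, g in zip(p0, v0, GRAVITY))


# Hypothetical measurements, in metres, taken 0.1 s apart
p0 = (0.0, 0.0, 1.0)
p1 = (0.2, 0.0, 1.25095)
v0 = estimate_velocity(p0, p1, dt=0.1)

# Where the hand would need to be 0.5 s after the first measurement
target = predict_position(p0, v0, t=0.5)
```

A real system would use many noisy measurements rather than two exact ones, and would keep refining the estimate as the ball approaches.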

Here is a flying robot which we actually looked at up close in the Out and About with Robots video, in the Robotics course that some of you might have seen earlier. This robot is equipped with a stereo pair of cameras again. So again, like our own two eyes, this enables the robot to sense an obstacle in front of it, by working out the three dimensional structure of the world in front using information from two cameras and a fair amount of processing on board.
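The geometry behind a stereo pair can be sketched in a few lines. For two identical, parallel cameras, a point that appears shifted by d pixels between the left and right images (the disparity) lies at a distance Z = f × B / d, where f is the focal length in pixels and B is the distance between the cameras. The numbers below are made up for illustration, and finding which pixels correspond in the two images, which is the hard part, is assumed to be done already.

```python
# Sketch: distance from stereo disparity for an idealised, rectified camera pair.


def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Return the distance in metres to a point seen with the given disparity."""
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 pixel focal length, cameras 12 cm apart
near = depth_from_disparity(700, 0.12, disparity_px=35)  # large shift, close object
far = depth_from_disparity(700, 0.12, disparity_px=7)    # small shift, distant object
```

Nearby objects produce a large disparity and distant objects a small one, which is why stereo range estimates become less precise as distance grows.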

Here is another robot, an underwater robot developed by myself and some colleagues at CSIRO, and it has also got stereo cameras. It has got two cameras that look downwards and two cameras that look forwards. The downward looking cameras are estimating the distance of the seabed from the robot, and this robot is trying to maintain a constant altitude above the seabed. And it does that using the three dimensional information from the downward looking stereo cameras. The forward looking stereo cameras are used to detect obstacles.

Here we see what the world looks like through the eyes of a mobile robot. This particular robot has got a stereo camera pair which allows it to create a three dimensional model of the world through which it is moving. And that is really useful in order to determine what is a flat surface that it could drive over, and what is a wall, or human being, or some other kind of obstacle.
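One very simple way to turn such a three dimensional model into a drive or don't-drive decision is to look at the height of each point. The sketch below assumes the 3D points are already expressed in a frame where the floor is the plane z = 0, and it uses a made-up tolerance. It illustrates the idea only and is not the method used on the robot in the video.

```python
# Sketch: split 3D points into drivable ground and obstacles by height.


def classify_points(points, max_drivable_height=0.05):
    """Return (ground, obstacles) for (x, y, z) points in metres.

    Assumes the floor is the plane z = 0; points within the tolerance
    of that plane are treated as drivable ground.
    """
    ground, obstacles = [], []
    for x, y, z in points:
        if abs(z) <= max_drivable_height:
            ground.append((x, y, z))
        else:
            obstacles.append((x, y, z))
    return ground, obstacles


# Hypothetical points: two on the floor, one on a wall, one on a low box
points = [(1.0, 0.0, 0.01), (1.5, 0.2, -0.02), (2.0, 0.0, 1.2), (1.2, -0.3, 0.3)]
ground, obstacles = classify_points(points)
```

A practical system would also have to estimate where the floor actually is, and to cope with slopes and noisy range estimates.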

Here we have something a bit different. Here we have a single camera looking downwards at a coral reef, and the camera is being carried by a robot. Now we are able to use a number of mathematical techniques to combine the information from these multiple camera views to create a three dimensional model of the coral reef. We do this from a number of single camera views. Now we smooth that three dimensional mesh, and we drape the original imagery over it to create a texture-mapped surface. So now we have a very realistic looking three dimensional model of a coral reef obtained just from a whole sequence of single camera views.

So I hope that I have convinced you that vision is a really, really important sensor for all sorts of animals, for ourselves and also for robots. So in the rest of the course we are going to learn something about how we do robot vision; how do we take information from camera sensors and process it and generate information that a robot can take some action on.



Vision is useful to us and to almost all forms of life on the planet, so perhaps robots could do more if they could also see. Robots could mimic human stereo vision or use cameras with superhuman capability such as wide angle or panoramic views.

Professor Peter Corke

Professor of Robotic Vision at QUT and Director of the Australian Centre for Robotic Vision (ACRV). Peter is also a Fellow of the IEEE, a senior Fellow of the Higher Education Academy, and on the editorial board of several robotics research journals.

Skill level

This content assumes only general knowledge.


  1. Brent says:

    I thought this would have been a deep-dive into how robots see. That’s what I was hoping for, anyway. I do love your videos though.

    1. Peter Corke says:

      Thanks for the comment. This is a third year university course and is laying the foundations for how robots see.
