If Tesla’s Dream Of Making Cameras Perform As Well As LIDAR Comes True, It May Help Tesla’s Competitors More
Elon Musk's view on LIDAR (the 3-D sensing technology) for self-driving is well known. He does not plan to use it at Tesla, and has called it a crutch.
One approach Tesla has promoted is what is sometimes called pseudo-LIDAR or virtual LIDAR. This involves building tools that take camera images (either stereo or monocular) and figure out how far away each pixel in the image is. A LIDAR measures the distance to every point by timing how long a light pulse takes to reach it and bounce back at the speed of light. Human beings, on the other hand, estimate distance with our brains. We know how big things are and how they move, and that gives us a good estimate of how far away they are. We also use a few other tricks, like the stereo vision that comes from having two eyes, though that only works out to a moderate distance. Another good trick is “motion parallax,” where you track how things move against the background and against other objects, which provides additional clues.
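The stereo cue mentioned above reduces to a simple triangulation formula, which also shows why it only works to moderate distances. This is a minimal sketch, not any company's actual pipeline; the focal length and camera baseline here are made-up illustrative values.

```python
# Depth from stereo disparity: a minimal sketch with illustrative numbers.
# focal_px and baseline_m are assumed values, not from any real camera rig.

def depth_from_disparity(disparity_px, focal_px=1000.0, baseline_m=0.12):
    """Triangulate depth (meters) from the pixel disparity between two cameras."""
    if disparity_px <= 0:
        return float("inf")  # no measurable disparity: too far away (or a matching error)
    return focal_px * baseline_m / disparity_px

print(depth_from_disparity(10))  # 12.0 m
print(depth_from_disparity(1))   # 120.0 m
```

With this geometry, a 10-pixel disparity means 12 m, but a single pixel of disparity already means 120 m; since disparity measurement is noisy at the pixel level, depth error blows up with range, which is why two-eyed stereo only helps nearby.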
That’s all great, and human brains are up to the task; in fact, you can do it pretty well with one eye closed while driving. People are trying to build machine learning techniques, using neural networks, that also figure out distance from an image. This is a virtual LIDAR. One example research result is here.
Training a virtual LIDAR is much easier than training many other neural networks. Normally, training requires lots of images where humans have painstakingly labeled the true distances. But since a test car can carry an actual, expensive LIDAR, you can drive around collecting camera images together with the “ground truth” distances measured by the LIDAR. You show the neural network tons of images paired with the real distances, and it gets good at estimating distance on its own. This technique, a variant of “unsupervised learning” because no human taggers are needed, is vastly cheaper than supervised learning, so if there is anything neural networks should do well at, it should be this. You can also train on simulator data to improve your models.
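The training loop described above can be sketched in a few lines. This is a toy illustration under loud assumptions: a real system uses a deep network on raw pixels, while here a single linear layer stands in for the network and random vectors stand in for image features; the LIDAR supplies the target distances exactly as the paragraph describes.

```python
import numpy as np

# Toy sketch of LIDAR-supervised depth training. All names and shapes are
# illustrative; a linear model stands in for the depth-estimating network.

rng = np.random.default_rng(0)
features = rng.normal(size=(256, 8))   # stand-in for per-pixel image features
true_w = rng.normal(size=8)
lidar_depth = features @ true_w        # "ground truth" distances from the LIDAR

w = np.zeros(8)                        # the "network" weights being trained
for _ in range(500):
    pred = features @ w                                       # depth estimate
    grad = features.T @ (pred - lidar_depth) / len(features)  # L2-loss gradient
    w -= 0.1 * grad                                           # step toward LIDAR truth

print(np.abs(features @ w - lidar_depth).max())  # residual error shrinks toward zero
```

The point of the sketch is the data flow, not the model: no human ever labels a distance; the LIDAR on the test car generates the supervision signal for free as the car drives.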
Another useful training technique relies on the fact that real-world objects change distance in predictable ways. When your estimates show an object moving along a path allowed by physics, they were very likely correct. If they show the object jumping around in space in impossible ways, you know they were wrong.
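The physics-consistency idea can be made concrete as a simple check: if consecutive depth estimates imply an object moved faster than anything plausibly can, flag them as wrong. The frame rate and speed cap below are illustrative assumptions, not values from the article.

```python
# Sketch of the physics-consistency check: depth estimates that imply an
# impossible closing speed between frames get flagged as errors.
# dt_s (~30 fps) and max_speed_ms are assumed illustrative thresholds.

def implausible_jumps(depths_m, dt_s=0.033, max_speed_ms=70.0):
    """Return frame indices where the implied speed exceeds a physical cap."""
    bad = []
    for i in range(1, len(depths_m)):
        implied_speed = abs(depths_m[i] - depths_m[i - 1]) / dt_s
        if implied_speed > max_speed_ms:
            bad.append(i)
    return bad

# A car smoothly approaching at ~15 m/s passes; a sudden 10 m "jump" is flagged.
track = [50.0, 49.5, 49.0, 39.0, 48.0]
print(implausible_jumps(track))  # [3, 4]
```

In training, frames flagged this way would be treated as estimation errors, giving the network a correction signal without any labels at all.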
These efforts are doing OK. One problem with neural nets is that they tend to look at single frames, not a moving image the way humans do. Humans actually make a lot of mistakes on still images. In time, machine learning techniques may get past this. The problem is that we must get “bet your life” reliability out of it. You also need it to work on things you have never seen before, something that can challenge neural networks. An example is something unusual stopped in your lane on the road ahead. You need to know how far away it is, and you need to find out super reliably and soon. If it’s a car, you know how big cars are, so you know how far away it is. The same goes for a car spun sideways, at least for a human, but a training database may never have seen that. For a random object, you wonder: is that a big object far away, or a small object close by? The only way to tell is by seeing its relationship with the geometry of the road. It’s more complex.
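The “known size” cue for cars, and the ambiguity for unknown objects, can both be seen in one pinhole-camera formula. The focal length and the 1.8 m car width below are illustrative assumptions.

```python
# The "known size" range cue as a formula: if you know an object's real width,
# its apparent width in pixels gives its distance. Numbers are illustrative;
# 1.8 m is an assumed typical car width, focal_px an assumed camera parameter.

def distance_from_known_size(real_width_m, width_px, focal_px=1000.0):
    """Pinhole-model distance to an object of known real width."""
    return focal_px * real_width_m / width_px

print(distance_from_known_size(1.8, 60))  # a car 60 px wide is 30.0 m away
print(distance_from_known_size(0.9, 60))  # a half-size object at 15.0 m looks identical
```

The second line is the ambiguity the paragraph describes: a 0.9 m object at 15 m projects to the same 60 pixels as a 1.8 m car at 30 m. Without a size prior, scale and distance are confounded, and only context like the road geometry can break the tie.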
If somebody pulls this off, they will have a tool that can take camera images and produce the 3-D “point cloud” that a LIDAR produces, and, since cameras are cheaper, it will do so at much lower cost. They may also be able to do it at very long range. Many LIDARs only see out to about 120m; fancy ones see 240m. Humans are known to understand what they see a mile down the road.
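Turning a per-pixel depth map into a LIDAR-style point cloud is straightforward back-projection through the pinhole model. This is a generic sketch with assumed camera intrinsics, not any particular company's format.

```python
import numpy as np

# Back-projecting a dense depth map into a LIDAR-style XYZ point cloud.
# fx, fy are assumed focal lengths; real systems use calibrated intrinsics.

def depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Convert an HxW depth map (meters) to an (H*W, 3) array of XYZ points."""
    h, w = depth.shape
    cx = (w - 1) / 2 if cx is None else cx   # default principal point: image center
    cy = (h - 1) / 2 if cy is None else cy
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = (us - cx) * depth / fx               # lateral offset grows with depth
    y = (vs - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

cloud = depth_to_point_cloud(np.full((4, 6), 10.0))  # a flat wall 10 m away
print(cloud.shape)  # (24, 3)
```

This is why the breakthrough would slot so cleanly into existing LIDAR pipelines: once the depth map exists, producing the familiar point-cloud format is a few lines of geometry.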
Here’s the irony. The developers who have committed to LIDAR have built their systems to depend on these point clouds, and have spent a lot of time refining that. If a pseudo-LIDAR system suddenly becomes available that produces quality point clouds, they can make use of it immediately. Those who have been hoping for pseudo-LIDAR will not have the same experience using data in that form. Instead, they will have planned to combine the other elements of their vision system (segmentation of the image into different objects, and classification of what they are) with distance estimation. They might be less equipped to use the breakthrough they have been hoping for.
The LIDAR-using companies, on the other hand, will just say, “Great, we can replace the expensive LIDAR with something cheaper.” If they are also LIDAR-making companies (like Ford, Cruise, Waymo and Aurora), they may feel they have wasted some money.
What is clear is that you need to learn the distance to everything on the road, you have to figure it out correctly, and you have to do it fast. We’ve already seen Tesla Autopilot crash several times into trucks, a crash barrier, and stalled vehicles in the lane ahead that were hidden by a car that suddenly pulled out of the way. When an obstacle on the road is revealed to your sensors suddenly, by surprise, you need to know how far away it is with high reliability, so you can initiate emergency braking. LIDAR pretty much always does this; computer vision does not. Pseudo-LIDAR is an effort to fix that problem, but for now most other companies plan to solve it with LIDAR, which they know works, and which they expect to become cheap.
Of course, if Tesla is the one to solve this problem internally, it won’t share it with others (though the demonstration may trigger other companies to do the same thing). A perception team may also try to develop a tool that matches distance estimates directly with classifications, rather than producing a LIDAR-style point cloud. This would not be pseudo-LIDAR, but it would be equally useful if it were universally accurate.