Cars using artificial neural networks have no memory of the past and are in a constant state of seeing the world for the first time – no matter how many times they’ve driven down a particular road before.
The researchers have produced three concurrent papers with the goal of overcoming this limitation. Two are being presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2022), being held June 19-24 in New Orleans.
“The fundamental question is, can we learn from repeated traversals?” said senior author Kilian Weinberger, professor of computer science. “For example, a car may mistake a weirdly shaped tree for a pedestrian the first time its laser scanner perceives it from a distance, but once it is close enough, the object category will become clear. So, the second time you drive past the very same tree, even in fog or snow, you would hope that the car has now learned to recognize it correctly.”
Spearheaded by doctoral student Carlos Diaz-Ruiz, the group compiled a dataset by driving a car equipped with LiDAR (Light Detection and Ranging) sensors repeatedly along a 15-kilometer loop in and around Ithaca, 40 times over an 18-month period. The traversals capture varying environments (highway, urban, campus), weather conditions (sunny, rainy, snowy) and times of day. The resulting dataset has more than 600,000 scenes.
“It deliberately exposes one of the key challenges in self-driving cars: poor weather conditions,” said Diaz-Ruiz. “If the street is covered by snow, humans can rely on memories, but without memories a neural network is heavily disadvantaged.”
HINDSIGHT is an approach that uses neural networks to compute descriptors of objects as the car passes them. It then compresses these descriptions, which the group has dubbed SQuaSH (Spatial-Quantized Sparse History) features, and stores them on a virtual map, like a “memory” stored in a human brain.
The next time the self-driving car traverses the same location, it can query the local SQuaSH database for every LiDAR point along the route and “remember” what it learned last time. The database is continuously updated and shared across vehicles, thus enriching the information available to perform recognition.
“This information can be added as features to any LiDAR-based 3D object detector,” said doctoral student Yurong You. “Both the detector and the SQuaSH representation can be trained jointly without any additional supervision, or human annotation, which is time- and labor-intensive.”
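To make the idea of a queryable spatial memory concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the class name `SparseSpatialMemory`, the voxel size, and the choice of averaging features per grid cell are all assumptions made for illustration, and the hand-written feature vectors stand in for descriptors that HINDSIGHT would compute with a neural network. The sketch only shows the general pattern of quantizing 3D positions into grid cells, storing features sparsely, and looking them up again for each LiDAR point on a later drive.

```python
import math
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, ...]


class SparseSpatialMemory:
    """Hypothetical toy store: one averaged feature vector per occupied grid cell."""

    def __init__(self, voxel_size: float = 0.5, dim: int = 2) -> None:
        self.voxel_size = voxel_size  # edge length of a grid cell in meters (assumed)
        self.dim = dim                # length of each stored feature vector
        self._sums: Dict[Voxel, List[float]] = {}
        self._counts: Dict[Voxel, int] = {}

    def _key(self, point: Sequence[float]) -> Voxel:
        # Spatial quantization: map a continuous (x, y, z) position to a cell index.
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def write(self, point: Sequence[float], feature: Sequence[float]) -> None:
        # Only cells that were actually observed get an entry, so the store stays sparse.
        key = self._key(point)
        sums = self._sums.setdefault(key, [0.0] * self.dim)
        for i, value in enumerate(feature):
            sums[i] += value
        self._counts[key] = self._counts.get(key, 0) + 1

    def query(self, point: Sequence[float]) -> List[float]:
        # Return the averaged feature for the point's cell, or zeros if never seen.
        key = self._key(point)
        count = self._counts.get(key)
        if count is None:
            return [0.0] * self.dim
        return [s / count for s in self._sums[key]]


def augment(point: Sequence[float], memory: SparseSpatialMemory) -> List[float]:
    # Append remembered features to the raw coordinates before detection.
    return list(point) + memory.query(point)


# First drive: two nearby observations land in the same grid cell.
memory = SparseSpatialMemory(voxel_size=0.5, dim=2)
memory.write((10.1, 4.2, 0.3), [1.0, 0.0])
memory.write((10.3, 4.4, 0.1), [0.0, 1.0])

# Second drive: a point in that cell retrieves the averaged memory.
remembered = augment((10.2, 4.3, 0.2), memory)
unseen = augment((50.0, 50.0, 0.0), memory)
```

In this toy version, a point that falls in a previously observed cell comes back with the stored features appended, while a point in an unvisited cell gets zeros, so a downstream detector could tell "remembered" locations from new ones.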
HINDSIGHT is a precursor to additional research the team is conducting, MODEST (Mobile Object Detection with Ephemerality and Self-Training), that would go even further, allowing the car to learn the entire perception pipeline from scratch.
While HINDSIGHT still assumes that the artificial neural network is already trained to detect objects and augments it with the capability to create memories, MODEST assumes the artificial neural network in the vehicle has never been exposed to any objects or streets at all. Through multiple traversals of the same route, it can learn which parts of the environment are stationary and which are moving objects. Slowly it teaches itself what constitutes other traffic participants and what is safe to ignore.
The algorithm can then detect these objects reliably – even on roads that were not part of the initial repeated traversals.
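The intuition behind separating stationary from moving parts of a scene can be illustrated with a short Python sketch. This is a simplified stand-in, not MODEST itself: the article does not describe how the system scores points, so the grid-cell occupancy test, the function names, the 0.5-meter cell size and the 0.5 threshold below are all assumptions. The sketch simply checks how often the cell containing a point was also occupied on earlier drives. Cells that are occupied on nearly every drive probably hold buildings or trees, while cells that are occupied only now are candidates for mobile objects.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, ...]


def to_voxel(point: Sequence[float], voxel_size: float) -> Voxel:
    # Quantize a continuous position to a grid-cell index.
    return tuple(math.floor(c / voxel_size) for c in point)


def voxelize(points: Sequence[Point], voxel_size: float) -> Set[Voxel]:
    # Set of grid cells that contain at least one point in a single traversal.
    return {to_voxel(p, voxel_size) for p in points}


def persistence(point: Point, past: List[Set[Voxel]], voxel_size: float) -> float:
    # Fraction of earlier traversals in which this point's cell was occupied.
    if not past:
        return 0.0  # no history yet, so nothing is known to be persistent
    key = to_voxel(point, voxel_size)
    hits = sum(1 for occupied in past if key in occupied)
    return hits / len(past)


def ephemeral_points(
    current: Sequence[Point],
    past_traversals: Sequence[Sequence[Point]],
    voxel_size: float = 0.5,   # assumed cell size in meters
    threshold: float = 0.5,    # assumed cutoff for "rarely seen before"
) -> List[Point]:
    # Keep the points whose cells were seldom occupied on previous drives.
    past = [voxelize(t, voxel_size) for t in past_traversals]
    return [p for p in current if persistence(p, past, voxel_size) < threshold]


# Three earlier drives all see the same wall; one also sees a parked car.
past_drives = [
    [(5.1, 0.2, 1.0)],
    [(5.3, 0.3, 1.2), (2.0, 3.0, 0.5)],
    [(5.0, 0.1, 1.1)],
]
# Today's scan sees the wall again plus something new at (8.0, 1.0, 0.9).
today = [(5.2, 0.1, 1.1), (8.0, 1.0, 0.9)]
candidates = ephemeral_points(today, past_drives)
```

Here the wall point is filtered out because its cell was occupied on every earlier drive, and only the new point remains as a candidate moving object. A self-training system would need considerably more than this, such as grouping candidate points into objects and refining a detector on them over several rounds.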
The researchers hope the approaches could drastically reduce the development cost of autonomous vehicles (which currently still relies heavily on costly human-annotated data) and make such vehicles more efficient by learning to navigate the locations in which they are used the most.
For more information, see this Cornell Chronicle story.