Perception in snowy conditions for autonomous cars

When Storm Emma started forming over southern England in late February 2018, we weren’t expecting 22 inches of snow to cover the ground in just a week. As much of an inconvenience as it was for the general population of the UK, we considered it a unique and timely opportunity to test the quality of our world-beating Vision AI system for autonomous car perception.

We wanted to use the most basic sensing capability (an off-the-shelf, consumer-grade camera) to test our system in the most challenging driving conditions. As we all know very well, driving on snow-covered roads is a huge challenge for human drivers, particularly where salting is infeasible, such as on residential neighbourhood roads and rural lanes. We set ourselves the goal of testing how well our Vision AI system detects the ground surface and segments the drivable free space in exactly these conditions: snow-covered residential neighbourhood roads and rural lanes. This meant that we would never be able to see the full road surface clearly, most of the road and lane markings would be snowed over, there would be slush on the road with lots of tyre tracks, we wouldn’t be able to see the road curbs and lane edges, and almost everything on the ground would look white.

This is probably one of the hardest sets of conditions one can throw at a perception engine that has to detect where the road surface is and where the autonomous system can drive. Our Vision AI has two key features that make it unmatchable and beyond the state of the art in autonomous perception. First, Vision AI is a generalisable perception system that works out of the box: you turn on the system and it starts to do what it is supposed to do, without the need for any data-driven training. Second, it is highly sophisticated in its technical capabilities, detecting and segmenting the ground surface and drivable free space in conditions where humans need to make inferences and guesses about where the ground might be. For example, when we are unable to see the road clearly due to snow cover, we tend to follow the tracks left by road users who have driven before us, without needing to see the entire road surface. Replicating this behaviour in an autonomous perception system requires very advanced technical capabilities.

To our delight, not once did Vision AI let us down. We drove over the entire period from Emma’s formation to its dissipation (nearly 6 days) and clocked over 250 miles of driving and perception data collection. Vision AI performed like an expert road surface detector, clearly segmenting roundabout junctions, lanes partly occluded by parked vehicles, slush, and black driving tracks on an otherwise uniformly white surface.

When you see the video clips of Vision AI at work, you will notice how clean and accurate the performance is. The snow-covered surface is feature-sparse, which means there isn’t much to detect and make sense of. Yet the system provided a very high-fidelity output. We keep an eye on how the field of autonomous perception is advancing and keenly review the video footage released publicly by our peers in the industry. We wouldn’t be off the mark if we said that this is a ‘world first’ in terms of the publicly available evidence of the state and technical sophistication of autonomous perception capabilities.

We have broken new ground in pushing the technical boundaries and have been constantly refining the capabilities of Vision AI throughout this year. We are hoping that the UK might give us another opportunity this year to test the advances we have achieved in Vision AI performance over the last 8–10 months.

What really is “Perception” for autonomous vehicles?

Perception is the term used to describe the visual cognition process for autonomous cars. Perception software modules are responsible for acquiring raw data from on-vehicle sensors such as cameras, LIDAR, and RADAR, and converting this raw data into scene understanding for the autonomous vehicle.

Raw pixel data fed as input to perception
Scene understanding derived from perception

The human visual cognition system is remarkable. Human drivers are able to instantly tell what is around them, such as the important elements in a busy traffic scenario, the locations of relevant traffic signs and traffic lights, and the likely response of other road users, alongside a plethora of other pertinent information. The human brain derives all of this insight in a split second, using only the visual information acquired by our eyes. This visual cognition ability extends in a generalised way across numerous types of traffic scenarios in different cities, and even countries. As human drivers, we can easily apply our knowledge from one place to another.

However, visual cognition is incredibly challenging for machines, and building generalisable visual cognition is currently the biggest open challenge within the fields of autonomous driving, machine learning, robotics, and computer vision. So, how does perception work for autonomous cars?

Perception technologies can be broken down into two main categories: computer vision approaches and machine learning approaches. Computer vision techniques seek to address problems formally, using an explicit mathematical formulation to describe the problem and usually relying on numerical optimisation to find the best solution to that formulation. Machine learning techniques such as convolutional neural networks (CNNs), on the other hand, take a data-driven approach: ground-truth data is used to ‘learn’ the best solution to a particular problem by identifying common features in the data associated with the correct response. For example, a CNN trained to identify pedestrians in camera images will extract features that are commonly present in the training data associated with the appearance of pedestrians, such as their shape, size, position, and colour. Both approaches have their merits and disadvantages, and autonomous vehicles rely on a combination of these techniques to build a rich scene understanding of their environment.
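To make the contrast between the two categories concrete, here is a minimal Python sketch. It is purely illustrative and is not how Vision AI or any production perception stack works: the function names, the toy data, and the choice of a least-squares line fit and a nearest-centroid classifier are all assumptions made for this example. The first function solves an explicit mathematical formulation in closed form, while the second pair 'learns' from labelled examples.

```python
from statistics import mean


def fit_line_least_squares(points):
    """Explicit formulation: choose a, b to minimise sum((y - (a*x + b))**2)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx          # slope from the closed-form solution
    b = y_bar - a * x_bar  # intercept
    return a, b


def train_nearest_centroid(samples):
    """Data-driven: 'learn' one mean feature vector per label from labelled data."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict_nearest_centroid(centroids, features):
    """Return the label whose learned centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


if __name__ == "__main__":
    # Hypothetical lane-edge points (x, y) in metres.
    print(fit_line_least_squares([(0, 1), (1, 3), (2, 5), (3, 7)]))

    # Hypothetical training set: (height_m, width_m) -> label.
    training = [
        ((1.7, 0.5), "pedestrian"), ((1.8, 0.6), "pedestrian"),
        ((1.5, 4.2), "car"), ((1.4, 4.5), "car"),
    ]
    model = train_nearest_centroid(training)
    print(predict_nearest_centroid(model, (1.75, 0.55)))
    # A bus-sized object was never in the training data, so it is forced
    # into one of the known labels.
    print(predict_nearest_centroid(model, (3.0, 12.0)))
```

The line fit needs no training data, only a model of the problem. The classifier can only ever answer with labels it has seen, which is why the bus-sized object in the demo is assigned to an existing class.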

Perception is very challenging for autonomous vehicles because it is incredibly difficult to build a generalisable and robust model to describe complex traffic environments, either explicitly or through data. Autonomous vehicles can encounter strange and previously unseen obstacles, new types of traffic signs, or obstacles of a known type in a strange configuration such as a group of children wearing Halloween costumes.

Challenging obstacles

Similar challenges are present in identifying where it is safe to drive. Deriving a safe driving corridor is fairly straightforward in the presence of well-maintained lane markings on roads that an autonomous vehicle has frequently driven on. But performing the same task on a new road without lane markings, or with a different style of lane markings, is a much tougher challenge. There is a huge variety of road geometries and road surface types across the world, from motorways to dirt roads, and for a truly automated future, autonomous vehicles will have to be able to contend with all of these conditions.

Challenging roads

The challenge of perception is further compounded in adverse weather or at night, when raw sensor data becomes degraded and the perception system needs to parse noisier data to make sense of what is in the environment.

Difficulty of perception in low light and adverse weather

Computer vision-based perception approaches usually deliver fair performance and are typically generalisable across a wide set of scenarios and conditions, depending on the robustness of the underlying mathematical formulation. Machine learning-based approaches, on the other hand, are limited by the data used to train the system. Whilst good performance is achieved if real-world conditions match the training data, performance degrades significantly when the real world looks different to what the machine learning system has been taught to recognise.

This raises a question: if perception is so challenging, and both computer vision and machine learning have limitations in performance and generalisability, how are autonomous cars today able to contend with real-world driving scenarios? The answer is mapping. Autonomous cars take the burden away from on-vehicle perception by using a prior 3D survey of roads with annotations identifying important road features. This 3D map, sometimes referred to as a high-definition (HD) map, contains detailed information about each centimetre of every road an autonomous vehicle will operate on, including the precise position of lane markings, curbs, traffic lights, traffic signs, buildings, and other environmental features. By utilising an HD map, autonomous vehicles only need to perceive the dynamic elements of a scene, such as pedestrians, other vehicles, and cyclists, for which CNNs are well suited and provide good performance under most scenarios. Computer vision can then be relied upon as a redundant perception technology in case a CNN failure occurs because a strange obstacle is present or an unknown scenario develops.
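The division of labour described above can be sketched in a few lines of Python. This is a hypothetical illustration only, not a real HD map format or any vendor's API: the class names, the flat 2D coordinates in metres, the 50 m lookup radius, and the 2 m matching threshold are all assumptions made for this example. Static features come from the map, dynamic objects come from a CNN, and computer vision obstacles are kept only when no CNN detection explains them.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass(frozen=True)
class MapFeature:
    kind: str          # e.g. "lane_marking", "curb", "traffic_light"
    position_m: tuple  # (x, y) in a hypothetical map frame, in metres


@dataclass(frozen=True)
class Detection:
    kind: str          # e.g. "pedestrian", "vehicle", "unknown_obstacle"
    position_m: tuple  # (x, y) in the same hypothetical frame
    source: str        # "cnn" or "computer_vision"


@dataclass
class HDMap:
    features: list = field(default_factory=list)

    def features_near(self, position_m, radius_m):
        """Return the surveyed static features within radius_m of a position."""
        return [f for f in self.features
                if dist(f.position_m, position_m) <= radius_m]


def build_scene(hd_map, vehicle_position_m, cnn_detections, cv_obstacles,
                radius_m=50.0, match_threshold_m=2.0):
    """Combine static map features with dynamic on-board detections."""
    static = hd_map.features_near(vehicle_position_m, radius_m)
    dynamic = list(cnn_detections)
    # Redundancy: keep a computer vision obstacle only if no CNN detection
    # lies close enough to account for it.
    for obstacle in cv_obstacles:
        explained = any(
            dist(obstacle.position_m, d.position_m) <= match_threshold_m
            for d in cnn_detections
        )
        if not explained:
            dynamic.append(obstacle)
    return {"static": static, "dynamic": dynamic}


if __name__ == "__main__":
    hd_map = HDMap([
        MapFeature("lane_marking", (10.0, 1.5)),
        MapFeature("traffic_light", (400.0, 0.0)),  # outside the lookup radius
    ])
    cnn = [Detection("pedestrian", (5.0, 0.0), "cnn")]
    cv = [
        Detection("unknown_obstacle", (5.5, 0.0), "computer_vision"),
        Detection("unknown_obstacle", (20.0, 3.0), "computer_vision"),
    ]
    scene = build_scene(hd_map, (0.0, 0.0), cnn, cv)
    print(scene["static"])
    print(scene["dynamic"])
```

In this sketch the on-board system never has to find lane markings or curbs itself, which is exactly why the approach depends on the map being present and up to date.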

However, a simple question then comes to the fore: what happens if autonomous vehicles don’t have access to HD maps, or the HD maps are outdated? How can an autonomous vehicle drive in these scenarios when it has to rely only on its on-board perception?

At Propelmee, our technologies answer these questions…