How Autonomous Vehicles Work — Sensors, Perception, and Path Planning

1. SAE Autonomy Levels

L0 (No automation): The human controls everything. The system provides warnings only (e.g. lane departure alerts).
L1 (Driver assistance): A single system assists, either adaptive cruise control or lane centring, but not both at once. The human monitors at all times.
L2 (Partial automation): The system controls both steering and speed (e.g. Tesla Autopilot, GM Super Cruise). The human must monitor continuously and be ready to take over.
L3 (Conditional automation): The system drives; the human may engage in other activities but must be available to take over on request. Examples: Honda Legend, Mercedes-Benz Drive Pilot (approved for use within certain geofenced areas).
L4 (High automation): No human is required within a defined operational design domain (e.g. Waymo One robotaxi in Phoenix). The system cannot handle all conditions.
L5 (Full automation): The system performs all driving tasks anywhere, in any conditions. Not commercially deployed as of 2025.

2. Sensors: Eyes of the Car

Sensor	Range	Strengths	Weaknesses
LiDAR	0.1–200 m	Precise 3D point cloud, works in dark, no motion blur	Expensive, rain/snow scattering, no colour
Radar	0.5–300 m	Works in fog/rain, directly measures velocity (Doppler), cheap	Low resolution, reflective clutter
Camera	0.1–150 m	Rich semantic info, reads traffic lights and signs, cheap	No direct depth measurement, sensitive to lighting and glare
Ultrasonic	0.1–8 m	Cheap, very reliable near-field detection	Very short range only
High-precision GPS	Global	Absolute position (cm-level with RTK)	Signal lost in tunnels and degraded in dense urban canyons, ~100 ms latency

Tesla vs. Waymo philosophy: Tesla relies on cameras only ("Tesla Vision"), which is cheaper and easier to scale across a large fleet. Waymo combines LiDAR, radar, and cameras, accepting higher cost in exchange for redundancy. Both approaches have tradeoffs, and the debate is ongoing in the industry.

3. Sensor Fusion

No single sensor is reliable in all conditions. Sensor fusion combines all inputs into a coherent world model. Common approaches:

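Kalman Filter / Extended Kalman Filter (EKF): The Kalman filter is the optimal estimator for linear systems with Gaussian noise; the EKF extends it to non-linear models by linearising around the current estimate. Both maintain a probability distribution over the vehicle's position and velocity, combining motion-model predictions with noisy sensor updates.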
Particle Filter: Non-parametric Bayesian filter — represents distributions with samples. Better for non-linear, non-Gaussian uncertainty. Used for localisation on LiDAR maps.
Deep learning fusion: Modern systems (e.g. Tesla FSD, the fifth-generation Waymo Driver) increasingly fuse sensor inputs inside the network itself: raw sensor data in, 3D scene representation out.
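
The predict/update cycle behind Kalman-style fusion can be sketched in one dimension. This is a minimal illustration, assuming a scalar position, a known velocity, and hand-picked noise variances. The names `Gaussian1D`, `predict`, and `update` are hypothetical, and the example is a plain linear Kalman filter, not an EKF or a production fusion stack.

```python
from dataclasses import dataclass


@dataclass
class Gaussian1D:
    """Belief about a scalar position: mean in metres, variance in m^2."""
    mean: float
    var: float


def predict(belief: Gaussian1D, velocity: float, dt: float, process_var: float) -> Gaussian1D:
    """Motion-model step: move the mean and inflate the uncertainty."""
    return Gaussian1D(belief.mean + velocity * dt, belief.var + process_var)


def update(belief: Gaussian1D, measurement: float, meas_var: float) -> Gaussian1D:
    """Sensor step: weight prediction and measurement by their variances."""
    gain = belief.var / (belief.var + meas_var)  # Kalman gain, between 0 and 1
    mean = belief.mean + gain * (measurement - belief.mean)
    var = (1.0 - gain) * belief.var
    return Gaussian1D(mean, var)


if __name__ == "__main__":
    # All numbers below are illustrative assumptions, not real sensor specs.
    belief = Gaussian1D(mean=0.0, var=4.0)
    belief = predict(belief, velocity=10.0, dt=0.1, process_var=0.1)
    belief = update(belief, measurement=1.4, meas_var=9.0)    # coarse GPS-like fix
    belief = update(belief, measurement=1.05, meas_var=0.04)  # precise LiDAR-like fix
    print(belief)
```

Each update pulls the estimate toward the measurement in proportion to how much more certain that measurement is than the current belief, so a precise sensor dominates a noisy one without the noisy one being discarded.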

4. Localisation and SLAM

GPS alone is insufficient: a position error of around 3 m is comparable to the width of a traffic lane, so the vehicle could not reliably tell which lane it is in. AV systems achieve centimetre-level accuracy by matching real-time sensor data against a High-Definition (HD) map: LiDAR point clouds are registered to a pre-built map using algorithms such as ICP (Iterative Closest Point) or LOAM (LiDAR Odometry and Mapping).
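
To make the scan-to-map matching step concrete, here is a minimal 2D ICP sketch in pure Python. It assumes small initial misalignment, brute-force nearest-neighbour search, and no outlier rejection; the function names (`nearest`, `best_rigid_transform`, `apply`, `icp`, `mean_residual`) are hypothetical. Real systems work in 3D with k-d trees and robust cost functions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest(p: Point, cloud: List[Point]) -> Point:
    """Brute-force nearest neighbour (a k-d tree would be used in practice)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_transform(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # rotation that best aligns the centred sets
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply(points: List[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp(scan: List[Point], map_points: List[Point], iterations: int = 20) -> List[Point]:
    """Alternate between matching points and solving for the rigid transform."""
    current = list(scan)
    for _ in range(iterations):
        matches = [nearest(p, map_points) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply(current, theta, tx, ty)
    return current


def mean_residual(points: List[Point], map_points: List[Point]) -> float:
    """Average distance from each point to its closest map point."""
    return sum(math.dist(p, nearest(p, map_points)) for p in points) / len(points)
```

Because the aligned points are an exact rigid transform of the original scan, calling `best_rigid_transform(scan, icp(scan, map_points))` recovers the overall pose correction. ICP only converges to the right answer from a reasonable initial guess, which is one reason it is combined with odometry or GPS in practice.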

In unmapped or changed environments, SLAM (Simultaneous Localisation and Mapping) builds the map and localises the vehicle at the same time — a chicken-and-egg problem solved with probabilistic graph optimisation (pose graph SLAM, iSAM2).
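
The idea behind pose graph optimisation can be shown with a deliberately tiny example: poses on a 1D line, odometry edges between consecutive poses, and one loop-closure edge that disagrees with the accumulated odometry. This is a sketch under strong assumptions (scalar poses, linear constraints, simple Gauss-Seidel sweeps). The names `Edge` and `optimise_pose_graph` are hypothetical, and real solvers such as iSAM2 handle 2D/3D poses with sparse non-linear optimisation.

```python
from typing import List, Tuple

# (i, j, measured offset x_j - x_i, weight = 1 / variance)
Edge = Tuple[int, int, float, float]


def optimise_pose_graph(num_poses: int, edges: List[Edge], sweeps: int = 200) -> List[float]:
    """Minimise the weighted squared error of all edges, with pose 0 fixed at 0."""
    x = [0.0] * num_poses
    for _ in range(sweeps):
        for k in range(1, num_poses):  # pose 0 is the anchor and never moves
            num = den = 0.0
            for i, j, z, w in edges:
                if j == k:    # edge ends at k: it suggests x_k = x_i + z
                    num += w * (x[i] + z)
                    den += w
                elif i == k:  # edge starts at k: it suggests x_k = x_j - z
                    num += w * (x[j] - z)
                    den += w
            if den > 0.0:
                x[k] = num / den  # weighted average of all suggestions
    return x


if __name__ == "__main__":
    edges = [
        (0, 1, 1.0, 1.0),  # odometry: moved about 1 m
        (1, 2, 1.0, 1.0),
        (2, 3, 1.0, 1.0),
        (0, 3, 2.7, 1.0),  # loop closure: pose 3 is only 2.7 m from pose 0
    ]
    print(optimise_pose_graph(4, edges))
```

The optimiser spreads the 0.3 m disagreement across all edges instead of trusting either the odometry chain or the loop closure outright, which is the core behaviour of graph-based SLAM.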

5. Perception: Seeing Objects

Perception converts raw sensor data into a list of detected objects with class, position, size, and velocity. Current approaches:

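3D Object Detection from LiDAR: VoxelNet voxelises the point cloud and applies 3D convolutions; PointPillars groups points into vertical columns (pillars) and applies faster 2D convolutions to the resulting pseudo-image.
Camera-based detection: BEVFormer projects multi-camera image features into a Bird's-Eye View (BEV) feature grid using transformers, recovering 3D structure without a direct depth sensor. BEVFusion uses the same BEV representation to fuse camera and LiDAR features.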
Segmentation: Pixel- and point-level classification. Necessary for drivable area detection, lane recognition.
Occupancy grids: Probabilistic map of which 3D voxels are occupied by any object — safer than explicit object lists for handling "unknown unknown" objects.
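
A common way to maintain a probabilistic occupancy grid is a per-cell log-odds update, sketched below. The sensor model values (`p_hit`, `p_miss`) and the class name `OccupancyGrid` are illustrative assumptions; a real system would also ray-trace free space along each beam and use a dense array instead of a dictionary.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int, int]  # integer voxel index (ix, iy, iz)


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


class OccupancyGrid:
    """Sparse 3D occupancy grid storing log-odds per voxel (0.0 means unknown)."""

    def __init__(self, p_hit: float = 0.7, p_miss: float = 0.4) -> None:
        self.log_odds: Dict[Cell, float] = {}
        self.l_hit = logit(p_hit)    # evidence added when a return lands in the cell
        self.l_miss = logit(p_miss)  # negative evidence when a beam passes through

    def update(self, cell: Cell, hit: bool) -> None:
        """Bayesian update in log-odds form: independent evidence simply adds up."""
        delta = self.l_hit if hit else self.l_miss
        self.log_odds[cell] = self.log_odds.get(cell, 0.0) + delta

    def probability(self, cell: Cell) -> float:
        """Occupancy probability; unobserved cells return 0.5."""
        return 1.0 / (1.0 + math.exp(-self.log_odds.get(cell, 0.0)))
```

The grid never needs to know what class an obstacle belongs to. Any cell with high occupancy probability can be treated as something to avoid, which is why this representation copes with objects the detector was never trained on.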

6. Prediction: What Will Others Do?

Knowing where other agents are is not enough — we need to know where they will be in the next 5–10 seconds. This is motion prediction.

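Classical approaches used kinematic models (constant velocity, constant turn rate). Modern AV systems use transformer-based prediction models such as MTR (Motion Transformer), typically trained and benchmarked on datasets like the Waymo Open Motion Dataset: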

Input: past trajectories of all agents + map topology (lane geometry, traffic lights).
Output: K multimodal future trajectories with probability scores for each agent.

Multimodal output is critical — a pedestrian might either cross the road or turn right. The planner must handle all plausible futures.
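
For comparison with learned predictors, the classical kinematic baseline can be sketched in a few lines. The example rolls an agent forward under constant velocity or constant turn rate and returns a small set of weighted hypotheses. The mode probabilities and yaw rates are hand-picked assumptions for illustration only, the integration is simple Euler stepping, and the names `rollout` and `predict_modes` are hypothetical.

```python
import math
from typing import List, Tuple

Trajectory = List[Tuple[float, float]]


def rollout(x: float, y: float, heading: float, speed: float,
            yaw_rate: float, horizon_s: float, dt: float = 0.5) -> Trajectory:
    """Constant turn rate and velocity model; yaw_rate = 0 gives constant velocity."""
    trajectory: Trajectory = []
    for _ in range(int(round(horizon_s / dt))):
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += yaw_rate * dt
        trajectory.append((x, y))
    return trajectory


def predict_modes(x: float, y: float, heading: float, speed: float,
                  horizon_s: float = 5.0) -> List[Tuple[float, Trajectory]]:
    """Return K = 3 hypotheses as (probability, trajectory) pairs."""
    # Hypothetical hand-picked modes: (probability, yaw rate in rad/s).
    modes = [(0.6, 0.0), (0.2, 0.3), (0.2, -0.3)]
    return [(p, rollout(x, y, heading, speed, w, horizon_s)) for p, w in modes]
```

A learned predictor replaces the fixed mode list with trajectories and probabilities conditioned on the map and on other agents, but the output has the same shape: several candidate futures per agent, each with a score the planner can weigh.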

7. Motion Planning

Motion planning finds a collision-free, comfortable trajectory from the current state to a goal. Key layers:

Route planning: High-level navigation (A* over road graph).
Behavioural planning: Decides when to change lanes, yield, merge. Often rule-based or learned policies.
Trajectory planning: Generates a smooth, dynamically feasible path, typically replanning every 100 ms or less. Methods:
- Polynomial spirals / spline optimisation (Werling et al. Frenet frame planner)
- Rapidly-exploring Random Trees (RRT*)
- Model Predictive Control (MPC) — optimises over a short horizon considering dynamics
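
The route-planning layer listed above can be illustrated with a compact A* search over a small road graph. The graph format (adjacency lists of `(neighbour, length)` pairs plus node coordinates) and the function name `a_star` are assumptions made for this sketch. The straight-line heuristic is only admissible if every edge length is at least the straight-line distance between its endpoints.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]   # node -> [(neighbour, edge length)]
Coords = Dict[str, Tuple[float, float]]      # node -> (x, y) position


def a_star(graph: Graph, coords: Coords, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    """Return (total cost, node path) for the cheapest route, or None if unreachable."""
    def heuristic(node: str) -> float:
        return math.dist(coords[node], coords[goal])  # straight-line lower bound

    open_heap = [(heuristic(start), 0.0, start)]  # entries are (f = g + h, g, node)
    best_cost = {start: 0.0}
    parent: Dict[str, str] = {}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best_cost.get(node, math.inf):
            continue  # stale heap entry; a cheaper route to this node was found
        for neighbour, length in graph.get(node, []):
            new_cost = cost + length
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                parent[neighbour] = node
                heapq.heappush(open_heap, (new_cost + heuristic(neighbour), new_cost, neighbour))
    return None


if __name__ == "__main__":
    coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}
    graph = {"A": [("B", 1.0), ("C", 1.5)], "B": [("C", 1.0), ("D", 3.0)], "C": [("D", 1.0)]}
    print(a_star(graph, coords, "A", "D"))
```

In a real stack the edge cost would usually be expected travel time, not distance, and the resulting route is only a coarse reference that the behavioural and trajectory layers refine.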

End-to-end learning: Tesla (FSD v12 onwards) and, at least in published research, Waymo are moving away from the classical modular pipeline toward large neural networks that map sensor inputs directly to driving outputs. The analogy is a human driver, who does not compute explicit intermediate representations.

8. Vehicle Control

The planned trajectory is executed by the control layer, which commands steering, throttle, and brakes. A PID controller (or cascaded PID) is common for lateral (steering) and longitudinal (speed) control. MPC often performs better because it predicts vehicle and actuator dynamics over a short horizon and explicitly respects constraints (jerk, tyre slip).
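
A longitudinal (speed) PID loop can be sketched as follows. The gains, the acceleration limits, and the toy vehicle model with linear drag are all assumptions chosen for illustration, and the names `PID` and `simulate_speed` are hypothetical. The controller uses conditional integration as a simple anti-windup measure, which is one of several common options.

```python
from typing import Optional


class PID:
    """PID controller with output clamping and conditional-integration anti-windup."""

    def __init__(self, kp: float, ki: float, kd: float, out_min: float, out_max: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        if self.out_min <= raw <= self.out_max:
            self.integral = candidate  # only accumulate while the output is unsaturated
            return raw
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(raw, self.out_min), self.out_max)


def simulate_speed(target: float, seconds: float, dt: float = 0.1) -> float:
    """Drive a toy longitudinal model toward `target` (m/s) and return the final speed."""
    pid = PID(kp=0.5, ki=0.1, kd=0.05, out_min=-3.0, out_max=2.0)  # limits in m/s^2
    speed = 0.0
    for _ in range(int(round(seconds / dt))):
        accel_cmd = pid.step(target - speed, dt)
        speed += (accel_cmd - 0.05 * speed) * dt  # hypothetical plant: command minus linear drag
    return speed


if __name__ == "__main__":
    print(simulate_speed(target=20.0, seconds=60.0))
```

The integral term is what removes the steady-state error caused by drag, and the clamping reflects the comfort and actuator limits that MPC would instead handle as explicit constraints.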

The commands pass through the car's drive-by-wire system to electric power steering, electronic throttle, and ABS/ESC. Latency and actuator response times must be modelled in the controller or the trajectory will lag.

9. Remaining Challenges

Long-tail edge cases: The distribution of driving situations is enormous. Rare events (wrong-way drivers, flooding, unusual cargo) are under-represented in training data but must be handled safely.
V2X communication: Vehicle-to-everything (other cars, traffic lights, pedestrians) can share intent — but standardised deployment is years away.
Adverse weather: Heavy rain, snow, and direct sun glare can degrade several sensors at once (cameras and LiDAR in particular), so failures are correlated and tend to coincide with the most demanding driving conditions.
Regulatory and liability frameworks: Certifying safety statistically for a technology without a human override is legally unprecedented.
