How Autonomous Vehicles Work
A self-driving car must perceive the world, understand it, predict what other agents will do next, and calculate a safe path — all within 100 milliseconds. Here is the engineering pipeline that makes this happen.
1. SAE Autonomy Levels
2. Sensors: Eyes of the Car
| Sensor | Range | Strengths | Weaknesses |
|---|---|---|---|
| LiDAR | 0.1–200 m | Precise 3D point cloud, works in dark, no motion blur | Expensive, rain/snow scattering, no colour |
| Radar | 0.5–300 m | Works in fog/rain, directly measures velocity (Doppler), cheap | Low resolution, reflective clutter |
| Camera | 0.1–150 m | Rich semantic info (colour, text, traffic lights and signs), cheap | No direct depth measurement, sensitive to lighting and glare |
| Ultrasonic | 0.1–8 m | Cheap, very reliable near-field detection | Very short range only |
| High-precision GPS | Global | Absolute position (cm-level with RTK) | No signal in tunnels/dense urban canyons, 100 ms latency |
3. Sensor Fusion
No single sensor is reliable in all conditions. Sensor fusion combines all inputs into a coherent world model. Common approaches:
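- Kalman Filter / Extended Kalman Filter (EKF): The Kalman filter is the optimal estimator for linear systems with Gaussian noise; the EKF extends it to non-linear models by linearising around the current estimate. Both maintain a probability distribution over the vehicle's position and velocity, combining motion-model predictions with noisy sensor updates.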
- Particle Filter: Non-parametric Bayesian filter — represents distributions with samples. Better for non-linear, non-Gaussian uncertainty. Used for localisation on LiDAR maps.
- Deep learning fusion: Modern systems (Tesla FSD, Waymo 5) fuse sensors directly in the network — raw sensor data in, 3D scene representation out.
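The predict/update cycle of a Kalman filter can be illustrated with a minimal sketch. It assumes a one-dimensional constant-velocity model and position-only measurements; the class name `KalmanCV`, the helper `track`, and the simplified process noise `Q = q*dt*I` are illustrative choices, not taken from any production stack.

```python
from dataclasses import dataclass


@dataclass
class KalmanCV:
    """1D constant-velocity Kalman filter with state (position, velocity)."""
    x: float = 0.0      # position estimate (m)
    v: float = 0.0      # velocity estimate (m/s)
    pxx: float = 100.0  # position variance
    pxv: float = 0.0    # position/velocity covariance
    pvv: float = 100.0  # velocity variance

    def predict(self, dt: float, q: float = 0.1) -> None:
        """Motion-model step: x' = x + v*dt and P' = F P F^T + Q."""
        self.x += self.v * dt
        pxx = self.pxx + 2.0 * dt * self.pxv + dt * dt * self.pvv
        pxv = self.pxv + dt * self.pvv
        # Q = q*dt*I is a simplified process-noise model (assumption).
        self.pxx = pxx + q * dt
        self.pxv = pxv
        self.pvv = self.pvv + q * dt

    def update(self, z: float, r: float) -> None:
        """Fuse a position measurement z with noise variance r (H = [1, 0])."""
        s = self.pxx + r              # innovation variance
        k_x = self.pxx / s            # Kalman gain for position
        k_v = self.pxv / s            # Kalman gain for velocity
        innovation = z - self.x
        self.x += k_x * innovation
        self.v += k_v * innovation
        # Covariance update P = (I - K H) P, written out element by element.
        self.pvv -= k_v * self.pxv
        self.pxv *= 1.0 - k_x
        self.pxx *= 1.0 - k_x


def track(measurements, dt=0.1, r=0.25):
    """Run the filter over a sequence of position measurements."""
    kf = KalmanCV()
    for z in measurements:
        kf.predict(dt)
        kf.update(z, r)
    return kf
```

Fusing several sensors amounts to calling `update` once per measurement with that sensor's own noise variance `r`: a low-variance source pulls the estimate strongly, while a noisy source only nudges it.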
4. Localisation and SLAM
Standard GPS alone is insufficient: a position error of around 3 m is dangerous in a traffic lane. AV systems achieve centimetre-level accuracy by matching real-time sensor data against a High-Definition (HD) map. LiDAR point clouds are aligned to the pre-built map using algorithms such as ICP (Iterative Closest Point) or LOAM (LiDAR Odometry and Mapping).
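The scan-to-map matching idea behind ICP can be sketched in a few lines. This is a simplified 2D point-to-point variant with brute-force nearest-neighbour search, written only to show the alternation between matching and rigid fitting; the function names are illustrative, and real systems work in 3D with spatial indexing and outlier rejection.

```python
import math


def transform(points, theta, tx, ty):
    """Rotate 2D points by theta (rad) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def nearest(p, cloud):
    """Brute-force nearest neighbour (a k-d tree would be used in practice)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_fit(src, dst):
    """Least-squares rotation and translation mapping paired src onto dst."""
    n = len(src)
    csx = sum(x for x, _ in src) / n
    csy = sum(y for _, y in src) / n
    cdx = sum(x for x, _ in dst) / n
    cdy = sum(y for _, y in dst) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - csx, ay - csy, bx - cdx, by - cdy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # optimal rotation for centred point pairs
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)


def icp(scan, map_points, iterations=20):
    """Estimate the pose (theta, tx, ty) that aligns scan to map_points."""
    theta_total, tx_total, ty_total = 0.0, 0.0, 0.0
    current = list(scan)
    for _ in range(iterations):
        # 1. Match every scan point to its closest map point.
        matches = [nearest(p, map_points) for p in current]
        # 2. Fit the rigid transform for these matches and apply it.
        theta, tx, ty = best_rigid_fit(current, matches)
        current = transform(current, theta, tx, ty)
        # 3. Compose the incremental step with the accumulated pose.
        c, s = math.cos(theta), math.sin(theta)
        tx_total, ty_total = (c * tx_total - s * ty_total + tx,
                              s * tx_total + c * ty_total + ty)
        theta_total += theta
    return theta_total, tx_total, ty_total
```

ICP only converges to the correct pose when the initial guess is close enough for most nearest-neighbour matches to be right, which is why it is usually seeded with an odometry or GPS estimate.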
In unmapped or changed environments, SLAM (Simultaneous Localisation and Mapping) builds the map and localises the vehicle at the same time — a chicken-and-egg problem solved with probabilistic graph optimisation (pose graph SLAM, iSAM2).
5. Perception: Seeing Objects
Perception converts raw sensor data into a list of detected objects with class, position, size, and velocity. Current approaches:
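- 3D Object Detection from LiDAR: VoxelNet voxelises the point cloud and applies 3D convolutions; PointPillars groups points into vertical columns (pillars) so that faster 2D convolutions can be used.
- Camera-based detection: BEVFormer projects multi-camera images into a Bird's-Eye View (BEV) feature grid using transformers, which lets the network infer depth from cameras alone. BEVFusion uses the same BEV representation to fuse cameras with LiDAR.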
- Segmentation: Pixel- and point-level classification. Necessary for drivable area detection, lane recognition.
- Occupancy grids: Probabilistic map of which 3D voxels are occupied by any object — safer than explicit object lists for handling "unknown unknown" objects.
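The occupancy-grid idea can be made concrete with the standard log-odds update. The sketch below assumes a small 2D grid rather than 3D voxels, and the sensor-model probabilities (0.7 for a hit, 0.4 for a pass-through) are illustrative values, not calibrated ones.

```python
import math

# Inverse sensor model (assumed values): how much one observation shifts belief.
L_OCC = math.log(0.7 / 0.3)   # log-odds added when a beam ends in a cell
L_FREE = math.log(0.4 / 0.6)  # log-odds added when a beam passes through a cell


class OccupancyGrid:
    """2D occupancy grid storing the log-odds of each cell being occupied."""

    def __init__(self, width, height):
        # Log-odds 0.0 corresponds to probability 0.5, i.e. unknown.
        self.logodds = [[0.0] * width for _ in range(height)]

    def integrate_ray(self, cells):
        """Update cells crossed by one beam; the last cell is where it hit."""
        *free_cells, hit_cell = cells
        for x, y in free_cells:
            self.logodds[y][x] += L_FREE
        x, y = hit_cell
        self.logodds[y][x] += L_OCC

    def probability(self, x, y):
        """Convert the stored log-odds back to an occupancy probability."""
        return 1.0 / (1.0 + math.exp(-self.logodds[y][x]))
```

Because the grid records only "something is here", a planner can avoid an occupied cell even when no detector has assigned it a class.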
6. Prediction: What Will Others Do?
Knowing where other agents are is not enough — we need to know where they will be in the next 5–10 seconds. This is motion prediction.
Classical approaches used kinematic models (constant velocity, constant turn rate). Modern AV systems use transformer-based prediction, for example MTR (Motion Transformer), a model benchmarked on the Waymo Open Motion Dataset:
- Input: past trajectories of all agents + map topology (lane geometry, traffic lights).
- Output: K multimodal future trajectories with probability scores for each agent.
Multimodal output is critical — a pedestrian might either cross the road or turn right. The planner must handle all plausible futures.
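The classical kinematic baselines mentioned above already show what a multimodal output looks like. The sketch below rolls out a constant-velocity and a constant-turn-rate hypothesis for one agent; the names `AgentState`, `rollout_ctrv`, and `predict_modes` are hypothetical, and the mode probabilities are supplied by hand, whereas a learned predictor would estimate them.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentState:
    x: float        # position (m)
    y: float        # position (m)
    heading: float  # direction of travel (rad)
    speed: float    # speed along the heading (m/s)


def rollout_ctrv(state, yaw_rate, horizon=5.0, dt=0.5):
    """Constant turn rate and velocity rollout; yaw_rate 0 is constant velocity."""
    x, y, h = state.x, state.y, state.heading
    path = []
    for _ in range(round(horizon / dt)):
        if abs(yaw_rate) < 1e-9:
            # Straight-line motion.
            x += state.speed * math.cos(h) * dt
            y += state.speed * math.sin(h) * dt
        else:
            # Exact integration along a circular arc of radius speed / yaw_rate.
            h_next = h + yaw_rate * dt
            radius = state.speed / yaw_rate
            x += radius * (math.sin(h_next) - math.sin(h))
            y += radius * (math.cos(h) - math.cos(h_next))
            h = h_next
        path.append((x, y))
    return path


def predict_modes(state, hypotheses):
    """hypotheses: list of (probability, yaw_rate). Returns K scored paths."""
    total = sum(p for p, _ in hypotheses)
    return [(p / total, rollout_ctrv(state, w)) for p, w in hypotheses]
```

For example, `predict_modes(state, [(0.6, 0.0), (0.4, math.pi / 10)])` yields a "continue straight" and a "turn left" trajectory, each with a probability that the planner can weigh.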
7. Motion Planning
Motion planning finds a collision-free, comfortable trajectory from the current state to a goal. Key layers:
- Route planning: High-level navigation (A* over road graph).
- Behavioural planning: Decides when to change lanes, yield, merge. Often rule-based or learned policies.
- Trajectory planning: Generates a smooth, dynamically feasible path in ≤100 ms per planning cycle. Methods:
  - Polynomial spirals / spline optimisation (e.g. the Frenet-frame planner of Werling et al.)
  - Rapidly-exploring Random Trees (RRT*)
  - Model Predictive Control (MPC), which optimises over a short horizon while accounting for vehicle dynamics
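Polynomial planners of the kind listed above typically connect a start state and an end state with a quintic polynomial, because it can match position, velocity, and acceleration at both ends. The sketch below handles a single axis (for instance the lateral offset in a Frenet frame); the function names are illustrative, and a full planner would sample many end states and score the candidates for cost and collisions.

```python
def quintic_coefficients(x0, v0, a0, x1, v1, a1, duration):
    """Coefficients c0..c5 of x(t) = sum(c_k * t**k) matching both end states."""
    T = duration
    # Residuals left after extrapolating the start state over the duration.
    dx = x1 - (x0 + v0 * T + 0.5 * a0 * T * T)
    dv = v1 - (v0 + a0 * T)
    da = a1 - a0
    # Closed-form solution of the 3x3 boundary-condition system.
    c3 = (10.0 * dx - 4.0 * dv * T + 0.5 * da * T * T) / T ** 3
    c4 = (-15.0 * dx + 7.0 * dv * T - da * T * T) / T ** 4
    c5 = (6.0 * dx - 3.0 * dv * T + 0.5 * da * T * T) / T ** 5
    return (x0, v0, 0.5 * a0, c3, c4, c5)


def evaluate(coeffs, t):
    """Return (position, velocity, acceleration) of the polynomial at time t."""
    pos = sum(c * t ** k for k, c in enumerate(coeffs))
    vel = sum(k * c * t ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    acc = sum(k * (k - 1) * c * t ** (k - 2)
              for k, c in enumerate(coeffs) if k >= 2)
    return pos, vel, acc


def sample_trajectory(coeffs, duration, dt=0.1):
    """Sample positions along the trajectory at a fixed time step."""
    steps = round(duration / dt)
    return [evaluate(coeffs, i * dt)[0] for i in range(steps + 1)]
```

A lane change of 3.5 m over 4 s, starting and ending with zero lateral velocity and acceleration, would be `quintic_coefficients(0, 0, 0, 3.5, 0, 0, 4.0)`.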
8. Vehicle Control
The planned trajectory is executed by the control layer, which commands steering, throttle, and brakes. A PID controller (or cascaded PID) is common for lateral (steering) and longitudinal (speed) control. MPC can perform better because it predicts the vehicle and actuator dynamics over a short horizon and explicitly respects constraints (jerk, tyre slip).
The commands pass through the car's drive-by-wire system to electric power steering, electronic throttle, and ABS/ESC. Latency and actuator response times must be modelled in the controller or the trajectory will lag.
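A longitudinal PID loop of the kind described above can be sketched as follows. The gains, the acceleration limits, and the first-order drag model in `simulate_speed` are assumptions chosen for illustration, not values from a real vehicle; the integrator is frozen while the output is saturated, which is one simple anti-windup scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    out_min: float  # lowest command the actuator accepts
    out_max: float  # highest command the actuator accepts
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        """Return the clamped command for the current tracking error."""
        derivative = 0.0
        if self.prev_error is not None:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate + self.kd * derivative
        output = min(max(raw, self.out_min), self.out_max)
        # Anti-windup: only accumulate the integral when not saturated.
        if output == raw:
            self.integral = candidate
        return output


def simulate_speed(target, seconds, dt=0.05, drag=0.05):
    """Track a target speed with a toy plant: dv/dt = command - drag * v."""
    pid = PID(kp=1.0, ki=0.2, kd=0.0, out_min=-5.0, out_max=3.0)
    speed = 0.0
    for _ in range(round(seconds / dt)):
        accel_cmd = pid.step(target - speed, dt)
        speed += (accel_cmd - drag * speed) * dt
    return speed
```

The integral term is what removes the steady-state error caused by drag; a proportional-only controller would settle slightly below the target speed.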
9. Remaining Challenges
- Long-tail edge cases: The distribution of driving situations is enormous. Rare events (wrong-way drivers, flooding, unusual cargo) are under-represented in training data but must be handled safely.
- V2X communication: Vehicle-to-everything (other cars, traffic lights, pedestrians) can share intent — but standardised deployment is years away.
- Adverse weather: Heavy rain, snow, and direct sun glare degrade several sensors at once, so failures are correlated and tend to coincide with the most demanding driving conditions.
- Regulatory and liability frameworks: Certifying safety statistically for a technology without a human override is legally unprecedented.