🚗 Transport · Robotics
📅 March 2026 ⏱ ~10 min read 🟡 Intermediate

How Autonomous Vehicles Work

A self-driving car must perceive the world, understand it, predict what other agents will do next, and calculate a safe path, all within roughly 100 milliseconds per cycle. This article walks through the engineering pipeline that makes this possible.

1. SAE Autonomy Levels

L0: No automation. Human controls everything. Warnings only (e.g. lane departure alerts).
L1: Driver assistance. A single system: adaptive cruise control OR lane centring (not both). Human monitors.
L2: Partial automation. System handles both steering and speed (e.g. Tesla Autopilot, GM Super Cruise). Human must monitor and be ready to take over.
L3: Conditional automation. System drives; human may engage in other activities but must be available to take over on request. Examples: Honda Legend, Mercedes-Benz Drive Pilot (approved in certain geofences).
L4: High automation. No human required within a defined operational design domain (e.g. Waymo One robotaxi in Phoenix). Cannot handle all conditions.
L5: Full automation. Performs all driving tasks anywhere, in any conditions. Not yet commercially deployed as of 2025.

2. Sensors: Eyes of the Car

Sensor | Range | Strengths | Weaknesses
LiDAR | 0.1–200 m | Precise 3D point cloud, works in dark, no motion blur | Expensive, rain/snow scattering, no colour
Radar | 0.5–300 m | Works in fog/rain, directly measures velocity (Doppler), cheap | Low resolution, reflective clutter
Camera | 0.1–150 m | Rich semantic info, works at traffic light range, cheap | No depth, sensitive to lighting and glare
Ultrasonic | 0.1–8 m | Cheap, very reliable near-field detection | Very short range only
High-precision GPS | Global | Absolute position (cm-level with RTK) | No signal in tunnels/dense urban canyons, 100 ms latency
Tesla vs. Waymo philosophy: Tesla relies on cameras only ("Tesla Vision"), which lowers per-vehicle hardware cost and makes fleet-wide deployment easier. Waymo combines LiDAR, radar, and cameras for redundancy. Both approaches have tradeoffs, and the debate is ongoing in the industry.

3. Sensor Fusion

No single sensor is reliable in all conditions. Sensor fusion combines all inputs into a coherent world model. Common approaches:
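One widely used building block for fusion is the Kalman filter measurement update, which weights each sensor by how much it is trusted. The sketch below is a minimal scalar version, assuming both sensors report the distance to the same object and that their noise variances are known. The `Estimate` class, the `fuse` function, and all numbers are illustrative assumptions, not taken from any production stack.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # estimated distance to the object, in metres
    variance: float  # uncertainty of that estimate, in metres squared


def fuse(prior: Estimate, measurement: Estimate) -> Estimate:
    """Kalman measurement update for a scalar state that is observed directly."""
    # The gain is close to 1 when the prior is uncertain relative to the measurement.
    gain = prior.variance / (prior.variance + measurement.variance)
    mean = prior.mean + gain * (measurement.mean - prior.mean)
    variance = (1.0 - gain) * prior.variance
    return Estimate(mean, variance)


# Hypothetical readings of the same lead vehicle (values are made up).
radar = Estimate(mean=50.8, variance=1.0)   # noisier range estimate
lidar = Estimate(mean=50.0, variance=0.25)  # more precise range estimate
fused = fuse(radar, lidar)
print(fused)
```

The fused estimate lands closer to the more precise sensor, and its variance is lower than either input, which is the basic argument for fusing sensors rather than picking one.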

4. Localisation and SLAM

GPS alone is insufficient: standard receivers are accurate to about 3 m, which is comparable to the width of a traffic lane, so the vehicle could not reliably tell which lane it is in. AV systems achieve centimetre-level accuracy by matching real-time sensor data against a High-Definition (HD) map: LiDAR point clouds are registered to a pre-built map using scan-matching algorithms such as ICP (Iterative Closest Point) or the feature-based matching used in LOAM (LiDAR Odometry and Mapping).
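To make the scan-matching idea concrete, here is a minimal 2D point-to-point ICP sketch using NumPy. It assumes a small initial pose error and uses brute-force nearest-neighbour search, which only suits tiny point sets. The function names, the grid "map", and the pose error values are assumptions for illustration; production systems work on 3D clouds with spatial indexes and outlier rejection.

```python
import numpy as np


def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rotation R and translation t such that R @ src_i + t ~ dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 2x2 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(scan: np.ndarray, map_points: np.ndarray, iterations: int = 20):
    """Estimate the rigid transform that aligns `scan` onto `map_points`."""
    R_total, t_total = np.eye(2), np.zeros(2)
    current = scan.copy()
    for _ in range(iterations):
        # Step 1: pair every scan point with its closest map point (brute force).
        d2 = ((current[:, None, :] - map_points[None, :, :]) ** 2).sum(axis=2)
        matched = map_points[d2.argmin(axis=1)]
        # Step 2: solve for the best rigid transform given those pairs.
        R, t = best_rigid_transform(current, matched)
        # Step 3: apply it and accumulate the overall transform.
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total


# Toy check: a 7x7 grid of "map" points and a scan of the same grid seen with
# a small pose error (3 degrees, 0.15 m, -0.10 m). All values are made up.
xs, ys = np.meshgrid(np.arange(-3.0, 4.0), np.arange(-3.0, 4.0))
map_points = np.column_stack([xs.ravel(), ys.ravel()])
theta = np.radians(3.0)
R_err = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
scan = map_points @ R_err.T + np.array([0.15, -0.10])

R_est, t_est = icp(scan, map_points)
aligned = scan @ R_est.T + t_est
print(np.abs(aligned - map_points).max())  # residual after alignment
```

ICP only converges to the right answer when the starting guess is close, which is why it is normally seeded with a pose from GPS, odometry, or the previous frame.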

In unmapped or changed environments, SLAM (Simultaneous Localisation and Mapping) builds the map and localises the vehicle at the same time — a chicken-and-egg problem solved with probabilistic graph optimisation (pose graph SLAM, iSAM2).

5. Perception: Seeing Objects

Perception converts raw sensor data into a list of detected objects with class, position, size, and velocity. Current approaches:

6. Prediction: What Will Others Do?

Knowing where other agents are is not enough — we need to know where they will be in the next 5–10 seconds. This is motion prediction.

Classical approaches used kinematic models (constant velocity, constant turn rate). Modern AV systems use transformer-based prediction (Waymo Motion, MTR):

Multimodal output is critical — a pedestrian might either cross the road or turn right. The planner must handle all plausible futures.
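The classical kinematic baselines mentioned above are simple enough to write out. The following sketch implements constant turn rate and velocity (CTRV) prediction, which reduces to constant velocity when the yaw rate is zero. The `AgentState` fields, function names, and default horizon are assumptions chosen for illustration. A learned predictor would instead output several candidate trajectories with probabilities.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentState:
    x: float         # position, metres
    y: float         # position, metres
    heading: float   # radians, 0 points along +x
    speed: float     # metres per second
    yaw_rate: float  # radians per second


def predict_ctrv(s: AgentState, dt: float) -> AgentState:
    """Advance the state by dt seconds, holding speed and yaw rate constant."""
    if abs(s.yaw_rate) < 1e-6:
        # Straight-line case: the constant velocity model.
        x = s.x + s.speed * math.cos(s.heading) * dt
        y = s.y + s.speed * math.sin(s.heading) * dt
        return AgentState(x, y, s.heading, s.speed, s.yaw_rate)
    # Turning case: the agent moves along a circular arc of radius speed / yaw_rate.
    new_heading = s.heading + s.yaw_rate * dt
    radius = s.speed / s.yaw_rate
    x = s.x + radius * (math.sin(new_heading) - math.sin(s.heading))
    y = s.y - radius * (math.cos(new_heading) - math.cos(s.heading))
    return AgentState(x, y, new_heading, s.speed, s.yaw_rate)


def rollout(state: AgentState, horizon_s: float = 5.0, dt: float = 0.5):
    """Return the predicted states at each step up to the horizon."""
    trajectory = []
    for _ in range(int(round(horizon_s / dt))):
        state = predict_ctrv(state, dt)
        trajectory.append(state)
    return trajectory


# A car going straight at 10 m/s, and the same car turning left at pi/10 rad/s.
straight = rollout(AgentState(0.0, 0.0, 0.0, 10.0, 0.0))
turning = rollout(AgentState(0.0, 0.0, 0.0, 10.0, math.pi / 10))
print(straight[-1], turning[-1])
```

These models produce a single future per agent, so they cannot express the "cross or turn" ambiguity on their own; a planner that relies on them has to enumerate hypotheses such as different yaw rates itself.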

7. Motion Planning

Motion planning finds a collision-free, comfortable trajectory from the current state to a goal. Key layers:
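As one illustration of the trajectory-selection step, here is a toy sampling-based planner. It assumes a straight road, candidate paths that run parallel to the lane centre at fixed lateral offsets, and obstacles modelled as points with a clearance radius. The function name, parameters, and cost (distance from the lane centre as a stand-in for comfort) are all assumptions for this sketch, not a description of any specific planner.

```python
import math


def plan_lateral_offset(obstacles, candidates, horizon_m=30.0, step_m=1.0,
                        clearance_m=1.5):
    """Return the collision-free lateral offset closest to the lane centre.

    obstacles:  list of (x, y) points in metres, x along the road, y lateral.
    candidates: lateral offsets in metres to evaluate.
    Returns None when every candidate path is blocked.
    """
    best_offset, best_cost = None, math.inf
    steps = int(horizon_m / step_m) + 1
    for offset in candidates:
        # Walk along the candidate path and check clearance to every obstacle.
        blocked = any(
            math.hypot(i * step_m - ox, offset - oy) < clearance_m
            for i in range(steps)
            for ox, oy in obstacles
        )
        if blocked:
            continue
        cost = abs(offset)  # prefer staying near the lane centre
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset


candidates = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(plan_lateral_offset([], candidates))             # free road
print(plan_lateral_offset([(15.0, 0.0)], candidates))  # obstacle in the lane
```

Real planners score full time-parameterised trajectories with terms for jerk, lateral acceleration, and progress, and check them against predicted agent positions rather than static points.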

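End-to-end learning: Some developers are moving away from the classical modular pipeline towards large neural networks that map sensor inputs directly to driving outputs. Tesla describes FSD v12 and later as end-to-end, and Waymo has published research on end-to-end models, though how much of each production stack is learned end to end is not fully public. The idea is similar to how a human drives without explicit intermediate representations.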

8. Vehicle Control

The planned trajectory is executed by the control layer, which commands steering, throttle, and brakes. A PID controller (or cascaded PID) is common for lateral (steering) and longitudinal (speed) control. Model Predictive Control (MPC) can perform better: it uses a model of the vehicle and actuator dynamics to predict the response over a short horizon and optimises the commands while respecting constraints (jerk, tyre slip).

The commands pass through the car's drive-by-wire system to electric power steering, electronic throttle, and ABS/ESC. Latency and actuator response times must be modelled in the controller, or the vehicle will lag behind the planned trajectory.
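The PID loop described above can be sketched for longitudinal (speed) control as follows. This is a minimal example under stated assumptions: the gains and acceleration limits are made up, the anti-windup rule (skip integration while the output is saturated) is one simple option among several, and the toy plant applies the commanded acceleration instantly, so it ignores the latency and actuator lag just discussed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    out_min: float  # lower limit on the command (e.g. maximum braking)
    out_max: float  # upper limit on the command (e.g. maximum acceleration)
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        """Return the clamped command for the current error."""
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate_integral = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        out = min(max(raw, self.out_min), self.out_max)
        if raw == out:
            # Simple anti-windup: only accumulate while the output is not saturated.
            self.integral = candidate_integral
        return out


def simulate_speed(target: float, seconds: float = 60.0, dt: float = 0.1) -> float:
    """Toy plant: speed changes by exactly the commanded acceleration."""
    controller = PID(kp=0.8, ki=0.1, kd=0.0, out_min=-3.0, out_max=2.0)
    speed = 0.0
    for _ in range(int(round(seconds / dt))):
        accel = controller.step(target - speed, dt)  # command in m/s^2
        speed += accel * dt
    return speed


final_speed = simulate_speed(target=20.0)
print(final_speed)
```

Without the anti-windup check, the integral term would keep growing while the command is pinned at its limit and the speed would overshoot the target noticeably before settling.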

9. Remaining Challenges
