SLAM: Mapping and Localization
Simultaneous Localization and Mapping (SLAM) solves a deceptively circular problem: a robot dropped into an unknown environment must build a map of its surroundings while simultaneously using that very map to figure out where it is. Neither task is possible without the other, yet decades of probabilistic robotics research — from the Extended Kalman Filter to particle filters to modern graph optimisation — have turned this "chicken-and-egg" problem into the backbone of self-driving cars, warehouse robots, drones and AR headsets. This article walks through the mathematics of state estimation, the major SLAM families, loop closure, and a working occupancy-grid implementation in JavaScript.
1. The SLAM Problem
Formally, SLAM estimates the joint posterior over the robot's trajectory and the map, given noisy control inputs and noisy observations of landmarks:
Why is this hard? Odometry drifts — wheel slip and integration error accumulate unboundedly over distance. If the robot trusted odometry alone, its estimated position would diverge from reality the longer it drives. Landmarks observed from the map can correct this drift, but the landmark positions are only known relative to where the robot thinks it has been. Errors in localisation corrupt the map; errors in the map corrupt localisation. SLAM algorithms are the class of estimators that untangle this coupled uncertainty in a statistically consistent way.
2. Motion and Sensor Models
Every SLAM algorithm needs two probabilistic models. The motion model predicts how the pose evolves given a control input, and the sensor model predicts what observation to expect given a pose and a map.
R and Q are covariance matrices capturing how noisy the motion and sensor are, respectively. A robot with a precise laser rangefinder but slippy wheels has small Q and large R. Every SLAM back-end is, at its core, a way of fusing these two noisy information sources into a single best estimate — the machinery that differs is how the fusion is computed.
3. EKF-SLAM
The Extended Kalman Filter augments the classic Kalman filter state vector to include both the robot pose and every landmark position, then linearises the (generally nonlinear) motion and sensor models around the current estimate using a first-order Taylor expansion (the Jacobian).
The key insight is that the full covariance matrix Σ is dense — correcting the robot pose from one landmark observation also shifts every other landmark's estimated position via their shared off-diagonal correlation terms. This is exactly what couples mapping and localisation into one estimator. The cost is quadratic in the number of landmarks (O(N²) per update, since Σ is N×N), which limits classic EKF-SLAM to a few hundred landmarks before it becomes computationally impractical.
4. FastSLAM and Particle Filters
FastSLAM (Montemerlo et al., 2002) exploits a factorisation: given the robot's trajectory, landmark estimates become conditionally independent of each other. This lets FastSLAM represent the trajectory with a particle filter (many discrete hypotheses of "where has the robot been") while giving each particle its own independent set of small EKFs, one per landmark.
Because each particle carries a full, independent hypothesis of the map, FastSLAM naturally handles ambiguous or multi-modal situations (e.g. a symmetric corridor) that a single-hypothesis EKF cannot represent. The trade-off is particle depletion: after many resampling steps, all particles can collapse to descend from a single ancestor, losing trajectory diversity — especially damaging right before a loop closure.
5. Graph-Based SLAM
Modern SLAM systems (GTSAM, g2o, Cartographer, ORB-SLAM's back-end) formulate the problem as a pose graph: nodes are robot poses (and optionally landmarks), and edges are constraints derived from odometry or from matched observations between non-consecutive poses. Solving SLAM becomes a nonlinear least-squares optimisation over the whole graph.
The crucial property exploited here is sparsity: each edge only connects two nearby poses, so the information matrix JᵀΩJ is sparse and can be factored efficiently even for graphs with hundreds of thousands of nodes — this is why graph-based (full) SLAM scales far better than the dense O(N²) EKF approach.
6. Loop Closure and Data Association
As a robot explores, odometry drift causes its estimated trajectory to slowly diverge from the true path. Loop closure is the moment the robot recognises it has returned to a previously visited place — this single new edge in the pose graph lets the optimiser "snap" the accumulated drift back into a globally consistent map.
Place recognition
Bag-of-visual-words (DBoW2) or learned descriptors (NetVLAD) compare the current view against a database of past keyframes to propose loop-closure candidates.
Geometric verification
A candidate match is confirmed by aligning point clouds or feature correspondences (ICP, RANSAC + PnP) and checking that the resulting transform is geometrically consistent.
Data association
Deciding which observed landmark corresponds to which mapped landmark. Wrong associations ("perceptual aliasing") are catastrophic — they inject false constraints that corrupt the whole map.
Pose graph relaxation
Once a loop-closure edge is added, re-running the least-squares optimiser distributes the accumulated error smoothly across every pose in the loop, not just the endpoints.
7. Occupancy Grid Mapping
For mobile robots equipped with range sensors (LiDAR, sonar), maps are often represented as an occupancy grid — a regular array of cells, each holding the probability that it is occupied by an obstacle. Cells are updated independently using the log-odds form of Bayes' rule for numerical stability.
Ray tracing from the robot pose to each range measurement (e.g. via Bresenham's line algorithm) marks free space along the beam and occupied space at its endpoint. Log-odds accumulation means a cell repeatedly observed as free becomes strongly confident of being free, while a single spurious reading cannot flip a well-established cell's state — a simple but effective robustness property.
8. JavaScript Implementation
// Minimal 2D occupancy-grid SLAM: log-odds update + Bresenham ray tracing
class OccupancyGrid {
constructor(width, height, cellSize) {
this.width = width; this.height = height; this.cellSize = cellSize;
this.logOdds = new Float32Array(width * height); // 0 = unknown (p=0.5)
this.L_OCC = 0.85; this.L_FREE = -0.4; this.L_MIN = -5; this.L_MAX = 5;
}
worldToCell(x, y) {
return { cx: Math.floor(x / this.cellSize), cy: Math.floor(y / this.cellSize) };
}
idx(cx, cy) { return cy * this.width + cx; }
updateCell(cx, cy, logOddsDelta) {
if (cx < 0 || cy < 0 || cx >= this.width || cy >= this.height) return;
const i = this.idx(cx, cy);
this.logOdds[i] = Math.max(this.L_MIN, Math.min(this.L_MAX, this.logOdds[i] + logOddsDelta));
}
// Bresenham's line algorithm: mark free cells along the ray, occupied at the hit
integrateScan(pose, ranges, maxRange, angleStep) {
for (let i = 0; i < ranges.length; i++) {
const angle = pose.theta + i * angleStep;
const r = Math.min(ranges[i], maxRange);
const hitX = pose.x + r * Math.cos(angle);
const hitY = pose.y + r * Math.sin(angle);
const { cx: x0, cy: y0 } = this.worldToCell(pose.x, pose.y);
const { cx: x1, cy: y1 } = this.worldToCell(hitX, hitY);
this.traceLine(x0, y0, x1, y1, ranges[i] < maxRange);
}
}
traceLine(x0, y0, x1, y1, hitOccupied) {
let dx = Math.abs(x1 - x0), dy = -Math.abs(y1 - y0);
let sx = x0 < x1 ? 1 : -1, sy = y0 < y1 ? 1 : -1;
let err = dx + dy, x = x0, y = y0;
while (true) {
const isEndpoint = (x === x1 && y === y1);
this.updateCell(x, y, isEndpoint && hitOccupied ? this.L_OCC : this.L_FREE);
if (isEndpoint) break;
const e2 = 2 * err;
if (e2 >= dy) { err += dy; x += sx; }
if (e2 <= dx) { err += dx; y += sy; }
}
}
probability(cx, cy) {
const l = this.logOdds[this.idx(cx, cy)];
return 1 - 1 / (1 + Math.exp(l));
}
}
// Usage: build a 200x200 grid at 0.1m resolution, integrate one LiDAR scan
const grid = new OccupancyGrid(200, 200, 0.1);
const pose = { x: 10, y: 10, theta: 0 };
const ranges = [3.2, 3.1, 3.0, 5.8, 5.9]; // simulated LiDAR beam ranges (m)
grid.integrateScan(pose, ranges, 8.0, Math.PI / 180);
console.log(grid.probability(130, 100)); // p(occupied) for a cell near a scan hit
9. Modern SLAM Systems
ORB-SLAM3
Feature-based visual(-inertial) SLAM using ORB keypoints, a three-thread architecture (tracking, local mapping, loop closing) and bundle adjustment for map refinement. Supports monocular, stereo and RGB-D.
Cartographer
Google's LiDAR-based SLAM: local submaps built via scan matching, global pose graph optimisation for loop closure. Widely used in 2D and 3D warehouse and service robots.
LOAM / LIO-SAM
LiDAR-inertial odometry that extracts edge and planar feature points from point clouds, fuses IMU preintegration with scan matching in a tightly-coupled factor graph.
Neural / NeRF-SLAM
Recent research replaces explicit grids with implicit neural scene representations (NeRF, 3D Gaussian Splatting) optimised online alongside camera pose — dense, photorealistic maps from a single moving camera.
Despite decades of progress, SLAM remains an active research area: robust operation in dynamic environments (moving people, changing furniture), long-term map maintenance, and multi-robot collaborative SLAM are all still open engineering challenges. But the core recipe — a motion model, a sensor model, and a principled way to fuse their uncertainty — underlies every solution, from a $30 robot vacuum to a self-driving car's multi-sensor stack.