SLAM: Mapping and Localization

3D Simulations

Article

Robotics · Algorithms · ⏱ ~15 min read · Last updated: 3 July 2026

Simultaneous Localization and Mapping (SLAM) solves a deceptively circular problem: a robot dropped into an unknown environment must build a map of its surroundings while simultaneously using that very map to figure out where it is. Neither task is possible without the other, yet decades of probabilistic robotics research — from the Extended Kalman Filter to particle filters to modern graph optimisation — have turned this "chicken-and-egg" problem into the backbone of self-driving cars, warehouse robots, drones and AR headsets. This article walks through the mathematics of state estimation, the major SLAM families, loop closure, and a working occupancy-grid implementation in JavaScript.

Written by MySimulator Team · Reviewed by MySimulator Editorial Review

1. The SLAM Problem

Formally, SLAM estimates the joint posterior over the robot's trajectory and the map, given noisy control inputs and noisy observations of landmarks:

p(x_1:t, m | z_1:t, u_1:t) x_t = robot pose at time t (position + heading) m = map (landmark positions or occupancy grid) z_1:t = sensor observations up to time t (LiDAR, camera, sonar) u_1:t = control/odometry inputs up to time t (wheel encoders, IMU)

Why is this hard? Odometry drifts — wheel slip and integration error accumulate unboundedly over distance. If the robot trusted odometry alone, its estimated position would diverge from reality the longer it drives. Landmarks observed from the map can correct this drift, but the landmark positions are only known relative to where the robot thinks it has been. Errors in localisation corrupt the map; errors in the map corrupt localisation. SLAM algorithms are the class of estimators that untangle this coupled uncertainty in a statistically consistent way.

Full SLAM vs. Online SLAM: Full SLAM estimates the entire trajectory x_1:t jointly with the map (solved via batch optimisation — this is what graph-based SLAM does). Online SLAM estimates only the current pose x_t recursively (solved via filtering — this is what EKF-SLAM and particle filters do). Full SLAM is generally more accurate; online SLAM is cheaper per step.

2. Motion and Sensor Models

Every SLAM algorithm needs two probabilistic models. The motion model predicts how the pose evolves given a control input, and the sensor model predicts what observation to expect given a pose and a map.

Motion model (differential-drive / odometry): x_t = f(x_t-1, u_t) + ε, ε ~ N(0, R) x_t = [x, y, θ]ᵀ (2D pose: position + heading) Velocity model: x' = x - (v/ω)sinθ + (v/ω)sin(θ+ωΔt) y' = y + (v/ω)cosθ - (v/ω)cos(θ+ωΔt) θ' = θ + ωΔt Sensor model (range-bearing to landmark i): z_t = h(x_t, m_i) + δ, δ ~ N(0, Q) r = \sqrt((m_ix - x)² + (m_iy - y)²) (range) φ = atan2(m_iy - y, m_ix - x) - θ (bearing)

R and Q are covariance matrices capturing how noisy the motion and sensor are, respectively. A robot with a precise laser rangefinder but slippy wheels has small Q and large R. Every SLAM back-end is, at its core, a way of fusing these two noisy information sources into a single best estimate — the machinery that differs is how the fusion is computed.

3. EKF-SLAM

The Extended Kalman Filter augments the classic Kalman filter state vector to include both the robot pose and every landmark position, then linearises the (generally nonlinear) motion and sensor models around the current estimate using a first-order Taylor expansion (the Jacobian).

State vector: y = [x, y, θ, m_1x, m_1y, m_2x, m_2y, \dots, m_Nx, m_Ny]ᵀ Prediction step: ŷ_t = f(ŷ_t-1, u_t) (landmarks unchanged — only pose predicted) Σ_t = F_x Σ_t-1 F_xᵀ + F_u R F_uᵀ (F_x, F_u = Jacobians of f) Correction step (per observed landmark i): K = Σ_t H^T (H Σ_t Hᵀ + Q)⁻¹ (Kalman gain) ŷ_t \leftarrow ŷ_t + K (z_t - h(ŷ_t)) (innovation-weighted update) Σ_t \leftarrow (I - K H) Σ_t (H = Jacobian of h w.r.t. the full state y)

The key insight is that the full covariance matrix Σ is dense — correcting the robot pose from one landmark observation also shifts every other landmark's estimated position via their shared off-diagonal correlation terms. This is exactly what couples mapping and localisation into one estimator. The cost is quadratic in the number of landmarks (O(N²) per update, since Σ is N×N), which limits classic EKF-SLAM to a few hundred landmarks before it becomes computationally impractical.

4. FastSLAM and Particle Filters

FastSLAM (Montemerlo et al., 2002) exploits a factorisation: given the robot's trajectory, landmark estimates become conditionally independent of each other. This lets FastSLAM represent the trajectory with a particle filter (many discrete hypotheses of "where has the robot been") while giving each particle its own independent set of small EKFs, one per landmark.

Rao-Blackwellised factorisation: p(x_1:t, m | z, u) = p(x_1:t | z, u) \cdot Π p(m_i | x_1:t, z) Per-particle steps: 1. SAMPLE: draw x_t^[k] ~ p(x_t | x_t-1^[k], u_t) for each particle k 2. WEIGHT: w^[k] \propto p(z_t | x_t^[k], m^[k]) (likelihood of observation given particle's map) 3. UPDATE: correct each landmark's 2\times2 EKF within particle k using z_t 4. RESAMPLE: draw N particles with replacement, probability \propto w^[k] Complexity: O(N log M) — N particles, M landmarks per particle, using tree-structured maps

Because each particle carries a full, independent hypothesis of the map, FastSLAM naturally handles ambiguous or multi-modal situations (e.g. a symmetric corridor) that a single-hypothesis EKF cannot represent. The trade-off is particle depletion: after many resampling steps, all particles can collapse to descend from a single ancestor, losing trajectory diversity — especially damaging right before a loop closure.

5. Graph-Based SLAM

Modern SLAM systems (GTSAM, g2o, Cartographer, ORB-SLAM's back-end) formulate the problem as a pose graph: nodes are robot poses (and optionally landmarks), and edges are constraints derived from odometry or from matched observations between non-consecutive poses. Solving SLAM becomes a nonlinear least-squares optimisation over the whole graph.

Objective: x* = argmin_x Σ_ij ||e_ij(x_i, x_j)||²_Ωij e_ij = z_ij - ĥ(x_i, x_j) (error between measured and predicted relative transform) Ω_ij = information matrix (inverse covariance) — weights confident edges more Solved via Gauss-Newton / Levenberg-Marquardt: linearise e_ij around current estimate \to J (Jacobian) build normal equations: (JᵀΩJ) Δx = -JᵀΩe solve sparse linear system for Δx (Cholesky / sparse QR) update: x \leftarrow x ⊞ Δx, repeat until convergence

The crucial property exploited here is sparsity: each edge only connects two nearby poses, so the information matrix JᵀΩJ is sparse and can be factored efficiently even for graphs with hundreds of thousands of nodes — this is why graph-based (full) SLAM scales far better than the dense O(N²) EKF approach.

6. Loop Closure and Data Association

As a robot explores, odometry drift causes its estimated trajectory to slowly diverge from the true path. Loop closure is the moment the robot recognises it has returned to a previously visited place — this single new edge in the pose graph lets the optimiser "snap" the accumulated drift back into a globally consistent map.

Place recognition

Bag-of-visual-words (DBoW2) or learned descriptors (NetVLAD) compare the current view against a database of past keyframes to propose loop-closure candidates.

Geometric verification

A candidate match is confirmed by aligning point clouds or feature correspondences (ICP, RANSAC + PnP) and checking that the resulting transform is geometrically consistent.

Data association

Deciding which observed landmark corresponds to which mapped landmark. Wrong associations ("perceptual aliasing") are catastrophic — they inject false constraints that corrupt the whole map.

Pose graph relaxation

Once a loop-closure edge is added, re-running the least-squares optimiser distributes the accumulated error smoothly across every pose in the loop, not just the endpoints.

7. Occupancy Grid Mapping

For mobile robots equipped with range sensors (LiDAR, sonar), maps are often represented as an occupancy grid — a regular array of cells, each holding the probability that it is occupied by an obstacle. Cells are updated independently using the log-odds form of Bayes' rule for numerical stability.

Log-odds representation: l_t(c) = log( p(c) / (1 − p(c)) ) Recursive Bayesian update per ray-traced cell: l_t(c) = l_t-1(c) + log( p(c | z_t) / (1 − p(c | z_t)) ) − l_0(c) Inverse sensor model (simplified): cell is FREE along the ray up to the hit → subtract log-odds (e.g. −0.4) cell is OCCUPIED at the range reading → add log-odds (e.g. +0.85) cell beyond the range reading → unknown, no update Recover probability: p(c) = 1 − 1 / (1 + e^{l_t(c)})

Ray tracing from the robot pose to each range measurement (e.g. via Bresenham's line algorithm) marks free space along the beam and occupied space at its endpoint. Log-odds accumulation means a cell repeatedly observed as free becomes strongly confident of being free, while a single spurious reading cannot flip a well-established cell's state — a simple but effective robustness property.

8. JavaScript Implementation

// Minimal 2D occupancy-grid SLAM: log-odds update + Bresenham ray tracing
class OccupancyGrid {
  constructor(width, height, cellSize) {
    this.width = width; this.height = height; this.cellSize = cellSize;
    this.logOdds = new Float32Array(width * height); // 0 = unknown (p=0.5)
    this.L_OCC = 0.85; this.L_FREE = -0.4; this.L_MIN = -5; this.L_MAX = 5;
  }

  worldToCell(x, y) {
    return { cx: Math.floor(x / this.cellSize), cy: Math.floor(y / this.cellSize) };
  }

  idx(cx, cy) { return cy * this.width + cx; }

  updateCell(cx, cy, logOddsDelta) {
    if (cx < 0 || cy < 0 || cx >= this.width || cy >= this.height) return;
    const i = this.idx(cx, cy);
    this.logOdds[i] = Math.max(this.L_MIN, Math.min(this.L_MAX, this.logOdds[i] + logOddsDelta));
  }

  // Bresenham's line algorithm: mark free cells along the ray, occupied at the hit
  integrateScan(pose, ranges, maxRange, angleStep) {
    for (let i = 0; i < ranges.length; i++) {
      const angle = pose.theta + i * angleStep;
      const r = Math.min(ranges[i], maxRange);
      const hitX = pose.x + r * Math.cos(angle);
      const hitY = pose.y + r * Math.sin(angle);
      const { cx: x0, cy: y0 } = this.worldToCell(pose.x, pose.y);
      const { cx: x1, cy: y1 } = this.worldToCell(hitX, hitY);
      this.traceLine(x0, y0, x1, y1, ranges[i] < maxRange);
    }
  }

  traceLine(x0, y0, x1, y1, hitOccupied) {
    let dx = Math.abs(x1 - x0), dy = -Math.abs(y1 - y0);
    let sx = x0 < x1 ? 1 : -1, sy = y0 < y1 ? 1 : -1;
    let err = dx + dy, x = x0, y = y0;
    while (true) {
      const isEndpoint = (x === x1 && y === y1);
      this.updateCell(x, y, isEndpoint && hitOccupied ? this.L_OCC : this.L_FREE);
      if (isEndpoint) break;
      const e2 = 2 * err;
      if (e2 >= dy) { err += dy; x += sx; }
      if (e2 <= dx) { err += dx; y += sy; }
    }
  }

  probability(cx, cy) {
    const l = this.logOdds[this.idx(cx, cy)];
    return 1 - 1 / (1 + Math.exp(l));
  }
}

// Usage: build a 200x200 grid at 0.1m resolution, integrate one LiDAR scan
const grid = new OccupancyGrid(200, 200, 0.1);
const pose = { x: 10, y: 10, theta: 0 };
const ranges = [3.2, 3.1, 3.0, 5.8, 5.9]; // simulated LiDAR beam ranges (m)
grid.integrateScan(pose, ranges, 8.0, Math.PI / 180);
console.log(grid.probability(130, 100)); // p(occupied) for a cell near a scan hit

9. Modern SLAM Systems

ORB-SLAM3

Feature-based visual(-inertial) SLAM using ORB keypoints, a three-thread architecture (tracking, local mapping, loop closing) and bundle adjustment for map refinement. Supports monocular, stereo and RGB-D.

Cartographer

Google's LiDAR-based SLAM: local submaps built via scan matching, global pose graph optimisation for loop closure. Widely used in 2D and 3D warehouse and service robots.

LOAM / LIO-SAM

LiDAR-inertial odometry that extracts edge and planar feature points from point clouds, fuses IMU preintegration with scan matching in a tightly-coupled factor graph.

Neural / NeRF-SLAM

Recent research replaces explicit grids with implicit neural scene representations (NeRF, 3D Gaussian Splatting) optimised online alongside camera pose — dense, photorealistic maps from a single moving camera.

Despite decades of progress, SLAM remains an active research area: robust operation in dynamic environments (moving people, changing furniture), long-term map maintenance, and multi-robot collaborative SLAM are all still open engineering challenges. But the core recipe — a motion model, a sensor model, and a principled way to fuse their uncertainty — underlies every solution, from a $30 robot vacuum to a self-driving car's multi-sensor stack.

🗺️ Open Pathfinding →