N-Body Barnes-Hut Algorithm from Scratch
The brute-force N-body gravity problem is O(N²) — 10 000 particles requires 100 million force calculations per frame. Barnes-Hut reduces this to O(N log N) using a spatial octree that approximates distant groups of particles as a single mass. This tutorial builds it from scratch in JavaScript.
1. The N-Body Problem and Complexity
In an N-body gravity simulation every particle attracts every other particle. The naive algorithm computes N(N−1)/2 pair interactions per step:
direction: unit vector from i to j
Naive complexity: O(N²) per time step
| N particles | Brute force ops/step | Barnes-Hut ops/step | Speedup |
|---|---|---|---|
| 1 000 | 500 000 | ~10 000 | ~50× |
| 10 000 | 50 000 000 | ~130 000 | ~380× |
| 100 000 | 5 000 000 000 | ~1 700 000 | ~2900× |
2. Quadtree / Octree Structure
Barnes-Hut partitions space recursively. In 2D this is a quadtree (4 children); in 3D, an octree (8 children). Each node represents a square (or cube) region and either:
- Is a leaf containing exactly one particle, or
- Is an internal node whose children cover the 4 (or 8) sub-regions.
// 2D Quadtree node class QNode { constructor(cx, cy, size) { this.cx = cx; // cell center x this.cy = cy; // cell center y this.size = size; // cell half-size this.mass = 0; // total mass in this cell this.comx = 0; // center-of-mass x this.comy = 0; // center-of-mass y this.particle = null; // non-null if leaf this.children = null; // null if leaf; [nw, ne, sw, se] if internal } }
3. Tree Insertion
function insert(node, p) { if (node.mass === 0) { // Empty node — store particle as leaf node.particle = p; node.mass = p.mass; node.comx = p.x; node.comy = p.y; return; } if (node.children === null) { // Was a leaf — subdivide and re-insert existing particle subdivide(node); insertToChild(node, node.particle); node.particle = null; } // Insert new particle into appropriate child insertToChild(node, p); // Update cumulative center of mass node.mass += p.mass; node.comx = (node.comx * (node.mass - p.mass) + p.x * p.mass) / node.mass; node.comy = (node.comy * (node.mass - p.mass) + p.y * p.mass) / node.mass; } function subdivide(node) { const h = node.size / 2; node.children = [ new QNode(node.cx - h, node.cy + h, h), // NW new QNode(node.cx + h, node.cy + h, h), // NE new QNode(node.cx - h, node.cy - h, h), // SW new QNode(node.cx + h, node.cy - h, h), // SE ]; } function insertToChild(node, p) { const isEast = p.x >= node.cx; const isNorth = p.y >= node.cy; const idx = (isNorth ? 0 : 2) + (isEast ? 1 : 0); insert(node.children[idx], p); }
4. Center of Mass Computation
The center-of-mass position and total mass stored in each internal node are updated during insertion (see above). They represent the aggregate gravitational effect of all particles in that cell:
x_com = (1/M_cell) · Σ (m_i · x_i)
y_com = (1/M_cell) · Σ (m_i · y_i)
When we traverse the tree during force calculation, a distant cell is
treated as a single particle with mass M_cell located at
(x_com, y_com). This is the multipole approximation at
monopole order — higher-order multipole methods (FMM) extend this with
quadrupole and octupole terms for greater accuracy.
5. The Theta Criterion
How far is "far enough" to use the approximation? The theta criterion (opening angle criterion) decides:
s = cell side length
d = distance from particle to cell center
θ = accuracy parameter (typical: 0.5–1.0)
- θ = 0: Never approximate — equivalent to brute force O(N²).
- θ = 0.5: Good balance. Error ~1% for most galaxy-scale simulations.
- θ = 1.0: Faster but less accurate.
- θ = ∞: Approximate everything — use only the root's center of mass (terrible).
6. Force Traversal
const G = 6.674e-11; const SOFTENING = 0.1; // prevents F → ∞ at r → 0 const THETA = 0.5; function calcForce(node, p, out) { if (node.mass === 0) return; const dx = node.comx - p.x; const dy = node.comy - p.y; const distSq = dx*dx + dy*dy + SOFTENING*SOFTENING; const dist = Math.sqrt(distSq); if (node.children === null || node.size / dist < THETA) { // Leaf OR cell is far enough — use approximation if (node.particle === p) return; // skip self const f = G * p.mass * node.mass / distSq; out.fx += f * dx / dist; out.fy += f * dy / dist; } else { // Cell too close — recurse into children for (const child of node.children) { if (child && child.mass > 0) calcForce(child, p, out); } } }
7. Full Integration Loop
function step(particles, dt) { // 1. Compute bounding box of all particles let minX=Infinity, maxX=-Infinity, minY=Infinity, maxY=-Infinity; for (const p of particles) { if (p.x < minX) minX=p.x; if (p.x > maxX) maxX=p.x; if (p.y < minY) minY=p.y; if (p.y > maxY) maxY=p.y; } const cx = (minX + maxX) / 2; const cy = (minY + maxY) / 2; const size = Math.max(maxX - minX, maxY - minY) / 2 * 1.01; // 2. Build quadtree const root = new QNode(cx, cy, size); for (const p of particles) insert(root, p); // 3. For each particle compute force and integrate (leapfrog) for (const p of particles) { const force = { fx: 0, fy: 0 }; calcForce(root, p, force); // Velocity Verlet / Leapfrog integration p.vx += (force.fx / p.mass) * dt; p.vy += (force.fy / p.mass) * dt; p.x += p.vx * dt; p.y += p.vy * dt; } }
QNode objects with an
object pool to avoid garbage collection pauses on large simulations.
8. Optimisation Tips
-
Object pooling: Allocate a flat array of
MAX_NODESpre-builtQNodeobjects and reset them each step rather than allocating millions of objects per second. - Flat array tree: Store the tree as a flat typed array (SoA — struct of arrays) with integer indices for children. Cache-friendly; avoids pointer chasing on large N.
- Web Workers: Offload force computation to a dedicated Worker thread (postMessage / SharedArrayBuffer). The render loop on the main thread reads the results without blocking UI.
- GPU via WebGL compute: WebGPU compute shaders can run BH traversal entirely on the GPU. For WebGL 2, a flattened tree stored in a texture sampled by a compute-proxy vertex shader is possible, though complex.
- Adaptive theta: Increase θ for particles that are far from all cluster centers (where accuracy matters less), and decrease θ in dense regions.
- Softening: A small softening length ε (added to denominator as √(r² + ε²)) prevents the singularity at r=0 and avoids unrealistically large accelerations between close particles. Typical value: 1–5% of mean interparticle separation.
With these optimisations, a single-threaded JavaScript implementation can handle ~50 000 particles at interactive frame rates, and a Web Worker implementation can push to ~200 000.