Devlog #29 — 60 FPS on Every Device: Performance Engineering for Browser Simulations

Platform Numbers, Q4 2027

345+ simulations live

80+

Wave 10 Content

This devlog closes Wave 10, which shipped two deep-dive posts alongside this one:

Post	Topic	Date
🔥 Spotlight #20	Thermodynamics & Heat Transfer	Nov 2027
🧮 Learning #19	Algorithms & Computational Complexity	Nov 2027
⚙️ Devlog #29	60 FPS on Every Device (this post)	Dec 2027

The Problem: Frame Budget Reality

A browser simulation runs inside a requestAnimationFrame callback, which fires once per display refresh: 60 Hz on most displays, giving about 16.67 ms per frame (1000 ms / 60). Each frame must handle physics, collision detection, and rendering. On a modern desktop with a dedicated GPU, this budget is comfortable. On a 2019-era mid-range phone, it is tight. On a low-end budget device with a slow JavaScript engine, it is a constraint that punishes any algorithmic waste.

Before the Q4 2027 optimisation sprint, 47 of our simulations dropped below 55 FPS on our reference low-end device (Snapdragon 695, 4 GB RAM, Chrome 119). After the sprint, all 345 hit at least 59 FPS at their default particle counts, with automatic down-scaling for slower devices.

Technique 1: Fixed Timestep with Interpolation

The variable-dt trap

Most simulation tutorials integrate physics with the actual frame delta: position += velocity * dt. This seems sensible — if the last frame took 33 ms instead of 16 ms, the physics advances 33 ms worth of motion, keeping the simulation "real-time." But variable dt breaks determinism, causes energy drift in symplectic integrators, and makes collision detection miss fast-moving objects (tunnelling).
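The tunnelling failure is easy to reproduce in isolation. The sketch below is a toy 1D case and not taken from any of our simulations: a point moves at constant speed towards a thin wall, and the collision check only looks at the position at the end of each step. The function name and the numbers are illustrative assumptions.

```javascript
// Toy example: returns true if the point ends up past the wall without
// any end-of-step position ever landing inside it (i.e. it tunnelled).
function crossesUndetected(speed, dt, wallStart, wallEnd, duration) {
  let x = 0;
  for (let t = 0; t < duration; t += dt) {
    x += speed * dt;                                   // plain Euler position update
    if (x >= wallStart && x <= wallEnd) return false;  // sampled inside the wall: hit detected
  }
  return x > wallEnd;
}

// 300 px/s towards a 5 px thick wall spanning x = 101..106
const missedAt60Hz = crossesUndetected(300, 1 / 60, 101, 106, 1); // 5 px steps
const missedAt30Hz = crossesUndetected(300, 1 / 30, 101, 106, 1); // 10 px steps
```

With 5 px steps one sample lands inside the wall; with 10 px steps the samples straddle it. A single long frame under variable dt produces exactly the second situation.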

Fixed Timestep Game Loop Pattern

const FIXED_DT = 1 / 60;   // 16.67 ms physics step

let accumulator = 0;
let prevTime = performance.now();

function loop(now) {
  const frameTime = Math.min((now - prevTime) / 1000, 0.25);
  prevTime = now;
  accumulator += frameTime;

  while (accumulator >= FIXED_DT) {
    physicsStep(FIXED_DT);   // deterministic, bounded
    accumulator -= FIXED_DT;
  }

  const alpha = accumulator / FIXED_DT;  // interpolation factor
  render(alpha);             // interpolate between prev/curr state
  requestAnimationFrame(loop);
}

Benefits of fixed dt:
  - Deterministic replay (same input → same output)
  - Stable energy conservation (symplectic Euler, Verlet)
  - Safe CCD: tunnel check per FIXED_DT interval
  - Bounded physics budget even on slow frames (0.25 s cap → at most 15 steps per frame)

Cost:
  - State must be stored twice (prev + curr) for interpolation
  - Extra memory: 2× particle arrays ≈ negligible at typical n

The 0.25s cap on frameTime is critical: if the tab is backgrounded or the device is overloaded, we do not let the accumulator explode into a "spiral of death" that queues hundreds of physics steps and freezes the page.
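The render(alpha) call in the loop blends the two stored states. A minimal sketch of that blend, assuming positions live in interleaved Float32Arrays; prevPos, currPos and interpolatePositions are illustrative names rather than our actual code:

```javascript
// Hypothetical helper: linear blend between the previous and current physics
// states. alpha = accumulator / FIXED_DT, so 0 <= alpha < 1.
function interpolatePositions(prevPos, currPos, alpha, out) {
  for (let i = 0; i < currPos.length; i++) {
    out[i] = prevPos[i] + (currPos[i] - prevPos[i]) * alpha;
  }
  return out;
}

// Assumed convention: before each physics step the current state is copied
// into prevPos (swapping two buffers would work equally well).
const prevPos = new Float32Array([0, 0, 10, 10]);
const currPos = new Float32Array([2, 0, 10, 14]);
const drawPos = interpolatePositions(prevPos, currPos, 0.5, new Float32Array(4));
```

Drawing from drawPos rather than currPos means the picture lags the physics by up to one fixed step, which is the usual price for smooth motion when the display rate and the physics rate do not line up.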

Technique 2: Spatial Hash for Particle Collision

From O(n²) to O(n)

Naïve collision detection checks every particle against every other particle: O(n²). At 500 particles that is 125,000 checks per step; at 5,000 particles it is 12.5 million. The standard fix is the spatial hash: divide space into a grid of cells, assign each particle to its cell, then only check particles in the same or adjacent cells.

Spatial Hash — Construction & Query

Cell size: choose ≈ 2r (particle diameter) for dense packing

Hash function:
  cellX = Math.floor(px / cellSize)
  cellY = Math.floor(py / cellSize)
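  hash  = ((Math.imul(cellX, 73856093) ^ Math.imul(cellY, 19349663)) >>> 0) % tableSize
  (Math.imul keeps the products in 32-bit integer range; >>> 0 makes the
   XOR result unsigned, so the index is never negative)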

Build phase (each frame):
  O(n): iterate all particles, insert into hash table
  Use flat Uint32Array (avoid GC pressure from object maps)

Query phase (each particle):
  Check 9 neighbouring cells (3×3 neighbourhood)
  Average neighbours: 4-9 particles (dense) vs 0-1 (sparse)
  Worst case: O(n) per query (O(n²) total) only if all particles are in one cell

Expected complexity: O(n) build + O(n·k) query, k = avg neighbours
  k ≈ π·(2r)²·ρ true neighbours within distance 2r (about 9·(2r)²·ρ
  candidates scanned in the 3×3 block) = constant for fixed density → O(n) total

Memory layout (cache-friendly):
  pos[0..2n): x₀,y₀,x₁,y₁,…  (interleaved, Float32Array)
  vel[0..2n): vx₀,vy₀,vx₁,vy₁,… (interleaved, Float32Array)
  Avoids pointer chasing, L1/L2 cache hot during iteration

In our particle simulations (Bubbles, Smoke, Brownian Motion, Molecular Dynamics), this change reduced collision-detection time from ~8 ms to ~0.4 ms for 2000 particles — a 20× speedup that freed the frame budget for more particles and higher-quality rendering.
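The build and query phases can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that ships in the simulations: the function names, the counting-sort bucket layout, and the returned pair list are all choices made for the example.

```javascript
// Hash a cell coordinate into [0, tableSize). Math.imul keeps the products in
// 32-bit range; >>> 0 makes the XOR result unsigned before the modulo.
function hashCell(cx, cy, tableSize) {
  return ((Math.imul(cx, 73856093) ^ Math.imul(cy, 19349663)) >>> 0) % tableSize;
}

// Build phase, O(n): counting sort of particle ids into buckets, flat arrays only.
function buildGrid(pos, n, cellSize, tableSize) {
  const cellStart = new Uint32Array(tableSize + 1); // bucket b owns entries[cellStart[b]..cellStart[b+1])
  const entries = new Uint32Array(n);
  const bucketOf = new Uint32Array(n);
  for (let i = 0; i < n; i++) {
    const h = hashCell(Math.floor(pos[2 * i] / cellSize),
                       Math.floor(pos[2 * i + 1] / cellSize), tableSize);
    bucketOf[i] = h;
    cellStart[h + 1]++;
  }
  for (let b = 0; b < tableSize; b++) cellStart[b + 1] += cellStart[b]; // prefix sums
  const cursor = cellStart.slice(0, tableSize);
  for (let i = 0; i < n; i++) entries[cursor[bucketOf[i]]++] = i;
  return { cellStart, entries, cellSize, tableSize };
}

// Query phase: returns pairs [i, j], i < j, whose centres are closer than 2 * radius.
// Assumes cellSize >= 2 * radius so that all contacts lie in the 3x3 block.
function findContacts(pos, n, radius, grid) {
  const { cellStart, entries, cellSize, tableSize } = grid;
  const pairs = [];
  const maxDist2 = 4 * radius * radius;
  const visited = new Uint32Array(9); // buckets already scanned for this particle
  for (let i = 0; i < n; i++) {
    const x = pos[2 * i], y = pos[2 * i + 1];
    const cx = Math.floor(x / cellSize), cy = Math.floor(y / cellSize);
    let visitedCount = 0;
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        const h = hashCell(cx + dx, cy + dy, tableSize);
        let duplicate = false; // two cells can collide into one bucket
        for (let v = 0; v < visitedCount; v++) if (visited[v] === h) duplicate = true;
        if (duplicate) continue;
        visited[visitedCount++] = h;
        for (let k = cellStart[h]; k < cellStart[h + 1]; k++) {
          const j = entries[k];
          if (j <= i) continue; // report each pair once
          const ex = pos[2 * j] - x, ey = pos[2 * j + 1] - y;
          if (ex * ex + ey * ey < maxDist2) pairs.push([i, j]);
        }
      }
    }
  }
  return pairs;
}
```

Because the table is hashed rather than a dense grid, a bucket can hold particles from unrelated cells; the distance test filters those out, and the visited list stops a bucket from being scanned twice for the same particle.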

Technique 3: Web Worker Physics Offloading

Freeing the main thread

The browser main thread handles JavaScript, layout, paint, input events, and Canvas/WebGL flushes. Competing with all of that for physics budget causes jank — visible stutter even when average frame time is within budget, because a sudden GC pause or DOM event can steal 2–5 ms. The solution is to run physics in a dedicated Web Worker, sharing state via SharedArrayBuffer.

Web Worker + SharedArrayBuffer Architecture

Main thread                      Physics Worker
──────────────────────────────────────────────────────
SharedArrayBuffer (SAB):
  positionsBuf: Float32Array(n*2)
  velocitiesBuf: Float32Array(n*2)
  controlBuf:   Int32Array(4)   [step, done, pause, n]

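// Main thread: start worker
const worker = new Worker('physics.js');
Atomics.store(controlBuf, 0, 1);   // request the first step before the worker starts
worker.postMessage({ sab: positionsBuf.buffer, ... });

// requestAnimationFrame: render latest state
function render(alpha) {
  // Atomics.wait() throws a TypeError on the main thread, so poll the done flag
  if (Atomics.load(controlBuf, 1) !== 1) return; // step still running: keep last frame
  drawParticles(positionsBuf);
  Atomics.store(controlBuf, 1, 0);  // clear done flag
  Atomics.store(controlBuf, 0, 1);  // signal: step again
  Atomics.notify(controlBuf, 0);    // wake the worker blocked in Atomics.wait()
}

// physics.js Worker: physics loop
self.onmessage = ({ data }) => {
  const pos = new Float32Array(data.sab);
  // Assumption: vel, like data.ctrl, arrives in the message fields elided above
  const vel = data.vel;
  while (true) {
    Atomics.wait(data.ctrl, 0, 0);  // sleep while the step flag is 0
    Atomics.store(data.ctrl, 0, 0); // consume the step signal
    physicsStep(pos, vel, FIXED_DT);
    Atomics.store(data.ctrl, 1, 1); // signal: done
  }
};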

Benefits:
  - Physics never blocks input or paint
  - Utilises a second CPU core on all modern phones
  - GC pauses on main thread do not stall physics
  - Serialisation-free: no structured clone cost

Caveats:
  - SharedArrayBuffer requires COOP/COEP headers
  - Atomics.wait() blocks the calling thread and throws a TypeError on the
    main thread: call it only inside the worker

Security requirement: SharedArrayBuffer requires Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp HTTP headers. These are set in our Cloudflare Pages configuration for all simulation routes.
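When those headers are missing, the page is not cross-origin isolated and SharedArrayBuffer is unavailable, so it is worth checking before starting the worker. A minimal sketch, assuming a main-thread fallback path exists; canShareMemory, the scope parameter and the mode names are hypothetical and only serve the example:

```javascript
// Sketch: decide whether the SharedArrayBuffer path can be used in this context.
// `scope` defaults to globalThis; passing it in is an assumption made so the
// check can be exercised outside a browser.
function canShareMemory(scope = globalThis) {
  return typeof scope.SharedArrayBuffer === 'function' &&
         scope.crossOriginIsolated === true; // true only when COOP and COEP are in effect
}

// Hypothetical selection between the worker path and a main-thread fallback.
const physicsMode = canShareMemory() ? 'worker-sab' : 'main-thread';
```

crossOriginIsolated is exposed on both the window and worker global scopes, so the same check works on either side.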

Technique 4: Battery-Aware Adaptive Quality

Keeping phones cool

Running 60 FPS particle physics drains a phone battery quickly and triggers thermal throttling — the device's CPU/GPU clock speed drops to dissipate heat, causing a sudden FPS cliff that is worse than steady-state lower performance. Our battery-aware system detects battery level and charging state via the Battery Status API and automatically adjusts particle count and simulation quality.

Adaptive Quality — Battery API Integration

// Quality tiers
const QUALITY = {
  high:   { particles: 1.0, substeps: 4, shadows: true  },
  medium: { particles: 0.6, substeps: 2, shadows: false },
  low:    { particles: 0.3, substeps: 1, shadows: false },
};

async function selectQuality() {
  if (!navigator.getBattery) return 'high';
  const bat = await navigator.getBattery();

  if (bat.charging)          return 'high';
  if (bat.level > 0.5)       return 'high';
  if (bat.level > 0.2)       return 'medium';
  return 'low';
}

// Frame-time-based fallback (the Battery Status API is not available in Firefox or Safari)
let slowFrames = 0;
function checkPerformance(dt) {
  if (dt > 20) slowFrames++;        // 20ms = 50 FPS
  else         slowFrames = Math.max(0, slowFrames - 1);
  if (slowFrames > 60) {            // about 1 second of slow frames
    downscale();
    slowFrames = 0;                 // reset so downscale() fires once per slow spell
  }
}

// Result on benchmark device (Snapdragon 695):
//   Default (high):      ~58 FPS, battery drain 12%/hour
//   Adaptive (medium):   ~60 FPS, battery drain  7%/hour
//   Adaptive (low):      ~60 FPS, battery drain  4%/hour
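Battery state changes while a simulation is running, so a one-off check at startup can go stale. The sketch below re-evaluates the tier on the BatteryManager chargingchange and levelchange events. It reuses the thresholds from selectQuality; tierFor, watchBattery and the applyTier callback are assumed names for illustration.

```javascript
// Pure mapping from battery state to a tier, mirroring the thresholds above.
function tierFor({ charging, level }) {
  if (charging || level > 0.5) return 'high';
  if (level > 0.2) return 'medium';
  return 'low';
}

// Hypothetical wiring: applyTier is an assumed callback that resizes the simulation.
// Resolves to null when the Battery Status API is missing.
async function watchBattery(nav, applyTier) {
  if (!nav || typeof nav.getBattery !== 'function') return null;
  const bat = await nav.getBattery();
  let current = tierFor(bat);
  applyTier(current);
  const update = () => {
    const next = tierFor(bat);
    if (next !== current) {   // only react to an actual tier change
      current = next;
      applyTier(next);
    }
  };
  bat.addEventListener('chargingchange', update);
  bat.addEventListener('levelchange', update);
  return bat;
}
```

In a page this would be called as watchBattery(navigator, applyTier). Keeping tierFor pure makes the thresholds easy to test without a device.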

Technique 5: Render-Side Optimisations

Physics is only half the frame budget. WebGL draw calls and Three.js overhead account for the other half in complex scenes. Four changes made the biggest impact:

Instanced mesh rendering: replace N separate Mesh objects with one InstancedMesh(geometry, material, N) call, reducing draw calls from N to 1. Critical for particle-heavy simulations (Bubbles, N-Body, Flocking).
Frustum culling on particles: skip updating InstancedMesh matrices for particles outside the camera frustum. In simulations that spread over a wide area (Boids), this removes up to 30% of matrix uploads.
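Double-buffered canvas for 2D simulations: draw to an OffscreenCanvas in the Worker, call transferToImageBitmap(), and post the resulting ImageBitmap to the main thread as a transferable. The frame moves without a copy or serialisation, and the main thread only has to present the finished bitmap.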
Deferred Three.js disposal: call geometry.dispose() and material.dispose() in a requestIdleCallback rather than synchronously on scene exit, avoiding a GC spike that causes a one-frame white flash.
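The culling step amounts to compacting the visible particles to the front of the per-instance buffer and drawing only that many. The sketch below shows the idea in plain JavaScript, with a 2D view rectangle standing in for the real camera frustum; packVisible and the view object are assumptions made for the example, not our renderer code.

```javascript
// Sketch: copy only on-screen particles into `out` and return how many there are.
// Simplification: a 2D rectangle replaces the camera frustum test.
function packVisible(pos, n, view, out) {
  let count = 0;
  for (let i = 0; i < n; i++) {
    const x = pos[2 * i], y = pos[2 * i + 1];
    if (x < view.minX || x > view.maxX || y < view.minY || y > view.maxY) continue;
    out[2 * count] = x;       // visible particles end up contiguous at the front
    out[2 * count + 1] = y;
    count++;
  }
  return count;               // number of instances worth drawing this frame
}

const scenePos = new Float32Array([0, 0, 5, 5, -3, 1, 2, 2]);
const packed = new Float32Array(scenePos.length);
const visibleCount = packVisible(scenePos, 4, { minX: -1, maxX: 3, minY: -1, maxY: 3 }, packed);
```

With Three.js, the same pattern would write matrices via setMatrixAt only for the packed entries and assign the returned value to the mesh's count property, so the GPU never sees the culled instances.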

Benchmark Results

Before / After — Reference Device (Snapdragon 695)

Simulation              Before (FPS)  After (FPS)  Technique applied
──────────────────────────────────────────────────────────────────────
Boids (500 agents)          38           60        Spatial hash + Worker
N-Body (200 bodies)         44           60        Barnes-Hut + Worker
Brownian Motion (2000)      29           60        Spatial hash + SAB
Bubbles (1000)              41           60        Instanced mesh + hash
Lennard-Jones MD (800)      35           60        Fixed dt + Worker
SPH Fluid (600 pts)         22           58        Hash + Worker + SAB
Cellular Automata (256²)    60           60        Already optimal
Sorting (1000 bars)         60           60        Pure JS, no change

Overall: 47 simulations below 55 FPS → 0 simulations below 55 FPS
Average FPS gain on low-end device: +19 FPS

Wave 10 Content Highlights

Spotlight #20 — Thermodynamics

Spotlight #20 covers our six thermodynamics simulations with a focus on why entropy and heat flow are the deepest laws in physics. The guide walks through Newton's cooling law, the Carnot efficiency bound, Maxwell-Boltzmann speed distributions, Planck's quantum fix for the ultraviolet catastrophe, Bénard convection pattern formation, and binary alloy phase diagrams. Each simulation is explained from first principles.

Learning #19 — Algorithms & Complexity

Learning #19 unifies all the algorithm simulations under the lens of computational complexity. Starting from Big-O notation, it covers sorting (O(n²) vs O(n log n) made visible), A* vs Dijkstra pathfinding (visited-node comparison), N-Queens backtracking with constraint propagation, the NP-hard Traveling Salesman with 2-opt local search, and genetic algorithms using TSP as the fitness landscape.

What's Next — Wave 11 Preview

Q1 2028 will focus on two under-served areas of the simulation catalogue: the chemistry collection and the social-science / economics collection, which have grown significantly since devlog-26's applied categories push. Wave 11 posts:

Spotlight #21: Chemistry & Chemical Kinetics — reaction-diffusion, combustion, acid-base equilibria, and the Belousov-Zhabotinsky spiral waves.
Learning #20: Agent-Based Modelling — from Boids and flocking to SIR epidemic models, ant colony optimisation, and emergent traffic dynamics.
Devlog #30: Year in Review — 2027 wrapping up, community contributions, and the 2028 roadmap.