Simulation Performance Optimization

Browser simulations often start fast, then slow down as objects multiply. This tutorial covers practical techniques to sustain 60 fps: InstancedMesh for thousands of objects, TypedArrays to reduce GC pressure, spatial hashing for broad-phase collision, Web Workers to move physics off the main thread, and frustum culling and LOD to cut GPU work.

Know Your Bottleneck First

Before optimising, measure. Chrome DevTools Performance panel and the three.js Stats helper tell you whether you are CPU-bound or GPU-bound:

import Stats from 'https://cdn.jsdelivr.net/npm/three@0.168/examples/jsm/libs/stats.module.js';
const stats = new Stats();
stats.showPanel(0); // 0 = fps, 1 = ms, 2 = mb
document.body.appendChild(stats.dom);

// In animate():
stats.begin();
renderer.render(scene, camera);
stats.end();

Symptom	Likely bottleneck	Fix direction
JS frame time > 10 ms	CPU — JS physics / updates	TypedArrays, WASM, Web Workers
Many draw calls (>500)	CPU — render thread batching	InstancedMesh, merge geometry
High GPU usage, low CPU	GPU — fragment complexity	Reduce shader cost, lower resolution
GC spikes (ms chart jitters)	JS heap allocations	Object pooling, TypedArrays
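
The first row of the table depends on knowing how long your own JavaScript takes per frame. A minimal sketch, assuming a hypothetical createSectionTimer helper (it is not part of three.js or Stats), measures one section of the frame with performance.now() and keeps a rolling average so that a single spike does not mislead you:

```javascript
// Hypothetical helper: rolling average of how long one section of the
// frame takes, in milliseconds. The ring buffer is allocated once.
function createSectionTimer(windowSize = 60) {
  const samples = new Float64Array(windowSize);
  let index = 0;   // next slot to overwrite
  let filled = 0;  // number of valid samples so far
  let startTime = 0;

  function record(ms) {
    samples[index] = ms;
    index = (index + 1) % windowSize;
    if (filled < windowSize) filled++;
  }

  function average() {
    if (filled === 0) return 0;
    let sum = 0;
    for (let i = 0; i < filled; i++) sum += samples[i];
    return sum / filled;
  }

  return {
    begin() { startTime = performance.now(); },
    end() { record(performance.now() - startTime); },
    record,
    average,
  };
}
```

Wrap the physics update in begin()/end() and compare average() with the 16.7 ms frame budget. If the physics section alone approaches 10 ms, the CPU-side fixes in the table are the place to start.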

InstancedMesh — One Draw Call per Object Type

Rendering 10,000 separate Mesh objects = 10,000 draw calls. InstancedMesh renders all of them in one draw call:

const COUNT = 10_000;
const geo  = new THREE.SphereGeometry(0.1, 8, 8);
const mat  = new THREE.MeshStandardMaterial({ color: 0x2299ff });
const iMesh = new THREE.InstancedMesh(geo, mat, COUNT);
scene.add(iMesh);

const dummy = new THREE.Object3D();
const positions = new Float32Array(COUNT * 3); // x,y,z per instance

// Initialize positions
for (let i = 0; i < COUNT; i++) {
  positions[i*3+0] = (Math.random() - 0.5) * 20;
  positions[i*3+1] = (Math.random() - 0.5) * 20;
  positions[i*3+2] = (Math.random() - 0.5) * 20;
}

function updateInstances() {
  for (let i = 0; i < COUNT; i++) {
    dummy.position.set(positions[i*3], positions[i*3+1], positions[i*3+2]);
    dummy.updateMatrix();
    iMesh.setMatrixAt(i, dummy.matrix);
  }
  iMesh.instanceMatrix.needsUpdate = true; // ← required!
}

Only set needsUpdate = true when the data actually changed. Unnecessary updates cause a CPU→GPU buffer upload every frame.
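
When instances only translate (no rotation or scale), the round-trip through dummy.updateMatrix() in updateInstances() can be skipped. The sketch below assumes the layout three.js uses for instanceMatrix.array: 16 floats per instance, column-major, with the translation at offsets 12, 13 and 14. The function names are hypothetical, and the code works on a plain Float32Array, so you pass iMesh.instanceMatrix.array yourself:

```javascript
// Sketch under the stated layout assumption: 16 floats per instance,
// column-major, translation stored at offsets 12, 13 and 14.

// Reset every instance to the identity matrix (call once at start-up,
// in case the array has not been initialised).
function initIdentityMatrices(matrixArray, count) {
  matrixArray.fill(0, 0, count * 16);
  for (let i = 0; i < count; i++) {
    const o = i * 16;
    matrixArray[o] = 1;
    matrixArray[o + 5] = 1;
    matrixArray[o + 10] = 1;
    matrixArray[o + 15] = 1;
  }
}

// Overwrite only the translation of each instance from an x,y,z array.
function writeTranslations(matrixArray, positions, count) {
  for (let i = 0; i < count; i++) {
    const o = i * 16;
    const p = i * 3;
    matrixArray[o + 12] = positions[p];
    matrixArray[o + 13] = positions[p + 1];
    matrixArray[o + 14] = positions[p + 2];
  }
}
```

After calling writeTranslations(iMesh.instanceMatrix.array, positions, COUNT), set iMesh.instanceMatrix.needsUpdate = true as before. This writes 3 floats per instance instead of recomposing a full matrix, which tends to matter once COUNT reaches the tens of thousands.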

TypedArrays — Eliminate GC Pauses

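One regular JavaScript object ({ x, y, z }) per particle means many heap allocations, and temporary objects created during updates trigger GC pauses at the worst moment. Use Float32Array (or Float64Array when you need double precision) instead: each array is a single contiguous allocation made once up front, so the per-frame update creates no garbage: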

// ❌ Object array — GC pressure
const particles = Array.from({ length: 10000 }, () => ({
  x: Math.random(), y: Math.random(), z: Math.random(),
  vx: 0, vy: 0, vz: 0,
}));

// ✅ TypedArray SoA (Structure of Arrays) — no GC, cache-friendly
const N = 10_000;
const px = new Float32Array(N), py = new Float32Array(N), pz = new Float32Array(N);
const vx = new Float32Array(N), vy = new Float32Array(N), vz = new Float32Array(N);

// Physics update: no object allocation
const dt = 1 / 60; // time step in seconds
for (let i = 0; i < N; i++) {
  // Accumulate other forces into vx/vy/vz here; this example applies gravity only
  vy[i] -= 9.8 * dt;
  px[i] += vx[i] * dt;
  py[i] += vy[i] * dt;
  pz[i] += vz[i] * dt;
}

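SoA (Structure of Arrays) is more cache-friendly than AoS (Array of Structures) because each property is stored contiguously: a loop that walks px[0], px[1], px[2], ... reads consecutive memory, so every cache line it fetches is full of values it will use. A JavaScript array of objects instead holds references to objects that may be scattered across the heap.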
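
TypedArrays have a fixed length, so spawning and removing particles is usually handled with a live count rather than by resizing. A minimal sketch, assuming a hypothetical ParticleStore class: removal copies the last live particle into the freed slot (swap-remove), which keeps the live range dense and allocates nothing.

```javascript
// Hypothetical SoA particle pool with a fixed capacity.
// Live particles always occupy indices 0 .. count-1.
class ParticleStore {
  constructor(capacity) {
    this.capacity = capacity;
    this.count = 0;
    this.px = new Float32Array(capacity);
    this.py = new Float32Array(capacity);
    this.pz = new Float32Array(capacity);
    this.vx = new Float32Array(capacity);
    this.vy = new Float32Array(capacity);
    this.vz = new Float32Array(capacity);
  }

  // Returns the new particle's index, or -1 when the pool is full.
  spawn(x, y, z) {
    if (this.count === this.capacity) return -1;
    const i = this.count++;
    this.px[i] = x; this.py[i] = y; this.pz[i] = z;
    this.vx[i] = 0; this.vy[i] = 0; this.vz[i] = 0;
    return i;
  }

  // Swap-remove: the last live particle moves into slot i.
  // Indices are therefore not stable across removals.
  remove(i) {
    if (i < 0 || i >= this.count) return;
    const last = --this.count;
    if (i === last) return;
    this.px[i] = this.px[last]; this.py[i] = this.py[last]; this.pz[i] = this.pz[last];
    this.vx[i] = this.vx[last]; this.vy[i] = this.vy[last]; this.vz[i] = this.vz[last];
  }
}
```

The physics loop then runs from 0 to store.count instead of N. If other code holds on to particle indices, it needs its own remapping, because a removal changes which particle lives at index i.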

Spatial Hashing for O(1) Neighbour Lookup

Naïve collision detection is O(N²) — every particle checks every other. Spatial hashing reduces it to ~O(N) for uniform particle distributions:

class SpatialHash {
  constructor(cellSize) {
    this.cellSize = cellSize;
    this.table = new Map();
  }
  _key(x, y, z) {
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const cz = Math.floor(z / this.cellSize);
    return `${cx},${cy},${cz}`;
  }
  clear() { this.table.clear(); }
  insert(i, x, y, z) {
    const k = this._key(x, y, z);
    if (!this.table.has(k)) this.table.set(k, []);
    this.table.get(k).push(i);
  }
  query(x, y, z) {
    // Returns indices of particles in same and adjacent cells
    const result = [];
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const cz = Math.floor(z / this.cellSize);
    for (let dx = -1; dx <= 1; dx++)
    for (let dy = -1; dy <= 1; dy++)
    for (let dz = -1; dz <= 1; dz++) {
      const k = `${cx+dx},${cy+dy},${cz+dz}`;
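      const cell = this.table.get(k);
      // Loop instead of push(...cell): spreading a very large cell can exceed the engine's argument limit
      if (cell) for (let j = 0; j < cell.length; j++) result.push(cell[j]);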
    }
    return result;
  }
}

// Usage: cell size = 2× particle radius
const hash = new SpatialHash(0.2);
// Each frame: 1) clear 2) insert all 3) query neighbours
hash.clear();
for (let i = 0; i < N; i++) hash.insert(i, px[i], py[i], pz[i]);
for (let i = 0; i < N; i++) {
  const neighbours = hash.query(px[i], py[i], pz[i]);
  // neighbours includes i itself; test only j > i so each pair is checked once
}
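
The Map-based version above builds a string key for every insert and lookup and returns a fresh array from every query, which works against the advice on allocation. A sketch of an allocation-free alternative follows. It swaps in a different technique: a dense uniform grid built with a counting sort into TypedArrays. It assumes the particles stay inside a known cubic region (strays are clamped to the border cells), and the class and method names are hypothetical.

```javascript
// Hypothetical dense grid: all storage is allocated once in the constructor.
class UniformGrid {
  // min: lower corner of the cubic domain (same value on every axis)
  // size: edge length of the domain, capacity: maximum particle count
  constructor(min, size, cellSize, capacity) {
    this.min = min;
    this.cellSize = cellSize;
    this.dim = Math.max(1, Math.ceil(size / cellSize));
    this.cells = this.dim * this.dim * this.dim;
    this.cellStart = new Uint32Array(this.cells + 1); // start offset per cell
    this.cellItems = new Uint32Array(capacity);       // indices grouped by cell
    this.cellOf = new Uint32Array(capacity);          // cell id per particle
  }

  _coord(v) {
    const c = Math.floor((v - this.min) / this.cellSize);
    return c < 0 ? 0 : c >= this.dim ? this.dim - 1 : c; // clamp strays
  }

  _cell(cx, cy, cz) { return (cz * this.dim + cy) * this.dim + cx; }

  // Counting sort: count per cell, prefix-sum, then scatter the indices.
  build(px, py, pz, n) {
    const { cellStart, cellItems, cellOf, cells } = this;
    cellStart.fill(0);
    for (let i = 0; i < n; i++) {
      const c = this._cell(this._coord(px[i]), this._coord(py[i]), this._coord(pz[i]));
      cellOf[i] = c;
      cellStart[c]++;
    }
    let sum = 0;
    for (let c = 0; c < cells; c++) { sum += cellStart[c]; cellStart[c] = sum; }
    cellStart[cells] = sum; // sentinel: end of the last cell
    // Each cellStart[c] currently holds the END of cell c; filling backwards
    // leaves it at the START, so cell c spans cellStart[c] .. cellStart[c+1].
    for (let i = 0; i < n; i++) cellItems[--cellStart[cellOf[i]]] = i;
  }

  // Calls visit(j) for every particle in the 3x3x3 block around (x, y, z).
  forEachNeighbour(x, y, z, visit) {
    const cx = this._coord(x), cy = this._coord(y), cz = this._coord(z);
    const hi = this.dim - 1;
    for (let k = Math.max(0, cz - 1); k <= Math.min(hi, cz + 1); k++)
    for (let j = Math.max(0, cy - 1); j <= Math.min(hi, cy + 1); j++)
    for (let i = Math.max(0, cx - 1); i <= Math.min(hi, cx + 1); i++) {
      const c = this._cell(i, j, k);
      for (let s = this.cellStart[c]; s < this.cellStart[c + 1]; s++) visit(this.cellItems[s]);
    }
  }
}
```

Call build(px, py, pz, N) once per frame, then forEachNeighbour() per particle with a visitor function created once outside the loop. The trade-off is memory: a dense grid stores every cell, so for sparse or unbounded worlds a hashed table is likely the better fit.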

Fixed Timestep + Web Worker Physics

Physics should run at a fixed step (e.g. 1/120 s) independent of rendering frame rate. Offload to a Web Worker so physics doesn't block the render:

// main.js
// The fixed-step loop lives in the worker; this thread only renders.
const worker = new Worker('./physics-worker.js');
const posBuffer = new SharedArrayBuffer(N * 3 * Float32Array.BYTES_PER_ELEMENT);
const positions = new Float32Array(posBuffer);

worker.postMessage({ type: 'init', buffer: posBuffer, count: N });

// Render loop: only reads the shared buffer
function animate() {
  requestAnimationFrame(animate);
  // Read-only on this thread. Without Atomics a frame may mix two physics
  // steps, which is acceptable for loose visual sync.
  updateInstancesFromBuffer(positions); // same loop as updateInstances(), reading this array
  renderer.render(scene, camera);
}
requestAnimationFrame(animate);

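// physics-worker.js
self.onmessage = ({ data }) => {
  if (data.type !== 'init') return;
  const pos = new Float32Array(data.buffer);
  const vel = new Float32Array(data.count * 3);
  const dt = 1 / 120;       // fixed physics step, in seconds
  const MAX_SUB_STEPS = 8;  // cap catch-up work after a stall
  let accumulator = 0;
  let last = performance.now();

  function step() {
    for (let i = 0; i < data.count; i++) {
      vel[i*3+1] -= 9.8 * dt;
      pos[i*3+0] += vel[i*3+0] * dt;
      pos[i*3+1] += vel[i*3+1] * dt;
      pos[i*3+2] += vel[i*3+2] * dt;
      if (pos[i*3+1] < 0) { pos[i*3+1] = 0; vel[i*3+1] *= -0.7; }
    }
  }
  // Timers are not exact: measure real elapsed time and run as many fixed steps as fit
  setInterval(() => {
    const now = performance.now();
    accumulator += (now - last) / 1000;
    last = now;
    for (let n = 0; accumulator >= dt && n < MAX_SUB_STEPS; n++) {
      step();
      accumulator -= dt;
    }
    if (accumulator >= dt) accumulator = 0; // hit the cap: drop the backlog
  }, dt * 1000);
};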

SharedArrayBuffer requires Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. For simpler cases use postMessage with a regular ArrayBuffer transferable (zero-copy).
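
A page can check at runtime whether those headers took effect by reading the crossOriginIsolated global. The sketch below, with hypothetical function names, picks a SharedArrayBuffer when isolation is active and otherwise falls back to a plain ArrayBuffer that is handed over through the transfer list of postMessage:

```javascript
// Sketch: choose the buffer type once, at start-up.
function createPositionBuffer(count) {
  const bytes = count * 3 * Float32Array.BYTES_PER_ELEMENT;
  const shared =
    typeof SharedArrayBuffer !== 'undefined' &&
    globalThis.crossOriginIsolated === true;
  return {
    shared,
    buffer: shared ? new SharedArrayBuffer(bytes) : new ArrayBuffer(bytes),
  };
}

// `target` is anything with postMessage(message, transferList), such as a
// Worker on the main thread or `self` inside the worker.
function sendPositions(target, state) {
  if (state.shared) {
    // Shared memory: both sides keep access, so send the reference once.
    target.postMessage({ type: 'positions', buffer: state.buffer });
  } else {
    // Transfer: zero-copy, but the sender's buffer becomes detached
    // (byteLength 0) until the other side transfers it back.
    target.postMessage({ type: 'positions', buffer: state.buffer }, [state.buffer]);
  }
}
```

In the transfer case, the usual pattern is a ping-pong: the worker fills the buffer and transfers it to the main thread, which renders from it and transfers it back. The receiver has to store the buffer it gets in each message, because the sender's old reference is no longer usable.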

Frustum Culling and LOD

Three.js does frustum culling automatically for individual Mesh objects. For instanced meshes, culling is per-draw-call (either all or nothing). Manual per-instance culling:

const frustum = new THREE.Frustum();
const projScreen = new THREE.Matrix4();
const sphere = new THREE.Sphere(); // reused to avoid a per-call allocation

// Assumes positions are in world space (iMesh itself is not transformed)
function cullInstances(iMesh, positions, count) {
  projScreen.multiplyMatrices(camera.projectionMatrix, camera.matrixWorldInverse);
  frustum.setFromProjectionMatrix(projScreen);

  sphere.radius = 0.1; // bounding radius of one instance
  let visibleCount = 0;

  for (let i = 0; i < count; i++) {
    sphere.center.set(positions[i*3], positions[i*3+1], positions[i*3+2]);
    if (frustum.intersectsSphere(sphere)) {
      // Build the matrix from the source positions. Reading it back with
      // getMatrixAt(i) breaks on later frames, because slot i may already
      // hold a different instance from the previous compaction.
      dummy.position.copy(sphere.center);
      dummy.updateMatrix();
      iMesh.setMatrixAt(visibleCount++, dummy.matrix);
    }
  }
  iMesh.count = visibleCount; // only render visible instances
  iMesh.instanceMatrix.needsUpdate = true;
}

For complex scenes, Three.js has built-in LOD (Level of Detail) — swap to simpler geometry when far from the camera:

const lod = new THREE.LOD();
lod.addLevel(new THREE.Mesh(highPoly, mat), 0);    // <10 units away
lod.addLevel(new THREE.Mesh(midPoly,  mat), 10);   // 10–50 units
lod.addLevel(new THREE.Mesh(lowPoly,  mat), 50);   // >50 units
scene.add(lod);
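
The distances passed to addLevel() act as lower bounds: a level becomes active once the camera is at least that far away. The sketch below illustrates that selection rule with a hypothetical pickLodLevel helper. It is an illustration, not three.js source, and it may be useful when you want to bucket the instances of an InstancedMesh by distance yourself:

```javascript
// Illustration only: thresholds are the distances given to addLevel(),
// sorted ascending, for example [0, 10, 50].
// Returns the index of the last level whose threshold is <= distance.
function pickLodLevel(thresholds, distance) {
  let level = 0;
  for (let i = 1; i < thresholds.length; i++) {
    if (distance >= thresholds[i]) level = i;
    else break;
  }
  return level;
}
```

With thresholds [0, 10, 50], a distance of 5 selects level 0, a distance of 10 selects level 1, and anything from 50 upward selects level 2.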

Things to Avoid

GPU readback: renderer.readRenderTargetPixels() stalls the GPU pipeline. Avoid it in the render loop.
Allocating inside the loop: new THREE.Vector3(), new Array(), and spread ([...arr]) all allocate. Pre-allocate and reuse.
Recomputing bounds every frame: geometry.computeBoundingBox() and Box3.setFromObject() walk every vertex. Cache the result or set it manually.
Unbounded physics sub-steps: if a frame takes 200 ms at a 1/120 s step, catching up means 24 sub-steps, which makes the next frame even slower. Cap sub-steps at 5–10.
Dynamic shadow maps with many casters: shadow map rendering traverses all shadow-casting objects each frame. Use baked lightmaps for static geometry.
Too many unique materials: objects with different materials cannot share a draw call, and materials with different settings can each compile their own shader program, so switching between them is expensive. Reuse one material across similar objects.
Setting needsUpdate = true on static geometry: each time it re-uploads the buffer to the GPU. Only set it when the data actually changed.
