Tip: Canvas 2D Performance — ImageData, Typed Arrays & Dirty Rects

Most Canvas 2D bottlenecks are not in JavaScript — they are in the number of draw calls. Replacing fillRect loops with direct pixel writes and partial redraws can yield 4–10× throughput gains with no algorithm changes whatsoever.

Here is a fact that surprises many developers: on a mid-range laptop, calling ctx.fillRect(x, y, 1, 1) 100,000 times per frame is roughly 10 times slower than writing the same 100,000 pixels directly into a Uint8ClampedArray and flushing it with a single ctx.putImageData() (see the Benchmark Summary below). The Canvas 2D API is beautifully expressive, but every draw call carries significant per-call overhead: state validation, compositing mode checks, and bridge crossings between the JavaScript engine and the GPU compositor. Understanding where that overhead lives, and how to route around it, is the single most impactful optimisation you can make to particle simulations, cellular automata, or any visualisation that updates many pixels per frame.

1. ImageData — Write Pixels, Not Draw Calls

The standard pattern for rendering n coloured dots looks like this:

// Slow: one draw call per particle
for (const p of particles) {
  ctx.fillStyle = `rgb(${p.r},${p.g},${p.b})`;
  ctx.fillRect(p.x | 0, p.y | 0, 1, 1);
}

Every iteration changes fillStyle (a string parse), then issues a draw call. For 50,000 particles at 60 fps that is 3 million string parses and draw calls per second — far more work than the actual pixel maths.

The fast path creates a backing buffer once, writes RGBA bytes directly, and uploads it in a single call:

// Fast: one putImageData per frame
const W = canvas.width, H = canvas.height;
const imageData = ctx.createImageData(W, H);
const buf = imageData.data; // Uint8ClampedArray, length = W*H*4

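function renderParticles(particles) {
  // Clear to opaque black: zero every byte, then set each 4th (alpha) byte.
  // Note: buf.fill(255, 3) would overwrite the colour bytes as well.
  buf.fill(0);
  for (let a = 3; a < buf.length; a += 4) buf[a] = 255;

  for (const p of particles) {
    const x = p.x | 0, y = p.y | 0;
    // Skip off-canvas particles; an out-of-range x would wrap onto another row.
    if (x < 0 || x >= W || y < 0 || y >= H) continue;
    const i = (y * W + x) * 4;
    buf[i]     = p.r;
    buf[i + 1] = p.g;
    buf[i + 2] = p.b;
    buf[i + 3] = 255;
  }

  ctx.putImageData(imageData, 0, 0);
}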

Both the background clear and the per-particle write get simpler if you keep a Uint32Array view over the same ArrayBuffer. Each element then covers one whole pixel, so a clear becomes a single fill() with a packed colour and a plot becomes a single 4-byte write:

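const buf32 = new Uint32Array(imageData.data.buffer);
// Pack RGBA into a single uint32. On a little-endian host the value
// 0xAABBGGRR is stored as the bytes R, G, B, A, the order ImageData expects.
// p is the current particle; >>> 0 keeps the packed value unsigned.
const colour = ((255 << 24) | (p.b << 16) | (p.g << 8) | p.r) >>> 0;
buf32[(p.y | 0) * W + (p.x | 0)] = colour;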

The Uint32Array trick cuts the inner loop body from four byte writes to one 32-bit write. On our reaction-diffusion simulations (1280×720 canvas, all pixels active) this alone cuts render time from ~11 ms to ~3 ms per frame on Chrome 124 / M2 MacBook Air.
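The packing order above is only right on a little-endian host, which covers practically every consumer device but is not guaranteed. As a minimal sketch, the byte order can be detected once and the packing chosen to match. The names IS_LITTLE_ENDIAN and packRGBA are hypothetical helpers introduced here, and the plain 8-byte buffer stands in for imageData.data.buffer so the snippet also runs outside a browser:

```javascript
// Detect host byte order once: write a known uint32, then inspect its first byte.
const IS_LITTLE_ENDIAN =
  new Uint8Array(new Uint32Array([0x0a0b0c0d]).buffer)[0] === 0x0d;

// Hypothetical helper: pack 8-bit integer channels (0-255) into one uint32
// whose bytes land in memory as R, G, B, A regardless of host byte order.
function packRGBA(r, g, b, a = 255) {
  return IS_LITTLE_ENDIAN
    ? ((a << 24) | (b << 16) | (g << 8) | r) >>> 0
    : ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

// Demo on a 2x1 pixel buffer standing in for imageData.data.buffer.
const bytes = new Uint8ClampedArray(8);
const pixels = new Uint32Array(bytes.buffer);

pixels.fill(packRGBA(0, 0, 0));   // clear to opaque black in one call
pixels[1] = packRGBA(10, 20, 30); // plot the second pixel with one write
```

After this runs, the byte view reads 0, 0, 0, 255 for the first pixel and 10, 20, 30, 255 for the second, which is the layout putImageData() consumes.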

2. Float32Array Particle Buffers — Avoid Object Overhead

JavaScript objects carry per-object overhead. An array of 100,000 particle objects, each with x, y, vx, vy, r, g, b and life properties, is really an array of pointers to separately allocated heap objects. Even when V8's hidden classes make each property lookup fast, every particle costs a pointer dereference, the objects may be scattered across the heap, and the engine falls back to slower lookups if particles are created with differing shapes.

A Structure-of-Arrays (SoA) layout using typed arrays avoids most of this overhead:

const N = 100_000; // max particles

const px  = new Float32Array(N); // x positions
const py  = new Float32Array(N); // y positions
const pvx = new Float32Array(N); // x velocities
const pvy = new Float32Array(N); // y velocities
const pr  = new Uint8Array(N);   // red channel
const pg  = new Uint8Array(N);   // green channel
const pb  = new Uint8Array(N);   // blue channel
const pl  = new Float32Array(N); // lifetime (seconds)

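// Illustrative constants (assumed values; substitute your simulation's own).
const GRAVITY = 9.81;  // downward acceleration
const ax = 0, ay = 0;  // uniform external acceleration

function updateParticles(dt, count) {
  // Semi-implicit Euler: update velocity first, then move with the new velocity.
  for (let i = 0; i < count; i++) {
    pvx[i] += ax * dt;
    pvy[i] += (ay + GRAVITY) * dt;
    px[i]  += pvx[i] * dt;
    py[i]  += pvy[i] * dt;
    pl[i]  -= dt; // remaining lifetime in seconds
  }
}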

Because each array is a contiguous block of memory with a fixed element size, the update loop walks memory sequentially: the CPU cache and prefetcher stay effective, and the JIT can emit unboxed numeric loads and stores with no pointer chasing. In our boids simulation, switching from an array of objects to SoA typed arrays reduced the physics update from ~8 ms to ~1.4 ms for 80,000 agents, a 5.7× speedup with identical logic.

The maths is unchanged: position update is simply p += v × Δt, acceleration is a = F / m. With typed arrays you are just expressing it in a layout that the engine can optimise aggressively.
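The SoA buffers also feed the pixel path from section 1 directly, with no intermediate particle objects. The following is a rough sketch rather than the simulator's actual code: plotParticles is a hypothetical helper, the packing assumes a little-endian host, and the 4×2 buffer stands in for a Uint32Array view over imageData.data.buffer.

```javascript
// Hypothetical helper: plot `count` SoA particles into a Uint32Array pixel view.
// Assumes a little-endian host, so each pixel is packed as 0xAABBGGRR.
function plotParticles(buf32, W, H, px, py, pr, pg, pb, count) {
  for (let i = 0; i < count; i++) {
    const x = px[i] | 0; // truncate toward zero, as in the earlier snippets
    const y = py[i] | 0;
    if (x < 0 || x >= W || y < 0 || y >= H) continue; // skip off-screen particles
    buf32[y * W + x] = ((255 << 24) | (pb[i] << 16) | (pg[i] << 8) | pr[i]) >>> 0;
  }
}

// Tiny demo: a 4x2 "canvas" and two particles, the second one off-screen.
const W = 4, H = 2;
const buf32 = new Uint32Array(W * H);
const px = Float32Array.of(2.7, 9.0);
const py = Float32Array.of(1.2, 0.0);
const pr = Uint8Array.of(255, 1);
const pg = Uint8Array.of(128, 2);
const pb = Uint8Array.of(0, 3);

plotParticles(buf32, W, H, px, py, pr, pg, pb, 2);
```

Only the first particle lands in the buffer, at row 1, column 2. In a real frame you would follow the call with a single ctx.putImageData(imageData, 0, 0).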

3. Dirty-Rectangle Partial Redraws — Skip Unchanged Pixels

Full-canvas putImageData() uploads W × H × 4 bytes regardless of how many pixels changed. For a 1920×1080 canvas that is ~8 MB per frame over the CPU → compositor bridge. When only a small region of the canvas is active — for example, a particle burst in the lower-left corner, or a localised reaction front — it is far cheaper to upload only the bounding box of changed pixels.

The approach is to track a dirty rectangle that expands to contain every modified pixel, then flush only that sub-image:

let dirtyX0 = W, dirtyY0 = H, dirtyX1 = 0, dirtyY1 = 0;

function markDirty(x, y) {
  if (x < dirtyX0) dirtyX0 = x;
  if (y < dirtyY0) dirtyY0 = y;
  if (x > dirtyX1) dirtyX1 = x;
  if (y > dirtyY1) dirtyY1 = y;
}

function flushDirtyRect() {
  if (dirtyX1 < dirtyX0) return; // nothing changed

  const dw = dirtyX1 - dirtyX0 + 1;
  const dh = dirtyY1 - dirtyY0 + 1;

  // Extract sub-image from full buffer
  const sub = ctx.createImageData(dw, dh);
  for (let row = 0; row < dh; row++) {
    const srcOff = ((dirtyY0 + row) * W + dirtyX0) * 4;
    sub.data.set(buf.subarray(srcOff, srcOff + dw * 4), row * dw * 4);
  }

  ctx.putImageData(sub, dirtyX0, dirtyY0);

  // Reset dirty rect
  dirtyX0 = W; dirtyY0 = H; dirtyX1 = 0; dirtyY1 = 0;
}

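When the dirty bounding box covers only 20% of the canvas, this reduces upload bandwidth by roughly 80%. Bear in mind that a single bounding box spans everything between its most distant changes, so two small updates in opposite corners still upload almost the whole canvas. In our Turing pattern (reaction-diffusion) simulator at 1024×768 resolution, dirty-rect flushing reduced the putImageData cost from 6.2 ms to 1.3 ms per frame during the early stable phase when only the wavefront is moving.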

Caveat: dirty rects only help when activity is genuinely sparse. If every pixel updates every frame (e.g. a full-screen fluid simulation), maintain a single full-canvas ImageData and call putImageData(imageData, 0, 0) without the sub-image extraction overhead.
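putImageData() also accepts four optional arguments (dirtyX, dirtyY, dirtyWidth, dirtyHeight) that restrict the copy to a sub-rectangle of the source ImageData. That makes it possible to skip the per-flush createImageData() allocation and the manual row copy shown earlier. The sketch below is one way to use it; flushDirty and the rect object shape are hypothetical names, and whether this beats the manual copy in a given browser is something to measure rather than assume.

```javascript
// Hypothetical helper: upload only the dirty region of a full-canvas ImageData.
// rect = { x0, y0, x1, y1 } with inclusive pixel bounds.
// Returns true if an upload was issued, false if nothing was dirty.
function flushDirty(ctx, imageData, rect) {
  if (rect.x1 < rect.x0 || rect.y1 < rect.y0) return false; // nothing changed

  const dw = rect.x1 - rect.x0 + 1;
  const dh = rect.y1 - rect.y0 + 1;

  // dx and dy stay 0 so buffer coordinates equal canvas coordinates;
  // the last four arguments select the sub-rectangle that is actually copied.
  ctx.putImageData(imageData, 0, 0, rect.x0, rect.y0, dw, dh);
  return true;
}
```

A caller would pass the tracked bounds, for example flushDirty(ctx, imageData, { x0: dirtyX0, y0: dirtyY0, x1: dirtyX1, y1: dirtyY1 }), and then reset them as before.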

4. Batching Context State Changes

Even when using putImageData for pixel work, you often mix in Canvas 2D API calls for UI overlays, text labels, or vector shapes. Each time you change ctx.strokeStyle, ctx.lineWidth, ctx.font, or ctx.globalAlpha, the browser may flush pending draw operations. Group calls by shared state to minimise flushes:

// Bad: state toggles between every path
for (const seg of segments) {
  ctx.strokeStyle = seg.highlighted ? '#f00' : '#888';
  ctx.lineWidth   = seg.highlighted ? 2 : 1;
  ctx.beginPath();
  ctx.moveTo(seg.x0, seg.y0);
  ctx.lineTo(seg.x1, seg.y1);
  ctx.stroke();
}

// Good: sort by state, batch strokes
const normal      = segments.filter(s => !s.highlighted);
const highlighted = segments.filter(s =>  s.highlighted);

ctx.strokeStyle = '#888';
ctx.lineWidth   = 1;
ctx.beginPath();
for (const s of normal) { ctx.moveTo(s.x0, s.y0); ctx.lineTo(s.x1, s.y1); }
ctx.stroke();

ctx.strokeStyle = '#f00';
ctx.lineWidth   = 2;
ctx.beginPath();
for (const s of highlighted) { ctx.moveTo(s.x0, s.y0); ctx.lineTo(s.x1, s.y1); }
ctx.stroke();

For complex scenes, ctx.save() / ctx.restore() can be expensive inside hot loops because each pair snapshots and reinstates the full drawing state. Prefer explicit state tracking in JavaScript: record the current strokeStyle and only call the setter when it actually changes.
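Explicit state tracking can be as small as a closure that remembers the last value written per property. This is an illustrative sketch, not a library API: createStateCache is a hypothetical name, and the cache goes stale if anything else writes to the context directly, calls ctx.restore(), or resizes the canvas (which resets context state).

```javascript
// Hypothetical wrapper: remembers the last value written to each context
// property and skips the setter when the value has not changed.
function createStateCache(ctx) {
  const last = new Map();
  return function set(prop, value) {
    if (last.has(prop) && last.get(prop) === value) return false; // already current
    ctx[prop] = value;
    last.set(prop, value);
    return true; // the setter was actually invoked
  };
}
```

Usage would look like const set = createStateCache(ctx); followed by set('strokeStyle', '#888') wherever the code previously assigned ctx.strokeStyle, so repeated assignments of the same colour never reach the browser.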

Benchmark Summary

Technique               | Scenario                               | Before       | After       | Speedup
------------------------|----------------------------------------|--------------|-------------|--------
ImageData + Uint32Array | 100k particles, 1280×720               | ~41 ms/frame | ~4 ms/frame | 10×
SoA Float32Array        | 80k boids physics update               | ~8 ms        | ~1.4 ms     | 5.7×
Dirty-rect flush        | Reaction-diffusion wavefront, 1024×768 | ~6.2 ms      | ~1.3 ms     | 4.8×
Batched state changes   | 5k line segments, mixed state          | ~3.8 ms      | ~0.9 ms     | 4.2×

All figures measured in Chrome 124 on an M2 MacBook Air using performance.now() bracketing, averaged over 300 frames. Results will vary by device and browser, but the relative ordering is consistent.
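For readers who want to reproduce this kind of measurement, the bracketing can be sketched as below. meanFrameTime is a hypothetical helper, not the harness used for the table. It times only the main-thread cost of the callback, so any work the browser defers to the compositor or GPU is not captured, and browsers may coarsen performance.now() resolution.

```javascript
// Hypothetical helper: call fn(frameIndex) `frames` times and return the
// mean wall-clock time per call in milliseconds.
function meanFrameTime(fn, frames = 300) {
  let total = 0;
  for (let f = 0; f < frames; f++) {
    const t0 = performance.now();
    fn(f);
    total += performance.now() - t0;
  }
  return total / frames;
}
```

A call such as meanFrameTime(() => renderParticles(particles)) gives a per-frame figure for the render step alone. For numbers closer to real conditions, run one iteration per requestAnimationFrame callback instead of a tight loop.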

Rule of thumb: if your simulation updates more than ~1,000 pixels or particles per frame, move to ImageData + typed arrays immediately. The Canvas 2D draw-call API is designed for vector graphics, not raster simulation.

Try It Yourself

The techniques above are used across several mysimulator.uk simulations. Open the DevTools Performance tab while running them to watch how each one spends its frame budget:

  • 🌊 SPH Fluid Simulation — Float32Array SoA buffers for 60k smoothed-particle-hydrodynamics particles; ImageData pixel write for density field visualisation.
  • 🧬 Reaction-Diffusion (Turing Patterns) — full-canvas ImageData + dirty-rect optimisation during wavefront propagation; Uint32Array ABGR packing for colour mapping.
  • 🐦 Boids Flocking — SoA typed arrays for 80,000-agent flock; batched Canvas 2D path calls for wing trails.

Closing Thought

The Canvas 2D API is not slow — it is simply designed for a different use case than high-throughput simulation. The moment you find yourself in a loop calling draw primitives per-entity, you have left its design envelope. ImageData, typed arrays, dirty rects, and batched state are not exotic hacks; they are the standard toolkit for anyone building live visualisations at 60 fps. Apply them in that order: raw pixel access first, memory layout second, upload area third, draw-call batching last — because each subsequent gain is smaller than the previous one, but together they compound into simulations that feel genuinely fluid even on modest hardware.

Once you have extracted everything possible from Canvas 2D, the next step is WebGL or WebGPU for compute shaders — but many simulations never need to go that far. Getting 10× faster within the same API is usually the more pragmatic win.