Here is a fact that surprises many developers: on a mid-range laptop,
calling ctx.fillRect(x, y, 1, 1) 100,000 times per frame
is roughly 40 times slower than writing the same 100,000 pixels
directly into a Uint8ClampedArray and flushing it with a
single ctx.putImageData(). The Canvas 2D API is
beautifully expressive, but every draw call carries significant
per-call overhead: state validation, compositing mode checks, and
bridge crossings between the JavaScript engine and the GPU compositor.
Understanding where that overhead lives — and how to route around it —
is the single most impactful optimisation you can make to particle
simulations, cellular automata, or any visualisation that updates
many pixels per frame.
1. ImageData — Write Pixels, Not Draw Calls
The standard pattern for rendering n coloured dots looks like this:
// Slow: one draw call per particle
for (const p of particles) {
  ctx.fillStyle = `rgb(${p.r},${p.g},${p.b})`;
  ctx.fillRect(p.x | 0, p.y | 0, 1, 1);
}
Every iteration changes fillStyle (a string parse), then
issues a draw call. For 50,000 particles at 60 fps that is 3 million
string parses and draw calls per second — far more work than the
actual pixel maths.
The fast path creates a backing buffer once, writes RGBA bytes directly, and uploads it in a single call:
// Fast: one putImageData per frame
const W = canvas.width, H = canvas.height;
const imageData = ctx.createImageData(W, H);
const buf = imageData.data; // Uint8ClampedArray, length = W*H*4

function renderParticles(particles) {
  // Clear to opaque black: zero every byte, then set each alpha byte to 255
  buf.fill(0);
  for (let a = 3; a < buf.length; a += 4) buf[a] = 255;

  for (const p of particles) {
    const x = p.x | 0, y = p.y | 0;
    // Skip off-canvas particles; otherwise the index wraps onto another row
    if (x < 0 || x >= W || y < 0 || y >= H) continue;
    const i = (y * W + x) * 4;
    buf[i]     = p.r;
    buf[i + 1] = p.g;
    buf[i + 2] = p.b;
    buf[i + 3] = 255;
  }
  ctx.putImageData(imageData, 0, 0);
}
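Clearing with buf.fill(0) and then setting every alpha byte takes two
passes over the buffer. A cleaner option is to maintain a separate
Uint32Array view over the same ArrayBuffer. It allows 4-byte-at-a-time
writes, and on little-endian hardware buf32.fill(0xff000000) clears
the whole canvas to opaque black in one call: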
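const buf32 = new Uint32Array(imageData.data.buffer);
// Inside the particle loop: pack RGBA into a single uint32.
// On little-endian hardware the value is 0xAABBGGRR, which is stored
// in memory as the bytes R, G, B, A (the order ImageData expects).
const colour = ((255 << 24) | (p.b << 16) | (p.g << 8) | p.r) >>> 0;
buf32[(p.y | 0) * W + (p.x | 0)] = colour;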
The Uint32Array trick cuts the inner loop body from four
byte writes to one 32-bit write. On our reaction-diffusion simulations
(1280×720 canvas, all pixels active) this alone cuts render time from
~11 ms to ~3 ms per frame on Chrome 124 / M2 MacBook Air.
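The 32-bit packing above assumes a little-endian machine. As a minimal sketch of how the same idea could be made byte-order safe, the block below detects endianness once and packs accordingly. The helper name `packRGBA`, the tiny 4×2 buffer, and the plain `Uint8ClampedArray` standing in for `imageData.data` are assumptions made so the example runs outside a browser.

```javascript
// Stand-in for imageData.data so the sketch runs without a canvas;
// in a page you would use ctx.createImageData(W, H).data instead.
const W = 4, H = 2;
const bytes = new Uint8ClampedArray(W * H * 4);
const pixels32 = new Uint32Array(bytes.buffer);

// Detect byte order once: on little-endian hardware the lowest byte of a
// 32-bit value is stored first.
const LITTLE_ENDIAN =
  new Uint8Array(new Uint32Array([0x0a0b0c0d]).buffer)[0] === 0x0d;

// Hypothetical helper: pack one colour so the bytes land in memory as
// R, G, B, A regardless of platform byte order.
function packRGBA(r, g, b, a = 255) {
  return LITTLE_ENDIAN
    ? ((a << 24) | (b << 16) | (g << 8) | r) >>> 0
    : ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

// Clear to opaque black with a single fill, then plot one orange pixel.
pixels32.fill(packRGBA(0, 0, 0));
const x = 2, y = 1;
pixels32[y * W + x] = packRGBA(255, 128, 0);
```

Precomputing packed colours (for example, one per palette entry) keeps the shifts out of the inner loop, so the per-pixel cost stays at a single 32-bit store.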
2. Float32Array Particle Buffers — Avoid Object Overhead
Arrays of JavaScript objects are a poor fit for hot numeric loops. An
array of 100,000 particle objects, each with x, y, vx, vy, r, g, b,
and life properties, is 100,000 separate heap allocations, so every
field access follows a pointer to memory that may be nowhere near the
previous particle. V8's hidden classes keep property lookups fast only
while all objects share one shape; if shapes diverge (for example,
because properties are added in a different order), the engine falls
back to slower lookups.
A Structure-of-Arrays (SoA) layout built on typed arrays avoids most of this overhead:
const N = 100_000;               // max particles
const px  = new Float32Array(N); // x positions
const py  = new Float32Array(N); // y positions
const pvx = new Float32Array(N); // x velocities
const pvy = new Float32Array(N); // y velocities
const pr  = new Uint8Array(N);   // red channel
const pg  = new Uint8Array(N);   // green channel
const pb  = new Uint8Array(N);   // blue channel
const pl  = new Float32Array(N); // lifetime (seconds)

// Placeholder values so the listing is complete; tune for your simulation
const GRAVITY = 9.81;            // constant downward acceleration
let ax = 0, ay = 0;              // external acceleration, e.g. wind

function updateParticles(dt, count) {
  for (let i = 0; i < count; i++) {
    // Update velocity first, then position
    pvx[i] += ax * dt;
    pvy[i] += (ay + GRAVITY) * dt;
    px[i]  += pvx[i] * dt;
    py[i]  += pvy[i] * dt;
    pl[i]  -= dt;
  }
}
Because each array is a contiguous block of memory with a fixed element size, the CPU prefetcher can stream through runs of elements without pointer chasing, and the JIT can compile the loop to tight numeric code with no per-particle allocation. In our boids simulation, switching from an array of objects to SoA typed arrays reduced the physics update from ~8 ms to ~1.4 ms for 80,000 agents: a 5.7× speedup with identical logic.
The maths is unchanged: position update is simply
p += v × Δt, acceleration is
a = F / m. With typed arrays you are just expressing
it in a layout that the engine can optimise aggressively.
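The listing above decrements a lifetime but does not show what happens when a particle expires. One common way to handle this in an SoA layout is swap-remove: copy the last live particle into the freed slot and shrink the live count, so the live range stays contiguous. The sketch below is an illustration under assumptions, not the code used in the simulations; the `spawn` and `step` helpers and the capacity of 8 are hypothetical.

```javascript
// Small SoA pool. Array names mirror the listing above; the capacity and
// both helpers are illustrative assumptions.
const CAP = 8;
const px  = new Float32Array(CAP), py  = new Float32Array(CAP);
const pvx = new Float32Array(CAP), pvy = new Float32Array(CAP);
const pl  = new Float32Array(CAP); // remaining lifetime in seconds
let count = 0;                     // particles 0..count-1 are alive

function spawn(x, y, vx, vy, life) {
  if (count === CAP) return false; // pool full
  px[count] = x;   py[count] = y;
  pvx[count] = vx; pvy[count] = vy;
  pl[count] = life;
  count++;
  return true;
}

// Advance live particles; expired ones are overwritten by the last live
// particle so no gaps appear in the arrays.
function step(dt) {
  let i = 0;
  while (i < count) {
    pl[i] -= dt;
    if (pl[i] <= 0) {
      const last = --count;
      px[i] = px[last];   py[i] = py[last];
      pvx[i] = pvx[last]; pvy[i] = pvy[last];
      pl[i] = pl[last];
      continue; // slot i now holds an unprocessed particle: re-check it
    }
    px[i] += pvx[i] * dt;
    py[i] += pvy[i] * dt;
    i++;
  }
}

// Usage: the first particle expires during the step and is replaced.
spawn(0, 0, 1, 0, 0.05);
spawn(10, 0, 2, 0, 1);
spawn(20, 0, 3, 0, 1);
step(0.1); // count is now 2
```

Swap-remove does not preserve particle order. If draw order matters (for example, with alpha blending), a stable compaction pass would be needed instead.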
3. Dirty-Rectangle Partial Redraws — Skip Unchanged Pixels
Full-canvas putImageData() uploads W × H × 4
bytes regardless of how many pixels changed. For a 1920×1080 canvas
that is ~8 MB per frame over the CPU → compositor bridge. When
only a small region of the canvas is active — for example, a particle
burst in the lower-left corner, or a localised reaction front — it is
far cheaper to upload only the bounding box of changed pixels.
The approach is to track a dirty rectangle that expands to contain every modified pixel, then flush only that sub-image:
let dirtyX0 = W, dirtyY0 = H, dirtyX1 = 0, dirtyY1 = 0;

function markDirty(x, y) {
  if (x < dirtyX0) dirtyX0 = x;
  if (y < dirtyY0) dirtyY0 = y;
  if (x > dirtyX1) dirtyX1 = x;
  if (y > dirtyY1) dirtyY1 = y;
}

function flushDirtyRect() {
  if (dirtyX1 < dirtyX0) return; // nothing changed
  const dw = dirtyX1 - dirtyX0 + 1;
  const dh = dirtyY1 - dirtyY0 + 1;
  // Extract sub-image from full buffer
  const sub = ctx.createImageData(dw, dh);
  for (let row = 0; row < dh; row++) {
    const srcOff = ((dirtyY0 + row) * W + dirtyX0) * 4;
    sub.data.set(buf.subarray(srcOff, srcOff + dw * 4), row * dw * 4);
  }
  ctx.putImageData(sub, dirtyX0, dirtyY0);
  // Reset dirty rect
  dirtyX0 = W; dirtyY0 = H; dirtyX1 = 0; dirtyY1 = 0;
}
When the dirty bounding box covers only 20% of the canvas, this
reduces upload bandwidth by roughly 80%. In our Turing pattern
(reaction-diffusion) simulator at 1024×768 resolution, dirty-rect
flushing reduced the putImageData cost from 6.2 ms to 1.3 ms per
frame during the early stable phase when only the wavefront is moving.
Caveat: dirty rects only help when activity is
genuinely sparse. If every pixel updates every frame (e.g. a
full-screen fluid simulation), maintain a single full-canvas
ImageData and call putImageData(imageData, 0, 0)
without the sub-image extraction overhead.
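The row-by-row copy in flushDirtyRect can also be avoided altogether: putImageData accepts four optional trailing arguments (dirtyX, dirtyY, dirtyWidth, dirtyHeight) that restrict the upload to a sub-rectangle of the full ImageData. The sketch below shows that variant. The tracker object and the `createDirtyRect`, `markDirty`, and `flushDirty` helpers are hypothetical names, and whether this form beats the explicit copy in a given browser is something to measure rather than assume.

```javascript
// Hypothetical tracker: an empty rect has x1 < x0.
function createDirtyRect() {
  return { x0: Infinity, y0: Infinity, x1: -Infinity, y1: -Infinity };
}

// Grow the bounding box to include pixel (x, y).
function markDirty(rect, x, y) {
  if (x < rect.x0) rect.x0 = x;
  if (y < rect.y0) rect.y0 = y;
  if (x > rect.x1) rect.x1 = x;
  if (y > rect.y1) rect.y1 = y;
}

// Upload only the dirty region of a full-canvas ImageData, then reset.
// Returns false when nothing was marked since the last flush.
function flushDirty(ctx, imageData, rect) {
  if (rect.x1 < rect.x0) return false;
  const w = rect.x1 - rect.x0 + 1;
  const h = rect.y1 - rect.y0 + 1;
  // dx and dy stay 0 because imageData covers the whole canvas; the last
  // four arguments select which part of it is actually painted.
  ctx.putImageData(imageData, 0, 0, rect.x0, rect.y0, w, h);
  rect.x0 = rect.y0 = Infinity;
  rect.x1 = rect.y1 = -Infinity;
  return true;
}

// Usage inside a frame, assuming `ctx` and a full-canvas `imageData` exist:
//   const rect = createDirtyRect();
//   markDirty(rect, x, y);            // once per pixel written
//   flushDirty(ctx, imageData, rect); // once per frame
```

This version allocates nothing per frame, which also removes the createImageData call from the hot path.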
4. Batching Context State Changes
Even when using putImageData for pixel work, you often
mix in Canvas 2D API calls for UI overlays, text labels, or vector
shapes. Each time you change ctx.strokeStyle,
ctx.lineWidth, ctx.font, or
ctx.globalAlpha, the browser may flush pending draw
operations. Group calls by shared state to minimise flushes:
// Bad: state toggles between every path
for (const seg of segments) {
  ctx.strokeStyle = seg.highlighted ? '#f00' : '#888';
  ctx.lineWidth = seg.highlighted ? 2 : 1;
  ctx.beginPath();
  ctx.moveTo(seg.x0, seg.y0);
  ctx.lineTo(seg.x1, seg.y1);
  ctx.stroke();
}

// Good: sort by state, batch strokes
const normal = segments.filter(s => !s.highlighted);
const highlighted = segments.filter(s => s.highlighted);

ctx.strokeStyle = '#888';
ctx.lineWidth = 1;
ctx.beginPath();
for (const s of normal) { ctx.moveTo(s.x0, s.y0); ctx.lineTo(s.x1, s.y1); }
ctx.stroke();

ctx.strokeStyle = '#f00';
ctx.lineWidth = 2;
ctx.beginPath();
for (const s of highlighted) { ctx.moveTo(s.x0, s.y0); ctx.lineTo(s.x1, s.y1); }
ctx.stroke();
For complex scenes, ctx.save() / ctx.restore() can be costly in hot
loops because each pair snapshots and reinstates the full graphics
state. Prefer explicit state tracking in JavaScript: record the
current strokeStyle and only call the setter when it actually
changes.
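Explicit state tracking can be sketched as a small cache that remembers the last value written to each context property and skips redundant assignments. `createStateCache` is a hypothetical helper, not part of the Canvas API, and it compares the values you write rather than what the context reports back (browsers may normalise '#f00' to '#ff0000', for instance).

```javascript
// Hypothetical wrapper around a 2D context (or any object with settable
// properties): skips writes whose value has not changed.
function createStateCache(ctx) {
  const last = new Map();
  return {
    // Returns true if the underlying setter was actually invoked.
    set(prop, value) {
      if (last.get(prop) === value) return false; // unchanged: skip
      ctx[prop] = value;
      last.set(prop, value);
      return true;
    },
    // Call after anything that changes state behind the cache's back,
    // such as ctx.restore() or resizing the canvas.
    invalidate() {
      last.clear();
    },
  };
}

// Usage inside a draw loop, assuming `ctx` and `segments` exist:
//   const state = createStateCache(ctx);
//   for (const s of segments) {
//     state.set('strokeStyle', s.highlighted ? '#f00' : '#888');
//     state.set('lineWidth', s.highlighted ? 2 : 1);
//     ...
//   }
```

A cache like this complements sorting by state rather than replacing it: sorting reduces how often the style changes, and the cache removes the writes that would have been no-ops.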
Benchmark Summary
| Technique | Scenario | Before | After | Speedup |
|---|---|---|---|---|
| ImageData + Uint32Array | 100k particles, 1280×720 | ~41 ms/frame | ~4 ms/frame | 10× |
| SoA Float32Array | 80k boids physics update | ~8 ms | ~1.4 ms | 5.7× |
| Dirty-rect flush | Reaction-diffusion wavefront, 1024×768 | ~6.2 ms | ~1.3 ms | 4.8× |
| Batched state changes | 5k line segments, mixed state | ~3.8 ms | ~0.9 ms | 4.2× |
All figures measured in Chrome 124 on an M2 MacBook Air using
performance.now() bracketing, averaged over 300 frames.
Results will vary by device and browser, but the relative ordering
is consistent.
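For readers who want to reproduce this kind of measurement, the bracketing approach can be sketched as below. This is an illustration, not the harness used for the table: `measureFrames`, its `warmup` parameter, and the worst-frame statistic are assumptions. Note that performance.now() only captures time spent on the calling thread, so work the browser defers to a later stage may not show up.

```javascript
// Hypothetical helper: run `frameFn` repeatedly and report per-call cost
// in milliseconds, averaged over `frames` timed iterations.
function measureFrames(frameFn, frames = 300, warmup = 30) {
  // Untimed warm-up runs give the JIT a chance to optimise the hot path.
  for (let i = 0; i < warmup; i++) frameFn();

  let total = 0;
  let worst = 0;
  for (let i = 0; i < frames; i++) {
    const t0 = performance.now();
    frameFn();
    const dt = performance.now() - t0;
    total += dt;
    if (dt > worst) worst = dt;
  }
  return { mean: total / frames, worst };
}

// Usage with a stand-in workload; in a page, frameFn would be the update
// and render step being compared.
const data = new Float32Array(100_000);
const stats = measureFrames(() => {
  for (let i = 0; i < data.length; i++) data[i] += 1;
});
console.log(`mean ${stats.mean.toFixed(3)} ms, worst ${stats.worst.toFixed(3)} ms`);
```

Reporting the worst frame alongside the mean is a design choice: dropped frames are caused by outliers, which an average alone can hide.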
Rule of thumb: if your simulation updates more than
~1,000 pixels or particles per frame, move to
ImageData + typed arrays immediately. The Canvas 2D
draw-call API is designed for vector graphics, not raster simulation.
Try It Yourself
The techniques above are used across several mysimulator.uk simulations. Open the DevTools Performance tab while running them to see how each frame's budget is spent:
- 🌊 SPH Fluid Simulation — Float32Array SoA buffers for 60k smoothed-particle-hydrodynamics particles; ImageData pixel write for density field visualisation.
- 🧬 Reaction-Diffusion (Turing Patterns) — full-canvas ImageData + dirty-rect optimisation during wavefront propagation; Uint32Array ABGR packing for colour mapping.
- 🐦 Boids Flocking — SoA typed arrays for 80,000-agent flock; batched Canvas 2D path calls for wing trails.
Closing Thought
The Canvas 2D API is not slow — it is simply designed for a different
use case than high-throughput simulation. The moment you find yourself
in a loop calling draw primitives per-entity, you have left its design
envelope. ImageData, typed arrays, dirty rects, and
batched state are not exotic hacks; they are the standard toolkit for
anyone building live visualisations at 60 fps. Apply them in that
order: raw pixel access first, memory layout second, upload area
third, draw-call batching last — because each subsequent gain is
smaller than the previous one, but together they compound into
simulations that feel genuinely fluid even on modest hardware.
Once you have extracted everything possible from Canvas 2D, the next step is the GPU: WebGL for shader-based rendering, or WebGPU if you need compute shaders. Many simulations never need to go that far, though. Getting 10× faster within the same API is usually the more pragmatic win.