ECS Architecture for Large-Scale Simulations
When your particle simulation has 10 000 entities and runs below 30 fps, the bottleneck is usually data layout, not the algorithm. Entity-Component-System (ECS) with Structure-of-Arrays storage can give 3–10× higher throughput than object-per-entity approaches. Here's how and why.
1. Why Not Just Objects? (Cache Miss Problem)
The natural JavaScript approach stores each particle as an object:
// Array of Objects (AoS) — intuitive but slow for mass updates
class Particle {
constructor() {
this.x = 0; this.y = 0; this.z = 0;
this.vx = 0; this.vy = 0; this.vz = 0;
this.mass = 1; this.alive = true;
// ... more fields
}
}
const particles = Array.from({length: 100000}, () => new Particle());
// Update loop
for (const p of particles) {
p.x += p.vx * dt; // each p may live anywhere on the heap, so these reads can miss the cache
}
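The problem: each Particle object is a separate heap allocation, and nothing guarantees that consecutive particles stay adjacent in memory, especially once the garbage collector has moved objects around. Iterating 100k particles can therefore touch 100k scattered memory locations, which defeats the CPU's L1/L2 caches (typically from about 32 KB for L1 up to a few MB for L2). On a modern CPU, a miss that goes all the way to main memory costs on the order of 100 cycles, versus roughly 4 cycles for an L1 hit.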
2. ECS Core Concepts
ECS separates data from logic into three core abstractions (entities, components, and systems), plus a World container that ties them together:
| Concept | Role | Example |
|---|---|---|
| Entity | A unique ID — nothing else | Entity 42 (just a number) |
| Component | Plain data struct, no methods | Position{x,y,z}, Velocity{vx,vy,vz}, Mass{m} |
| System | Logic that processes entities with specific components | GravitySystem (needs Position + Velocity + Mass) |
| World | Container holding all entities, component stores, and systems | world.addSystem(new GravitySystem()) |
Key rules:
- Components contain only data — no methods, no logic
- Systems contain only logic — no persistent state (besides configuration)
- Entities are just integer IDs — the World manages their component membership
3. Structure of Arrays vs Array of Structures
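Instead of one array of objects, use one array per component field. When you iterate positions, you read contiguous Float32Arrays, which is about as cache-friendly as JavaScript gets.
With SoA, a system that accesses only positions reads x[], y[], z[]: three contiguous arrays, with no unrelated fields pulled into the cache alongside them. Do not count on SIMD here: V8 and SpiderMonkey generally do not auto-vectorise JavaScript loops, so the gain comes from sequential access, hardware prefetching, and raw (unboxed) float storage.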
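A minimal sketch of the two layouts side by side may help. The `count` and `dt` values and the field names are illustrative assumptions, not part of the World class introduced in the next section:

```javascript
// Sketch: the same update written against AoS and SoA layouts.
// `count`, `dt`, and the field names are illustrative assumptions.
const count = 4;
const dt = 0.5;

// AoS: one heap object per particle.
const aos = Array.from({ length: count }, (_, i) => ({ x: i, vx: 2 }));
for (const p of aos) {
  p.x += p.vx * dt;
}

// SoA: one typed array per field; index i identifies the particle.
const soa = {
  x: new Float32Array(count),
  vx: new Float32Array(count),
};
for (let i = 0; i < count; i++) {
  soa.x[i] = i;
  soa.vx[i] = 2;
}
for (let i = 0; i < count; i++) {
  soa.x[i] += soa.vx[i] * dt; // sequential reads over contiguous memory
}
```

Both loops compute the same positions; only the memory layout they walk over differs.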
4. The World Class
class World {
constructor(capacity = 100000) {
this.capacity = capacity;
this.entityCount = 0;
this.components = new Map(); // componentName → ComponentStore
this.systems = [];
this.alive = new Uint8Array(capacity); // one byte per entity: 1 = alive, 0 = dead
}
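createEntity() {
// The typed arrays are fixed-size, so refuse IDs beyond capacity
if (this.entityCount >= this.capacity) throw new RangeError('World capacity exceeded');
const id = this.entityCount++;
this.alive[id] = 1;
return id;
}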
destroyEntity(id) {
this.alive[id] = 0;
// Component data at index id is now considered "garbage"
// Can be reclaimed by maintaining a free-list
}
registerComponent(name, store) {
this.components.set(name, store);
}
addSystem(system) {
system.world = this;
this.systems.push(system);
}
update(dt) {
for (const system of this.systems) system.update(dt);
}
}
5. Component Storage
// SoA component store for 3D position
class PositionStore {
constructor(capacity) {
this.x = new Float32Array(capacity);
this.y = new Float32Array(capacity);
this.z = new Float32Array(capacity);
}
set(id, x, y, z) {
this.x[id] = x; this.y[id] = y; this.z[id] = z;
}
}
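class VelocityStore {
constructor(capacity) {
this.x = new Float32Array(capacity);
this.y = new Float32Array(capacity);
this.z = new Float32Array(capacity);
}
// Mirrors PositionStore.set so the entity-creation loop can call it
set(id, x, y, z) {
this.x[id] = x; this.y[id] = y; this.z[id] = z;
}
}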
class MassStore {
constructor(capacity) {
this.m = new Float32Array(capacity);
this.invM = new Float32Array(capacity); // precomputed 1/m
}
set(id, mass) {
this.m[id] = mass;
this.invM[id] = 1.0 / mass;
}
}
// Register in World
const world = new World(200000);
world.registerComponent('pos', new PositionStore(200000));
world.registerComponent('vel', new VelocityStore(200000));
world.registerComponent('mass', new MassStore(200000));
// Create entities
for (let i = 0; i < 100000; i++) {
const id = world.createEntity();
world.components.get('pos').set(id,
(Math.random()-0.5)*100, (Math.random()-0.5)*100, 0);
world.components.get('vel').set(id, ... );
world.components.get('mass').set(id, 1.0 + Math.random());
}
6. Systems and Queries
class GravitySystem {
constructor(g = -9.81) { this.g = g; }
update(dt) {
const { world, g } = this;
const count = world.entityCount;
const alive = world.alive;
const vel = world.components.get('vel');
// Direct array iteration — no object allocation, maximally cache-friendly
for (let id = 0; id < count; id++) {
if (!alive[id]) continue; // skip dead entities
vel.y[id] += g * dt; // apply gravity to vy
}
}
}
class IntegrateSystem {
update(dt) {
const { world } = this;
const count = world.entityCount;
const alive = world.alive;
const pos = world.components.get('pos');
const vel = world.components.get('vel');
const posX = pos.x, posY = pos.y, posZ = pos.z;
const velX = vel.x, velY = vel.y, velZ = vel.z;
// Local array references keep the loop body to plain typed-array reads and writes
for (let id = 0; id < count; id++) {
if (!alive[id]) continue;
posX[id] += velX[id] * dt;
posY[id] += velY[id] * dt;
posZ[id] += velZ[id] * dt;
}
}
}
world.addSystem(new GravitySystem(-9.81));
world.addSystem(new IntegrateSystem());
If the entity arrays are dense (no dead slots), you can skip the if (!alive[id]) check entirely.
The branch misprediction cost can be noticeable at 100k iterations.
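One way to keep the arrays dense, so that the alive check can be dropped, is swap-and-pop removal: move the last live element into the freed slot and shrink the count. The sketch below is an assumption rather than part of the World class above. The `DensePositions` name and its methods are hypothetical, and because removal changes the index of the moved element, anything that holds on to entity IDs would need an extra ID-to-index map.

```javascript
// Hypothetical dense store: live elements always occupy indices [0, count).
class DensePositions {
  constructor(capacity) {
    this.x = new Float32Array(capacity);
    this.y = new Float32Array(capacity);
    this.count = 0;
  }
  // Append a new element and return its index.
  add(x, y) {
    if (this.count >= this.x.length) throw new RangeError('store is full');
    const i = this.count++;
    this.x[i] = x;
    this.y[i] = y;
    return i;
  }
  // Swap-and-pop: overwrite slot i with the last live element, then shrink.
  remove(i) {
    if (i < 0 || i >= this.count) throw new RangeError('index out of range');
    const last = --this.count;
    this.x[i] = this.x[last];
    this.y[i] = this.y[last];
  }
  // No alive check needed: every index below count is live.
  translate(dx, dy) {
    for (let i = 0; i < this.count; i++) {
      this.x[i] += dx;
      this.y[i] += dy;
    }
  }
}
```

The trade-off is stable IDs versus a branch-free inner loop; which matters more depends on how often entities die.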
7. Full Example: GPU-Rendered Particle Simulation
ECS handles simulation logic on the CPU. Each frame, upload the position arrays to a GPU buffer and render with gl.drawArrays(gl.POINTS, …):
// One-time: CPU-side staging array for interleaved xyz, sized for the World's full capacity
const gpuPositions = new Float32Array(world.capacity * 3);
class RenderSystem {
constructor(gl, shaderProgram) {
this.gl = gl;
this.program = shaderProgram;
this.vbo = gl.createBuffer();
this.aLoc = gl.getAttribLocation(shaderProgram, 'aPos');
}
update(_dt) {
const { gl, world } = this;
const count = world.entityCount;
const pos = world.components.get('pos');
// Pack SoA into interleaved xyz for the GPU (or use separate attribute buffers)
let i = 0; // write cursor into gpuPositions; ends at 3 × (live entities)
for (let id = 0; id < count; id++) {
if (!world.alive[id]) continue;
gpuPositions[i++] = pos.x[id];
gpuPositions[i++] = pos.y[id];
gpuPositions[i++] = pos.z[id];
}
gl.bindBuffer(gl.ARRAY_BUFFER, this.vbo);
// Upload only the filled prefix of the staging array
gl.bufferData(gl.ARRAY_BUFFER, gpuPositions.subarray(0, i), gl.DYNAMIC_DRAW);
gl.useProgram(this.program);
gl.enableVertexAttribArray(this.aLoc);
gl.vertexAttribPointer(this.aLoc, 3, gl.FLOAT, false, 0, 0);
gl.drawArrays(gl.POINTS, 0, i / 3); // one point per live entity
}
}
To avoid the interleave pass, keep separate Float32Array buffers per attribute and use gl.bindBuffer + gl.bufferSubData to update only the changed portions. Alternatively, run the simulation on the GPU as well (fragment-shader ping-pong in WebGL2, or a compute shader where the API provides one, such as WebGPU) and avoid the CPU→GPU upload entirely.
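The separate-buffer variant could look roughly like the sketch below. It assumes a WebGL context `gl`, a dense position store (the first `count` slots are live), and a shader that declares three float attributes. The attribute names `aX`, `aY`, `aZ` and both function names are hypothetical; the caller is expected to have called gl.useProgram beforehand.

```javascript
// Sketch: one GPU buffer per SoA field, updated in place with bufferSubData.
// Attribute names aX, aY, aZ are hypothetical; the shader would combine them
// into a vec3 itself.
function createFieldBuffers(gl, program, capacity) {
  const names = ['aX', 'aY', 'aZ'];
  return names.map((name) => {
    const buffer = gl.createBuffer();
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    // Allocate storage once; 4 bytes per Float32 element.
    gl.bufferData(gl.ARRAY_BUFFER, capacity * 4, gl.DYNAMIC_DRAW);
    return { buffer, location: gl.getAttribLocation(program, name) };
  });
}

function uploadAndDraw(gl, fields, pos, count) {
  const arrays = [pos.x, pos.y, pos.z];
  fields.forEach(({ buffer, location }, k) => {
    gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
    // Upload only the first `count` elements; subarray is a view, not a copy.
    gl.bufferSubData(gl.ARRAY_BUFFER, 0, arrays[k].subarray(0, count));
    gl.enableVertexAttribArray(location);
    gl.vertexAttribPointer(location, 1, gl.FLOAT, false, 0, 0);
  });
  gl.drawArrays(gl.POINTS, 0, count);
}
```

Compared with the interleaved path, this removes the per-frame packing loop on the CPU at the cost of three buffer binds per draw.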
8. Performance Tips and Benchmarks
| Approach | Time per frame (100k particles) |
|---|---|
| Array of Objects (AoS), naive JS | ~18 ms |
| AoS with typed arrays (Float64Array) | ~8 ms |
| SoA Float32Array — ECS style | ~2.5 ms |
| SoA + skip alive check (dense) | ~1.8 ms |
| GPU compute (WebGL2 fragment shader) | ~0.3 ms |
Benchmarks on Chrome 120, M2 MacBook Air. Times = CPU update + render, excluding VSYNC.
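To reproduce this kind of measurement on your own hardware, a small timing harness is enough for the CPU side. The sketch below is an assumption about how such numbers could be collected, not the harness used for the table. `measureFrameTime` and its options are hypothetical names, and it times only the update loop, not rendering. It relies on the global `performance.now()` available in browsers and recent Node.js versions.

```javascript
// Sketch of a timing harness: average milliseconds per call of `step`.
// `measureFrameTime`, `warmup`, and `frames` are hypothetical names.
function measureFrameTime(step, { warmup = 20, frames = 100, dt = 1 / 60 } = {}) {
  // Warm-up runs give the JIT a chance to optimise the hot loop first.
  for (let i = 0; i < warmup; i++) step(dt);
  const start = performance.now();
  for (let i = 0; i < frames; i++) step(dt);
  return (performance.now() - start) / frames;
}

// Example workload: SoA position integration over 100k elements.
const n = 100000;
const x = new Float32Array(n);
const vx = new Float32Array(n).fill(1);
const msPerFrame = measureFrameTime((dt) => {
  for (let i = 0; i < n; i++) x[i] += vx[i] * dt;
});
console.log(`${msPerFrame.toFixed(3)} ms per update`);
```

Results vary with the engine, the machine, and thermal state, so compare approaches on the same device in the same session.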
Additional Tips
- Use Float32 not Float64: Float32 is 4 bytes vs 8 bytes — 2× more data fits in cache. Sufficient for positions and velocities unless you need astronomical precision.
- Free-list for dead entities: Instead of scanning dead slots, maintain a queue of recycled entity IDs. Fill from the start of the array to maximise cache density.
- Component bitmask: Store a bitmask per entity indicating which components it has. Querying which entities have both Position and Velocity becomes a bitwise AND scan.
- Group entities by archetype: All entities with the same component set share contiguous memory blocks (this is what Unity DOTS and Bevy's ECS do). Requires more plumbing but gives 5–20× speedups for complex component queries.
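- Avoid delete + re-create: Re-use entity IDs rather than calling createEntity() each frame. With the World above, IDs are never recycled, so entityCount only grows, and any per-entity object allocation adds GC pressure and frame-time jitter.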
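The free-list and component-bitmask tips can be combined in one small allocator. The sketch below is an assumption rather than an extension of the World class above. `EntityAllocator`, its methods, and the three flag constants are hypothetical. It recycles IDs from a stack instead of a queue, which is simpler; a sorted structure would be needed to always fill the lowest free slot first. A mask of 0 is reserved to mean "dead".

```javascript
// Hypothetical allocator combining a free-list of recycled IDs
// with a per-entity component bitmask. Names are illustrative.
const POSITION = 1 << 0;
const VELOCITY = 1 << 1;
const MASS = 1 << 2;

class EntityAllocator {
  constructor(capacity) {
    this.masks = new Uint32Array(capacity); // 0 means dead or never used
    this.free = [];                         // recycled IDs, used as a stack
    this.next = 0;                          // next never-used ID
  }
  // `mask` must be non-zero: OR together the component flags.
  create(mask) {
    let id;
    if (this.free.length > 0) {
      id = this.free.pop();
    } else {
      if (this.next >= this.masks.length) throw new RangeError('capacity exceeded');
      id = this.next++;
    }
    this.masks[id] = mask;
    return id;
  }
  destroy(id) {
    if (this.masks[id] === 0) return; // already dead; avoid a double free
    this.masks[id] = 0;
    this.free.push(id);
  }
  // Return the IDs whose mask contains every bit in `required`.
  query(required) {
    const out = [];
    for (let id = 0; id < this.next; id++) {
      if ((this.masks[id] & required) === required) out.push(id);
    }
    return out;
  }
}
```

A system would call `query(POSITION | VELOCITY)` once when membership changes and cache the result, rather than rebuilding the list every frame.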