InstancedMesh: 1 000 000 Objects at 60 FPS
Drawing 10 000 individual meshes? Expect ~10 000 draw calls and
a frame rate well below 60 FPS.
THREE.InstancedMesh renders all of them in
one draw call, with per-instance position, rotation,
scale, and colour. This guide covers setup, GPU picking, frustum
culling, animation, and pushing to 1 million instances.
1. Why Instancing Matters
Each THREE.Mesh triggers a separate draw call to the GPU.
Draw calls are expensive: the CPU must set shader uniforms, bind
vertex buffers, and issue the GPU command. At ~5 μs per draw call on a
modern desktop, 10 000 meshes cost about 50 ms of CPU overhead alone,
roughly three times the 16.7 ms frame budget at 60 FPS.
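The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the ~5 μs figure is the one quoted in the text, the function names are illustrative, and the real per-call cost varies with the driver and the amount of state that changes between calls.

```javascript
// Rough CPU cost of issuing `drawCalls` draw calls, in milliseconds.
// microsPerCall defaults to the ~5 μs estimate quoted in the text
// (an assumption, not a measurement).
function drawCallOverheadMs(drawCalls, microsPerCall = 5) {
  return (drawCalls * microsPerCall) / 1000;
}

// Time available per frame at a target frame rate, in milliseconds.
function frameBudgetMs(targetFps = 60) {
  return 1000 / targetFps;
}

const overhead = drawCallOverheadMs(10_000); // individual meshes
const budget = frameBudgetMs(60);
console.log(`10 000 draw calls: ${overhead} ms, budget: ${budget.toFixed(2)} ms`);
console.log(`1 draw call (instanced): ${drawCallOverheadMs(1)} ms`);
```

With one draw call the submission overhead becomes negligible, which leaves the frame budget for vertex and fragment work.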
Instanced rendering uploads one geometry and one material, plus an array of per-instance data (4×4 transform matrices), and tells the GPU: "draw this N times, each with a different matrix." Result: 1 draw call regardless of N.
| Approach | Draw calls | CPU cost | GPU cost |
|---|---|---|---|
| Individual meshes | N | O(N) — bottleneck | O(N × verts) |
| Merged geometry | 1 | O(1) | O(N × verts) — huge VBO |
| InstancedMesh | 1 | O(1) | O(N × verts) vertex work, O(verts + N) memory |
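Note: InstancedMesh requires every instance to share one geometry and one material. For varied geometries, use BatchedMesh (Three.js r160+) or merge groups manually.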
2. Basic InstancedMesh Setup
const COUNT = 100_000;
const geometry = new THREE.BoxGeometry(1, 1, 1);
const material = new THREE.MeshStandardMaterial({ color: 0xffffff });

// Create InstancedMesh with max instance count
const mesh = new THREE.InstancedMesh(geometry, material, COUNT);

// Must set initial matrices (defaults are zero → invisible!)
const matrix = new THREE.Matrix4();
for (let i = 0; i < COUNT; i++) {
  matrix.setPosition(
    (Math.random() - 0.5) * 500,
    (Math.random() - 0.5) * 500,
    (Math.random() - 0.5) * 500
  );
  mesh.setMatrixAt(i, matrix);
}

// CRITICAL: mark the instance matrix buffer as needing upload
mesh.instanceMatrix.needsUpdate = true;
scene.add(mesh);
Always set mesh.instanceMatrix.needsUpdate = true
after modifying matrices. Without it, the GPU buffer is never updated
and instances keep their stale transforms. Set it once after batch updates, or
every frame if animating.
3. Per-Instance Transforms (setMatrixAt)
Each instance has a full 4×4 transformation matrix encoding position,
rotation, and scale. Use THREE.Matrix4 to compose
transforms, then write with setMatrixAt(index, matrix).
const dummy = new THREE.Object3D();
for (let i = 0; i < COUNT; i++) {
  // Position
  dummy.position.set(
    positions[i * 3],
    positions[i * 3 + 1],
    positions[i * 3 + 2]
  );
  // Rotation (Euler or quaternion)
  dummy.rotation.set(
    rotations[i * 3],
    rotations[i * 3 + 1],
    rotations[i * 3 + 2]
  );
  // Scale (non-uniform okay)
  dummy.scale.set(scales[i], scales[i], scales[i]);
  // Compose the TRS matrix
  dummy.updateMatrix();
  mesh.setMatrixAt(i, dummy.matrix);
}
mesh.instanceMatrix.needsUpdate = true;
Object3D.updateMatrix() composes position/rotation/scale
into a Matrix4. This is convenient, but every instance pays for a
Matrix4.compose() call plus a copy into the instance buffer. For maximum throughput,
write directly to mesh.instanceMatrix.array (a
Float32Array holding 16 floats per instance) and skip the intermediate Matrix4
entirely.
// Direct buffer write (fastest path)
const arr = mesh.instanceMatrix.array;
for (let i = 0; i < COUNT; i++) {
  const off = i * 16;
  // Identity rotation and scale plus translation, stored column-major
  arr[off + 0] = 1; arr[off + 1] = 0; arr[off + 2] = 0; arr[off + 3] = 0;
  arr[off + 4] = 0; arr[off + 5] = 1; arr[off + 6] = 0; arr[off + 7] = 0;
  arr[off + 8] = 0; arr[off + 9] = 0; arr[off + 10] = 1; arr[off + 11] = 0;
  // Last column holds the translation, read from the same positions array as above
  arr[off + 12] = positions[i * 3];
  arr[off + 13] = positions[i * 3 + 1];
  arr[off + 14] = positions[i * 3 + 2];
  arr[off + 15] = 1;
}
mesh.instanceMatrix.needsUpdate = true;
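The direct write above covers translation only. As a sketch of how rotation and scale fit into the same column-major layout, the hypothetical helper below writes a rotation about the Y axis, a uniform scale, and a translation for one instance. The name `writeInstanceMatrix` and its parameters are illustrative and not part of Three.js; the plain Float32Array stands in for mesh.instanceMatrix.array.

```javascript
// Write translation + rotation about Y + uniform scale for one instance
// straight into a column-major Float32Array (16 floats per instance).
function writeInstanceMatrix(arr, index, x, y, z, yaw = 0, scale = 1) {
  const off = index * 16;
  const c = Math.cos(yaw) * scale;
  const s = Math.sin(yaw) * scale;
  // Column 0: rotated and scaled X axis
  arr[off + 0] = c;  arr[off + 1] = 0;     arr[off + 2] = -s; arr[off + 3] = 0;
  // Column 1: scaled Y axis (unchanged by a rotation about Y)
  arr[off + 4] = 0;  arr[off + 5] = scale; arr[off + 6] = 0;  arr[off + 7] = 0;
  // Column 2: rotated and scaled Z axis
  arr[off + 8] = s;  arr[off + 9] = 0;     arr[off + 10] = c; arr[off + 11] = 0;
  // Column 3: translation
  arr[off + 12] = x; arr[off + 13] = y;    arr[off + 14] = z; arr[off + 15] = 1;
}

// Stand-in for mesh.instanceMatrix.array with room for two instances
const demoMatrices = new Float32Array(2 * 16);
writeInstanceMatrix(demoMatrices, 1, 10, 20, 30, Math.PI / 2, 2);
console.log(Array.from(demoMatrices.subarray(16, 32)));
```

With a real mesh, pass mesh.instanceMatrix.array as `arr` and set mesh.instanceMatrix.needsUpdate = true once after the loop.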
4. Per-Instance Colour
InstancedMesh supports per-instance colour out of the box
via setColorAt() (Three.js r138+). Under the hood, this
creates an InstancedBufferAttribute on the
instanceColor property.
const color = new THREE.Color();
for (let i = 0; i < COUNT; i++) {
  color.setHSL(i / COUNT, 0.8, 0.5);
  mesh.setColorAt(i, color);
}
// Mark colour buffer for upload
mesh.instanceColor.needsUpdate = true;
Custom Per-Instance Attributes
Need more data per instance (opacity, size, temperature, etc.)? Add
custom InstancedBufferAttributes to the geometry:
// Add per-instance "temperature" float attribute
const temps = new Float32Array(COUNT);
for (let i = 0; i < COUNT; i++) temps[i] = Math.random();
geometry.setAttribute(
  'aTemperature',
  new THREE.InstancedBufferAttribute(temps, 1)
);
// In custom ShaderMaterial vertex shader:
// attribute float aTemperature;
// varying float vTemp;
// void main() { vTemp = aTemperature; ... }
5. GPU Picking for Instanced Objects
Raycasting against an InstancedMesh works with
THREE.Raycaster (since r126), but is CPU-bound for large
counts. For massive scenes, use GPU picking: render
each instance with a unique ID encoded as RGB colour, then read the
pixel under the mouse.
// 1. Create a picking render target and material
const pickTarget = new THREE.WebGLRenderTarget(1, 1);
const pickMaterial = new THREE.ShaderMaterial({
  vertexShader: `
    flat varying float vInstanceId;
    void main() {
      // Pass the instance index to the fragment shader (needs WebGL2)
      vInstanceId = float(gl_InstanceID);
      gl_Position = projectionMatrix * modelViewMatrix * instanceMatrix * vec4(position, 1.0);
    }
  `,
  fragmentShader: `
    flat varying float vInstanceId;
    void main() {
      // Encode instance ID as RGB24 (supports up to 16 777 216 instances)
      float id = vInstanceId;
      gl_FragColor = vec4(
        mod(id, 256.0) / 255.0,
        mod(floor(id / 256.0), 256.0) / 255.0,
        floor(id / 65536.0) / 255.0,
        1.0
      );
    }
  `
});

// 2. Render 1×1 pixel at mouse position
function gpuPick(mouseNDC, camera, renderer, scene) {
  const width = renderer.domElement.width;
  const height = renderer.domElement.height;
  // Set camera to render only the pixel under the mouse
  camera.setViewOffset(
    width, height,
    mouseNDC.x * width * 0.5 + width * 0.5,
    -mouseNDC.y * height * 0.5 + height * 0.5,
    1, 1
  );
  // Swap material, render to pick target
  mesh.material = pickMaterial;
  renderer.setRenderTarget(pickTarget);
  renderer.render(scene, camera);
  // Read pixel
  const pixel = new Uint8Array(4);
  renderer.readRenderTargetPixels(pickTarget, 0, 0, 1, 1, pixel);
  // Decode ID (a background pixel cleared to black also decodes to 0)
  const id = pixel[0] + pixel[1] * 256 + pixel[2] * 65536;
  // Restore
  mesh.material = material;
  renderer.setRenderTarget(null);
  camera.clearViewOffset();
  return id;
}
Tip: use gl_InstanceID directly in the vertex shader (GLSL 300 es)
with a flat varying to pass it to the fragment shader. This avoids the
need for a custom instance attribute. Requires
THREE.WebGLRenderer with a WebGL2 context.
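The RGB24 encoding used by the picking shader and the decode step in gpuPick() can be mirrored on the CPU to confirm that the two agree. The sketch below is plain JavaScript with illustrative helper names (`encodeInstanceId`, `decodeInstanceId`, `ndcToPixel`), none of which are Three.js APIs. It also shows the NDC-to-pixel conversion that feeds setViewOffset, with a clamp added here as an assumption so the result always stays inside the canvas.

```javascript
// Mirror of the shader's RGB24 encoding: low byte in R, then G, then B.
function encodeInstanceId(id) {
  return [id & 0xff, (id >> 8) & 0xff, (id >> 16) & 0xff];
}

// Inverse of the encoding, matching the decode step in gpuPick().
function decodeInstanceId(r, g, b) {
  return r + g * 256 + b * 65536;
}

// Convert normalised device coordinates (-1..1, y up) to a pixel
// position with the origin at the top-left corner.
function ndcToPixel(ndcX, ndcY, width, height) {
  const x = Math.floor((ndcX * 0.5 + 0.5) * width);
  const y = Math.floor((-ndcY * 0.5 + 0.5) * height);
  return {
    x: Math.min(Math.max(x, 0), width - 1),
    y: Math.min(Math.max(y, 0), height - 1),
  };
}

const [r, g, b] = encodeInstanceId(999_999);
console.log(r, g, b, decodeInstanceId(r, g, b)); // 63 66 15 999999
console.log(ndcToPixel(0, 0, 1920, 1080));       // { x: 960, y: 540 }
```

The largest ID that survives the round trip is 16 777 215, which matches the 24-bit limit noted in the shader comment.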
6. Manual Frustum Culling
Three.js frustum-culls the entire InstancedMesh as a
single bounding volume, so if any part of it is in view,
all instances are submitted to the GPU. For large worlds, this wastes
vertex processing on off-screen instances.
Solution: implement per-instance frustum culling by dynamically adjusting the visible instance count and reordering the matrix array to put visible instances at the front.
const frustum = new THREE.Frustum();
const projScreenMatrix = new THREE.Matrix4();
const sphere = new THREE.Sphere();
const pos = new THREE.Vector3();
const dummy = new THREE.Matrix4(); // allocated once and reused for every instance

// totalCount, allPositions (xyz triples) and instanceRadius are kept
// by the application alongside the mesh.
function cullInstances(camera) {
  projScreenMatrix.multiplyMatrices(
    camera.projectionMatrix,
    camera.matrixWorldInverse
  );
  frustum.setFromProjectionMatrix(projScreenMatrix);

  let visible = 0;
  for (let i = 0; i < totalCount; i++) {
    // Read this instance's position from the CPU-side array
    pos.set(allPositions[i * 3], allPositions[i * 3 + 1], allPositions[i * 3 + 2]);
    sphere.set(pos, instanceRadius);
    if (frustum.intersectsSphere(sphere)) {
      // Write a translation-only matrix into the next visible slot
      dummy.setPosition(pos.x, pos.y, pos.z);
      mesh.setMatrixAt(visible, dummy);
      visible++;
    }
  }
  mesh.count = visible; // only draw visible instances!
  mesh.instanceMatrix.needsUpdate = true;
}
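The culling loop above rebuilds translation-only matrices, so any per-instance rotation or scale is lost. One possible variation, sketched below under the assumption that the application keeps a master copy of all matrices, copies whole 16-float blocks into the draw buffer instead. `compactVisible` and the `isVisible` callback are hypothetical names; in practice the callback would wrap a test such as frustum.intersectsSphere, and `target` would be mesh.instanceMatrix.array.

```javascript
// Copy the matrices of visible instances to the front of `target`.
// `source` holds every instance's matrix (16 floats each) and is never
// modified, so culled instances can become visible again later.
function compactVisible(source, target, totalCount, isVisible) {
  let visible = 0;
  for (let i = 0; i < totalCount; i++) {
    if (!isVisible(i)) continue;
    // Copy the 16 floats of instance i into the next free slot
    target.set(source.subarray(i * 16, i * 16 + 16), visible * 16);
    visible++;
  }
  return visible;
}

// Demo: 4 instances whose matrices are filled with their own index
const sourceMatrices = new Float32Array(4 * 16);
for (let i = 0; i < 4; i++) sourceMatrices.fill(i, i * 16, i * 16 + 16);
const drawMatrices = new Float32Array(4 * 16);
const visibleCount = compactVisible(sourceMatrices, drawMatrices, 4, (i) => i % 2 === 1);
console.log(visibleCount, drawMatrices[0], drawMatrices[16]); // 2 1 3
```

After the call, assign the returned value to mesh.count and set mesh.instanceMatrix.needsUpdate = true. The trade-off is a second Float32Array of the same size as the instance buffer.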
7. Animating Instances
To animate per-instance transforms each frame, update the matrix array
and set
needsUpdate = true. For maximum performance, write
directly to the Float32Array:
const arr = mesh.instanceMatrix.array;
const STRIDE = 16;

function animate(timeMs) {
  // requestAnimationFrame passes a timestamp in milliseconds; work in seconds
  const t = timeMs * 0.001;
  for (let i = 0; i < COUNT; i++) {
    const off = i * STRIDE;
    // Simple orbit animation: x = R·cos(ωt + φ_i), z = R·sin(ωt + φ_i)
    const phase = i * 0.001;
    const R = 50 + i * 0.005;
    const omega = 0.5 / (R * 0.02);
    // Translation columns (indices 12, 13, 14 in column-major)
    arr[off + 12] = R * Math.cos(omega * t + phase);
    arr[off + 13] = Math.sin(t * 0.3 + phase) * 20;
    arr[off + 14] = R * Math.sin(omega * t + phase);
  }
  mesh.instanceMatrix.needsUpdate = true;
  renderer.render(scene, camera);
  requestAnimationFrame(animate);
}
requestAnimationFrame(animate);
GPU-Side Animation (Vertex Shader)
For the best performance, move animation to the vertex shader using custom attributes (phase, speed, radius) and a time uniform. This avoids any CPU matrix updates:
// Vertex shader (GLSL 300 es)
uniform float uTime;
attribute float aPhase;
attribute float aRadius;

void main() {
  float angle = uTime * 0.5 + aPhase;
  vec3 offset = vec3(aRadius * cos(angle), 0.0, aRadius * sin(angle));
  vec4 worldPos = modelMatrix * vec4(position + offset, 1.0);
  gl_Position = projectionMatrix * viewMatrix * worldPos;
}
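The shader reads one aPhase and one aRadius value per instance plus a uTime uniform, so the JavaScript side has to prepare those arrays once. The sketch below is a minimal, library-free version: `buildOrbitAttributes` and `orbitOffset` are hypothetical helpers, and the phase and radius formulas are borrowed from the CPU animation loop earlier in this section as an assumption. In Three.js each array would then be wrapped in a THREE.InstancedBufferAttribute with item size 1, as in the custom attribute example in section 4.

```javascript
// Build per-instance phase and radius arrays for the orbit shader.
// The formulas reuse the CPU animation's values (an assumption).
function buildOrbitAttributes(count) {
  const phases = new Float32Array(count);
  const radii = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    phases[i] = i * 0.001;
    radii[i] = 50 + i * 0.005;
  }
  return { phases, radii };
}

// CPU reference of the shader's offset computation, useful for checking
// where an instance should be at a given time (in seconds).
function orbitOffset(timeSeconds, phase, radius) {
  const angle = timeSeconds * 0.5 + phase;
  return [radius * Math.cos(angle), 0, radius * Math.sin(angle)];
}

const { phases, radii } = buildOrbitAttributes(1000);
console.log(orbitOffset(2.0, phases[10], radii[10]));
```

Because the arrays never change after creation, only the uTime uniform needs to be written each frame.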
8. Benchmark: 10K → 100K → 1M
Tested on a mid-range desktop (RTX 3060, Ryzen 5600X) with a simple sphere geometry (32 segments) and MeshStandardMaterial, at 1080p:
| Instance count | Draw calls | Frame time (ms) | FPS |
|---|---|---|---|
| 10 000 (individual Mesh) | 10 000 | 42 ms | ~24 |
| 10 000 (InstancedMesh) | 1 | 2.1 ms | >400 |
| 100 000 (InstancedMesh) | 1 | 6.8 ms | ~147 |
| 500 000 (InstancedMesh) | 1 | 12.4 ms | ~80 |
| 1 000 000 (InstancedMesh, low-poly) | 1 | 15.2 ms | ~66 |
Optimisation Checklist
- Reduce geometry complexity: a 20-triangle icosahedron instead of a 128-triangle sphere. Vertex count × instance count = total GPU vertices.
- Use MeshBasicMaterial if you don't need lighting; it can be around 2× faster than MeshStandardMaterial.
- LOD instancing: use different InstancedMeshes for near/mid/far — low-poly geometry for distance.
- Avoid needsUpdate every frame if instances are static — upload once at init.
- Switch to instanceColor instead of per-instance material if you only need colour variation.
- Consider WASM on the CPU, or a GPU compute shader, for position updates at >500K instances.
Total vertices = instance_count × geometry_vertices
Safe budget: < 20M processed vertices @ 60 FPS on mid-range GPU
1M instances × 12 tris (box) = 36M vertices → needs LOD or simple geo
1M instances × 2 tris (billboard quad) = 6M vertices → fine @ 60 FPS
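The budget check above is easy to script when comparing candidate geometries. The sketch below uses illustrative names and counts three vertices per triangle, as the figures above do. The 20M limit is the rule of thumb from the text, not a measured threshold.

```javascript
// Rule-of-thumb vertex budget from the text (mid-range GPU, 60 FPS).
const VERTEX_BUDGET = 20_000_000;

// Total vertices processed per frame, counting 3 vertices per triangle.
function totalVertices(instanceCount, trianglesPerInstance) {
  return instanceCount * trianglesPerInstance * 3;
}

// True when the configuration stays under the budget.
function fitsBudget(instanceCount, trianglesPerInstance, budget = VERTEX_BUDGET) {
  return totalVertices(instanceCount, trianglesPerInstance) < budget;
}

console.log(totalVertices(1_000_000, 12), fitsBudget(1_000_000, 12)); // 36000000 false
console.log(totalVertices(1_000_000, 2), fitsBudget(1_000_000, 2));   // 6000000 true
```

Indexed geometry shares vertices between triangles, so the real count is often lower than this estimate.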