Floating-Point Traps in Simulation Code
Simulation bugs caused by floating-point are insidious: they produce wrong results silently, often only for edge-case inputs. This reference documents every category of IEEE 754 footgun — bit layout, special values, catastrophic cancellation, denormals and GPU precision — with concrete JavaScript/GLSL examples and defences.
1. IEEE 754 Bit Format
JavaScript uses double precision (64-bit) for all number operations. WebGL's GLSL default is single precision (32-bit). Both follow IEEE 754:
value = (−1)^s × 2^(e−1023) × 1.mantissa
Range: ±5×10⁻³²⁴ … ±1.8×10³⁰⁸
Precision: ~15–17 significant decimal digits
32-bit float: [s:1][exponent:8][mantissa:23]
value = (−1)^s × 2^(e−127) × 1.mantissa
Range: ±1.2×10⁻³⁸ … ±3.4×10³⁸
Precision: ~6–9 significant decimal digits
The leading 1. in the mantissa is implicit (not stored) —
this is the hidden bit. Denormals (see §5) are the exception:
exponent = 0 means the leading bit is 0, not 1.
2. NaN, Infinity, and −0
IEEE 754 defines several special values that propagate through arithmetic in surprising ways:
| Value | How produced | Key behaviour |
|---|---|---|
| +Infinity | 1/0, overflow |
Propagates: Inf + 1 = Inf.
Inf - Inf = NaN
|
| -Infinity | -1/0, negative overflow |
-Inf < x for any finite x |
| NaN |
0/0, Inf-Inf, sqrt(-1)
|
Not equal to itself: NaN !== NaN. Poisons all
arithmetic.
|
| -0 | -1 * 0, (-small) underflow |
-0 === 0 is true. 1/-0 = -Infinity.
|
// NaN check — the only correct way if (Number.isNaN(x)) { /* ... */ } // correct if (x !== x) { /* ... */ } // also correct (NaN property) if (x === NaN) { /* ... */ } // WRONG — always false // Distinguish -0 from +0 const isNegZero = (x) => x === 0 && (1/x) === -Infinity;
3. Precision and Machine Epsilon
Machine epsilon (ε) is the smallest number such that
1 + ε ≠ 1:
| Format | ε (machine epsilon) | JS constant |
|---|---|---|
| float64 | 2.22 × 10⁻¹⁶ | Number.EPSILON (≈ 2.22e-16) |
| float32 | 1.19 × 10⁻⁷ | GLSL: compare to 1e-6 |
| float16 | 9.77 × 10⁻⁴ | GPU mediump, ~0.001 |
Precision errors accumulate. After N additions, the expected relative error is O(N·ε). In a time-stepping simulation with 60 fps × 3600 s = 216 000 steps, float32 errors reach ~2.5% — enough to visibly drift orbits.
4. Catastrophic Cancellation
When two nearly equal large numbers are subtracted, significant digits cancel and the result has much less precision than the inputs:
// Example: distance between nearly-overlapping particles let x1 = 1000000.0001, x2 = 1000000.0002; let dx = x2 - x1; // = 0.0001, but stored in float32 as ~0.000099182 // Error: ~0.8% from cancellation alone
Fix: Recentre coordinates around the region of interest. For planetary simulations, store positions relative to the camera or a local origin rather than in absolute world space:
// World-space to camera-relative (float-origin trick) const camX = camera.position.x; const relX = particle.x - camX; // small number, no cancellation
The same issue arises in quadratic formulas. Use the numerically stable form:
// UNSTABLE for |b| >> |c/b| x = (-b + Math.sqrt(b*b - 4*a*c)) / (2*a); // STABLE: avoid cancellation when b and sqrt discriminant nearly cancel const q = -0.5 * (b + Math.sign(b) * Math.sqrt(b*b - 4*a*c)); x1 = q / a; x2 = c / q;
// Kahan compensated summation function kahanSum(arr) { let sum = 0, c = 0; for (const x of arr) { const y = x - c; const t = sum + y; c = (t - sum) - y; sum = t; } return sum; }
5. Denormals and Underflow
When a float32 value falls below ~1.2×10⁻³⁸, the hardware activates denormal (subnormal) mode: the leading bit drops from 1 to 0, allowing gradual underflow at the expense of precision. Denormal arithmetic can be 50–100× slower on CPU (and is often disabled to zero on GPU).
// Denormal flush example — particle mass contribution negligible // but still triggers 100x slower denormal path each frame particle.vx *= dampingFactor; // if vx = 1e-39 (float32) → denormal // Fix: zero-out tiny velocities const MIN_V = 1e-30; if (Math.abs(particle.vx) < MIN_V) particle.vx = 0;
length(v) + 1e-6 rather than
length(v) before dividing.
6. Order Dependence and Non-Associativity
Floating-point addition is not associative:
(a + b) + c ≠ a + (b + c) in general. This means:
- Parallel reduction sums produce different results depending on thread order — important for reproducible simulation replays.
-
Changing compiler optimisation flags (e.g.
-ffast-math) can produce different results by reordering floating-point ops. - JavaScript engines are free to use 80-bit x87 registers internally, producing results that differ from true float64 in non-deterministic ways (rare but real on x86).
Float64Array (which forces 64-bit storage), and avoid
Math.random() — use a seeded pseudo-random generator like
mulberry32.
7. Comparing Float Values
Never compare floats with === unless you know both values
were set to the same literal or came from the same computation without
intermediate rounding:
// WRONG: almost never true due to rounding if (a + b === c) { ... } // Absolute epsilon — good for values near zero (e.g. velocities) const nearZero = (x, eps = 1e-9) => Math.abs(x) < eps; // Relative epsilon — good for values far from zero (e.g. energy) const nearEqual = (a, b, eps = 1e-9) => Math.abs(a - b) <= eps * Math.max(Math.abs(a), Math.abs(b), 1); // ULPs comparison — the most rigorous (requires DataView) function ulpDiff(a, b) { const buf = new ArrayBuffer(8); const view = new DataView(buf); view.setFloat64(0, a); const ia = view.getBigInt64(0); view.setFloat64(0, b); const ib = view.getBigInt64(0); return Number(ia > ib ? ia - ib : ib - ia); }
< 1e-9) for quantities that should be zero. Use relative epsilon for
quantities with magnitudes far from zero. Never hard-code
0.0001 as epsilon without checking the magnitude of your
values.
8. GPU Precision (GLSL mediump / highp)
WebGL 1.0 requires GLSL mediump (float16 or float32
depending on driver) for fragment shaders. WebGL 2.0 defaults to
float32 for both vertex and fragment. Always declare precision
explicitly:
// At top of each GLSL shader precision highp float; // recommended for simulations precision highp int; // Avoid implicit mediump (default in WebGL 1 fragment shaders)
Common GPU precision traps:
- World-space positions in float32: On a planet-scale terrain, player coordinates at (100 000, 50 000) lose ~1 cm precision with float32. Use double on CPU and send camera-relative positions to the GPU.
-
Time as uniform:
float uTimewill lose sub-millisecond precision after ~16 seconds. Pass as two floats (hi + lo), or reset to 0 periodically. -
Trigonometric functions:
sin(x)in GLSL is undefined for |x| > 2π on some drivers. Wrap withmod(x, 6.283185307). - Integer atomics in WebGPU: WebGPU compute shaders support i32/u32 atomics but not float atomics. Use integer accumulators scaled by a factor.
✅
precision highp float; at top of each shader✅ Camera-relative positions (pass
uCamPos, compute
worldPos - uCamPos on CPU)✅
max(length(v), EPSILON) before dividing by length✅ Wrap time uniform periodically or use two-float encoding
✅ Clamp normalised values:
clamp(n, 0.0, 1.0)