💡 Reference · Numerical Computing
📅 March 2026⏱ ~14 min read🔴 Practitioner

Floating-Point Traps in Simulation Code

Simulation bugs caused by floating-point are insidious: they produce wrong results silently, often only for edge-case inputs. This reference documents every category of IEEE 754 footgun — bit layout, special values, catastrophic cancellation, denormals and GPU precision — with concrete JavaScript/GLSL examples and defences.

1. IEEE 754 Bit Format

JavaScript uses double precision (64-bit) for all number operations. WebGL's GLSL default is single precision (32-bit). Both follow IEEE 754:

64-bit double: [s:1][exponent:11][mantissa:52]
  value = (−1)^s × 2^(e−1023) × 1.mantissa
  Range: ±5×10⁻³²⁴ … ±1.8×10³⁰⁸
  Precision: ~15–17 significant decimal digits

32-bit float: [s:1][exponent:8][mantissa:23]
  value = (−1)^s × 2^(e−127) × 1.mantissa
  Range: ±1.2×10⁻³⁸ … ±3.4×10³⁸
  Precision: ~6–9 significant decimal digits

The leading 1. in the mantissa is implicit (not stored) — this is the hidden bit. Denormals (see §5) are the exception: exponent = 0 means the leading bit is 0, not 1.

2. NaN, Infinity, and −0

IEEE 754 defines several special values that propagate through arithmetic in surprising ways:

Value How produced Key behaviour
+Infinity 1/0, overflow Propagates: Inf + 1 = Inf. Inf - Inf = NaN
-Infinity -1/0, negative overflow -Inf < x for any finite x
NaN 0/0, Inf-Inf, sqrt(-1) Not equal to itself: NaN !== NaN. Poisons all arithmetic.
-0 -1 * 0, (-small) underflow -0 === 0 is true. 1/-0 = -Infinity.
// NaN check — the only correct way
if (Number.isNaN(x)) { /* ... */ }   // correct
if (x !== x) { /* ... */ }             // also correct (NaN property)
if (x === NaN) { /* ... */ }           // WRONG — always false

// Distinguish -0 from +0
const isNegZero = (x) => x === 0 && (1/x) === -Infinity;
Simulation trap: A single NaN in a particle's velocity will silently propagate to every particle's position and then to every force calculation — the entire simulation becomes NaN within a few frames, rendering just a black screen or frozen particles. Always sanitise initial conditions and check for NaN in the main loop during development.

3. Precision and Machine Epsilon

Machine epsilon (ε) is the smallest number such that 1 + ε ≠ 1:

Format ε (machine epsilon) JS constant
float64 2.22 × 10⁻¹⁶ Number.EPSILON (≈ 2.22e-16)
float32 1.19 × 10⁻⁷ GLSL: compare to 1e-6
float16 9.77 × 10⁻⁴ GPU mediump, ~0.001

Precision errors accumulate. After N additions, the expected relative error is O(N·ε). In a time-stepping simulation with 60 fps × 3600 s = 216 000 steps, float32 errors reach ~2.5% — enough to visibly drift orbits.

4. Catastrophic Cancellation

When two nearly equal large numbers are subtracted, significant digits cancel and the result has much less precision than the inputs:

// Example: distance between nearly-overlapping particles
let x1 = 1000000.0001, x2 = 1000000.0002;
let dx = x2 - x1;   // = 0.0001, but stored in float32 as ~0.000099182
                     // Error: ~0.8% from cancellation alone

Fix: Recentre coordinates around the region of interest. For planetary simulations, store positions relative to the camera or a local origin rather than in absolute world space:

// World-space to camera-relative (float-origin trick)
const camX = camera.position.x;
const relX = particle.x - camX;  // small number, no cancellation

The same issue arises in quadratic formulas. Use the numerically stable form:

// UNSTABLE for |b| >> |c/b|
x = (-b + Math.sqrt(b*b - 4*a*c)) / (2*a);

// STABLE: avoid cancellation when b and sqrt discriminant nearly cancel
const q = -0.5 * (b + Math.sign(b) * Math.sqrt(b*b - 4*a*c));
x1 = q / a;
x2 = c / q;
Kahan summation: For summing large arrays (e.g. total energy), compensated summation recovers nearly full precision at the cost of 2× more operations.
// Kahan compensated summation
function kahanSum(arr) {
  let sum = 0, c = 0;
  for (const x of arr) {
    const y = x - c;
    const t = sum + y;
    c = (t - sum) - y;
    sum = t;
  }
  return sum;
}

5. Denormals and Underflow

When a float32 value falls below ~1.2×10⁻³⁸, the hardware activates denormal (subnormal) mode: the leading bit drops from 1 to 0, allowing gradual underflow at the expense of precision. Denormal arithmetic can be 50–100× slower on CPU (and is often disabled to zero on GPU).

// Denormal flush example — particle mass contribution negligible
// but still triggers 100x slower denormal path each frame
particle.vx *= dampingFactor;   // if vx = 1e-39 (float32) → denormal

// Fix: zero-out tiny velocities
const MIN_V = 1e-30;
if (Math.abs(particle.vx) < MIN_V) particle.vx = 0;
WebGL: GLSL shaders often flush denormals to zero (FTZ mode). A value that decreases smoothly in JS may jump to exactly 0 in GLSL, causing division-by-zero artefacts. Always add a minimum value: length(v) + 1e-6 rather than length(v) before dividing.

6. Order Dependence and Non-Associativity

Floating-point addition is not associative: (a + b) + c ≠ a + (b + c) in general. This means:

Simulation determinism: For reproducible replays, ensure all particle arrays are processed in fixed order, use Float64Array (which forces 64-bit storage), and avoid Math.random() — use a seeded pseudo-random generator like mulberry32.

7. Comparing Float Values

Never compare floats with === unless you know both values were set to the same literal or came from the same computation without intermediate rounding:

// WRONG: almost never true due to rounding
if (a + b === c) { ... }

// Absolute epsilon — good for values near zero (e.g. velocities)
const nearZero = (x, eps = 1e-9) => Math.abs(x) < eps;

// Relative epsilon — good for values far from zero (e.g. energy)
const nearEqual = (a, b, eps = 1e-9) =>
  Math.abs(a - b) <= eps * Math.max(Math.abs(a), Math.abs(b), 1);

// ULPs comparison — the most rigorous (requires DataView)
function ulpDiff(a, b) {
  const buf = new ArrayBuffer(8);
  const view = new DataView(buf);
  view.setFloat64(0, a); const ia = view.getBigInt64(0);
  view.setFloat64(0, b); const ib = view.getBigInt64(0);
  return Number(ia > ib ? ia - ib : ib - ia);
}
Rule of thumb: Use absolute epsilon (< 1e-9) for quantities that should be zero. Use relative epsilon for quantities with magnitudes far from zero. Never hard-code 0.0001 as epsilon without checking the magnitude of your values.

8. GPU Precision (GLSL mediump / highp)

WebGL 1.0 requires GLSL mediump (float16 or float32 depending on driver) for fragment shaders. WebGL 2.0 defaults to float32 for both vertex and fragment. Always declare precision explicitly:

// At top of each GLSL shader
precision highp float;   // recommended for simulations
precision highp int;

// Avoid implicit mediump (default in WebGL 1 fragment shaders)

Common GPU precision traps:

Quick checklist for GLSL shaders:
precision highp float; at top of each shader
✅ Camera-relative positions (pass uCamPos, compute worldPos - uCamPos on CPU)
max(length(v), EPSILON) before dividing by length
✅ Wrap time uniform periodically or use two-float encoding
✅ Clamp normalised values: clamp(n, 0.0, 1.0)