Numerical Differentiation & Integration: Finite Differences to Gauss Quadrature
Every physics simulator ultimately boils down to evaluating derivatives and integrals numerically. Choosing the wrong method — or the wrong step size — can mean errors that grow exponentially or results dominated by floating-point noise. This article develops the theory from first principles: finite-difference stencils, Richardson extrapolation, Newton-Cotes rules, Gaussian quadrature, and adaptive integration with error control.
1. Truncation Error vs Round-Off Error
When approximating a derivative or integral numerically two independent error sources compete:
- Truncation error: the error from replacing the true mathematical operation (limit as h→0) with a finite approximation using step size h. Decreasing h reduces this error — typically as a power of h.
- Round-off error: the error from using finite-precision floating-point arithmetic. Decreasing h amplifies round-off because it requires subtracting two nearly equal numbers (catastrophic cancellation).
The optimal step size balances these two: too large and truncation dominates; too small and round-off dominates. For IEEE 754 double precision (ε_mach ≈ 2.2 × 10⁻¹⁶), the optimal h for a first-derivative central difference is approximately:
Beyond h_opt the error floor is set by round-off and cannot be reduced further by decreasing h. This is a fundamental limitation — to get better derivatives you need either higher-precision arithmetic or a mathematically different approach (automatic differentiation, complex-step).
2. Finite Difference Formulas
All finite difference formulas are derived from Taylor series. Expanding f(x + h) around x:
First Derivative Formulas
| Formula | Expression | Error |
|---|---|---|
| Forward difference | (f(x+h) − f(x)) / h | O(h) |
| Backward difference | (f(x) − f(x−h)) / h | O(h) |
| Central difference | (f(x+h) − f(x−h)) / (2h) | O(h²) |
| 5-point stencil | (−f(x+2h) + 8f(x+h) − 8f(x−h) + f(x−2h)) / (12h) | O(h⁴) |
The central difference achieves second-order accuracy because the leading error term (from h²·f'') cancels when subtracting f(x+h) − f(x−h). This is why central differences are preferred whenever function evaluations at both sides of x are available.
Second Derivative
The second derivative formula uses three points and has O(h²) error. Note that the error coefficient is 1/12 × f⁽⁴⁾, so functions with large fourth derivatives (like sin(100x)) suffer more. This formula is ubiquitous in physics: it implements the discrete 1D Laplacian and is the backbone of finite difference PDE solvers.
Mixed Partial Derivatives
This 4-point cross stencil has O(h²) error and is used in computing Hessian matrices for optimisation and in elasticity solvers.
3. Richardson Extrapolation
Richardson extrapolation is a powerful technique to increase the order of accuracy of any finite difference approximation without deriving a new formula. The idea: if the error of approximation D(h) is known to be of the form
then by evaluating at h and h/2 we can eliminate the leading h² term:
This has effectively promoted the O(h²) central difference to O(h⁴) — the same as the 5-point stencil — using only two cheaper evaluations. The process can be repeated recursively (Romberg integration for integrals, or iterated Richardson for derivatives) to achieve arbitrarily high order.
Romberg Integration
Romberg integration applies Richardson extrapolation to the composite trapezoidal rule:
In practice, Romberg integration converges so fast that even R[4][4] (requiring only 17 function evaluations) achieves double-precision accuracy for most smooth functions. It is the go-to method for one-dimensional integrals over closed intervals.
4. Higher-Order & Complex-Step Differentiation
Higher-Order Stencils via Undetermined Coefficients
Any finite difference stencil can be derived systematically. To find a k-th derivative with n points, write:
The coefficients cᵢ are found by enforcing that the formula is exact
for polynomials 1, x, x², ..., x^(n−1) (i.e., solving a Vandermonde
system). The website
findiff
automates this. For production code, scipy's
findiff.coefficients() or numpy's
numpy.gradient() implement this reliably.
Complex-Step Differentiation
The complex-step trick is arguably the most under-appreciated numerical differentiation technique:
The crucial advantage: because we take an imaginary part, there is no subtraction of nearly equal numbers. The round-off error is purely from evaluating f at a complex point, not from cancellation. We can use h = 10⁻²⁰ and get near-exact first derivatives limited only by the function's own evaluation accuracy. The only requirement: the function f must be implementable over complex numbers (easily done in Python, Julia, C++ with std::complex).
5. Newton-Cotes Integration Rules
Newton-Cotes rules integrate by fitting a polynomial through equally-spaced points and integrating the polynomial exactly.
| Rule | Points | Formula | Error |
|---|---|---|---|
| Midpoint | 1 | h · f((a+b)/2) | O(h³) |
| Trapezoid | 2 | (h/2)[f(a)+f(b)] | O(h³) |
| Simpson's 1/3 | 3 | (h/6)[f(a)+4f(m)+f(b)] | O(h⁵) |
| Simpson's 3/8 | 4 | (3h/8)[f(a)+3f(m₁)+3f(m₂)+f(b)] | O(h⁵) |
| Boole's rule | 5 | (2h/45)[7f₀+32f₁+12f₂+32f₃+7f₄] | O(h⁷) |
Composite Rules
A single Simpson's rule over [a, b] uses 3 points. The composite Simpson's rule divides [a, b] into n (even) subintervals of width h = (b−a)/n and applies Simpson per pair:
Why Not Higher-Order Newton-Cotes?
For n ≥ 8, Newton-Cotes weights include negative values — meaning the formula subtracts positive function values, leading to catastrophic cancellation and poor numerical properties. This is Runge's phenomenon for integration. Instead, for high accuracy one should use Gaussian quadrature (which cleverly places nodes at non-uniform positions to avoid this) or composite Simpson/trapezoid with Richardson extrapolation (Romberg).
6. Gaussian Quadrature
While Newton-Cotes fixes node positions and finds weights, Gaussian quadrature co-optimises both. An n-point Gaussian rule is exact for polynomials of degree up to 2n − 1 — twice the exactness of an n-point Newton-Cotes rule. All weights are positive (avoids cancellation), and the nodes are roots of orthogonal polynomials.
Gauss-Legendre Quadrature
For integrals over [−1, 1], the Gauss-Legendre rule uses nodes xᵢ = roots of the Legendre polynomial Pₙ(x) and weights:
| n | Nodes xᵢ | Weights wᵢ | Exact for poly degree |
|---|---|---|---|
| 2 | ±0.577350 | 1.000000 | ≤ 3 |
| 3 | 0, ±0.774597 | 0.888889, 0.555556 | ≤ 5 |
| 5 | 0, ±0.538469, ±0.906180 | 0.568889, 0.478629, 0.236927 | ≤ 9 |
To integrate over [a, b] instead of [−1, 1], apply the change of variables x = (b−a)/2 · t + (a+b)/2:
Gauss-Kronrod Rules (G7K15)
Many adaptive integrators (QUADPACK, scipy.integrate.quad) use the 15-point Gauss-Kronrod rule. It embeds the 7-point Gauss-Legendre rule as a subset: the 7 existing nodes are reused and 8 new nodes added. This gives both a high-accuracy result (15 points) and an error estimate (|G15 − G7|) at no extra function evaluations compared to evaluating G15 from scratch. The estimated error drives adaptive subdivision.
7. Adaptive Integration & Error Control
Adaptive algorithms automatically concentrate function evaluations where the integrand varies most, dramatically improving efficiency for non-smooth or sharply-peaked functions.
Adaptive Simpson's Rule
The simplest adaptive scheme estimates the error on [a, b] by comparing the whole-interval Simpson result with the sum of two half-interval results:
If the error estimate < tolerance × (b−a) / (b−a_total), recurse on each half; otherwise, accept S₁ + S₂ as the local result. This process terminates in O(n log n) function evaluations for Lipschitz-continuous integrands.
JavaScript Implementation
// Adaptive Simpson's integration
function adaptiveSimpson(f, a, b, tol = 1e-9) {
function simpson(f, a, b) {
const m = (a + b) / 2;
return (b - a) / 6 * (f(a) + 4 * f(m) + f(b));
}
function recurse(a, b, tol, whole, fa, fm, fb, depth) {
const m = (a + b) / 2;
const lm = (a + m) / 2, rm = (m + b) / 2;
const flm = f(lm), frm = f(rm);
const left = (m - a) / 6 * (fa + 4 * flm + fm);
const right = (b - m) / 6 * (fm + 4 * frm + fb);
const delta = left + right - whole;
if (depth >= 50 || Math.abs(delta) <= 15 * tol)
return left + right + delta / 15;
return recurse(a, m, tol/2, left, fa, flm, fm, depth+1)
+ recurse(m, b, tol/2, right, fm, frm, fb, depth+1);
}
const fa = f(a), fb = f(b), fm = f((a+b)/2);
const whole = (b - a) / 6 * (fa + 4 * fm + fb);
return recurse(a, b, tol, whole, fa, fm, fb, 0);
}
// Usage examples:
console.log(adaptiveSimpson(Math.sin, 0, Math.PI)); // ≈ 2.0
console.log(adaptiveSimpson(x => 1/x, 1, 100)); // ≈ ln(100) ≈ 4.6052
Gauss-Kronrod Adaptive (scipy-style)
Production integrators like scipy.integrate.quad use the
G7K15 rule with a priority-queue based subdivision strategy
(largest-error subinterval subdivided first). They also handle:
- Infinite limits via substitution (e.g. x = 1/t for [1, ∞))
- Integrable singularities via Gauss-Jacobi or Gauss-Laguerre rules
- Oscillatory integrands via Fourier-integral methods (QUADPACK QAWO/QAWS)
8. Applications in Simulation
Gradient Computation in Physics Engines
Many physics engines need gradients of energy or potential functions. For simple analytical potentials (springs, gravity) exact derivatives are preferable. For complex or black-box potentials (neural network force fields, look-up-table potentials), finite differences or complex-step derivatives are used.
In molcular dynamics, the interatomic force is F = −∇U(r). The Lennard-Jones potential:
Numerical Integration in ODE Solvers
The connection between integration and ODE solving: Euler's method is just the left-endpoint rectangle rule applied to the integral form of an ODE:
RK4 is exactly Simpson's rule applied to the integrand f(y)! This equivalence explains why RK4 achieves fourth-order accuracy (same as Simpson) and why it needs 4 function evaluations per step.
Sensitivity Analysis
In optimisation-based simulation (e.g., inverse kinematics, physics-informed neural networks) we need derivatives of outputs with respect to parameters. For small parameter counts, finite differences work. For large parameter counts (neural networks with millions of weights), backpropagation (reverse-mode automatic differentiation) is required — finite differences would be O(n) times more expensive. Complex-step occupies a sweet spot: exact to machine precision, no AD infrastructure needed, works for moderate parameter counts (up to ~1000 parameters).
• First derivatives of smooth functions: central difference (h ≈ 1e−6) or complex-step (h ≈ 1e−20)
• High-order derivatives: Richardson extrapolation applied to central differences
• 1D integrals over finite intervals: adaptive Simpson or Romberg
• Multi-dimensional integrals: Gauss-Legendre tensor product (low dim) or Monte Carlo (high dim)
• Integrals with singularities: Gauss-Jacobi/Laguerre or QUADPACK