Numerical Differentiation & Integration: Finite Differences, Gauss Quadrature

1. Truncation Error vs Round-Off Error

When approximating a derivative or integral numerically two independent error sources compete:

Truncation error: the error from replacing the true mathematical operation (limit as h→0) with a finite approximation using step size h. Decreasing h reduces this error — typically as a power of h.
Round-off error: the error from using finite-precision floating-point arithmetic. Decreasing h amplifies round-off because it requires subtracting two nearly equal numbers (catastrophic cancellation).

The optimal step size balances these two: too large and truncation dominates; too small and round-off dominates. For IEEE 754 double precision (ε_mach ≈ 2.2 × 10⁻¹⁶), the optimal h for a first-derivative central difference is approximately:

h_opt \approx ε_mach^(1/3) \approx 6 \times 10^-6 (for central difference) h_opt \approx ε_mach^(1/2) \approx 1.5 \times 10^-8 (for forward difference)

Beyond h_opt the error floor is set by round-off and cannot be reduced further by decreasing h. This is a fundamental limitation — to get better derivatives you need either higher-precision arithmetic or a mathematically different approach (automatic differentiation, complex-step).

2. Finite Difference Formulas

All finite difference formulas are derived from Taylor series. Expanding f(x + h) around x:

f(x+h) = f(x) + h\cdotf'(x) + (h²/2)\cdotf''(x) + (h³/6)\cdotf'''(x) + ... f(x-h) = f(x) - h\cdotf'(x) + (h²/2)\cdotf''(x) - (h³/6)\cdotf'''(x) + ...

First Derivative Formulas

Formula	Expression	Error
Forward difference	(f(x+h) − f(x)) / h	O(h)
Backward difference	(f(x) − f(x−h)) / h	O(h)
Central difference	(f(x+h) − f(x−h)) / (2h)	O(h²)
5-point stencil	(−f(x+2h) + 8f(x+h) − 8f(x−h) + f(x−2h)) / (12h)	O(h⁴)

The central difference achieves second-order accuracy because the leading error term (from h²·f'') cancels when subtracting f(x+h) − f(x−h). This is why central differences are preferred whenever function evaluations at both sides of x are available.

Second Derivative

f''(x) \approx (f(x+h) - 2f(x) + f(x-h)) / h² // O(h²) error // Derivation: add the two Taylor series: // f(x+h) + f(x-h) = 2f(x) + h²\cdotf''(x) + (h⁴/12)\cdotf''''(x) + ... // Rearrange: // f''(x) = [f(x+h) - 2f(x) + f(x-h)] / h² - (h²/12)\cdotf''''(x) + ...

The second derivative formula uses three points and has O(h²) error. Note that the error coefficient is 1/12 × f⁽⁴⁾, so functions with large fourth derivatives (like sin(100x)) suffer more. This formula is ubiquitous in physics: it implements the discrete 1D Laplacian and is the backbone of finite difference PDE solvers.

Mixed Partial Derivatives

\partial²f/\partialx\partialy \approx [f(x+h,y+h) - f(x+h,y-h) - f(x-h,y+h) + f(x-h,y-h)] / (4h²)

This 4-point cross stencil has O(h²) error and is used in computing Hessian matrices for optimisation and in elasticity solvers.

3. Richardson Extrapolation

Richardson extrapolation is a powerful technique to increase the order of accuracy of any finite difference approximation without deriving a new formula. The idea: if the error of approximation D(h) is known to be of the form

D(h) = f'(x) + a₁h² + a₂h⁴ + ... (for a central difference, odd powers vanish)

then by evaluating at h and h/2 we can eliminate the leading h² term:

D(h) = f'(x) + a\cdoth² + O(h⁴) D(h/2) = f'(x) + a\cdoth²/4 + O(h⁴) // Eliminate a\cdoth² : R(h) = [4\cdotD(h/2) - D(h)] / 3 = f'(x) + O(h⁴)

This has effectively promoted the O(h²) central difference to O(h⁴) — the same as the 5-point stencil — using only two cheaper evaluations. The process can be repeated recursively (Romberg integration for integrals, or iterated Richardson for derivatives) to achieve arbitrarily high order.

Romberg Integration

Romberg integration applies Richardson extrapolation to the composite trapezoidal rule:

T(h) = h/2 \cdot [f(a) + 2Σf(aₖ) + f(b)] // composite trapezoid // Richardson table (each row halves h, each column extrapolates): R[i][0] = T(h/2^i) R[i][j] = [4^j \cdot R[i][j-1] - R[i-1][j-1]] / (4^j - 1) // R[n][n] converges rapidly (O(h^(2n+2)) error)

In practice, Romberg integration converges so fast that even R[4][4] (requiring only 17 function evaluations) achieves double-precision accuracy for most smooth functions. It is the go-to method for one-dimensional integrals over closed intervals.

4. Higher-Order & Complex-Step Differentiation

Higher-Order Stencils via Undetermined Coefficients

Any finite difference stencil can be derived systematically. To find a k-th derivative with n points, write:

f^(k)(x) \approx Σᵢ cᵢ \cdot f(x + iΔx)

The coefficients cᵢ are found by enforcing that the formula is exact for polynomials 1, x, x², ..., x^(n−1) (i.e., solving a Vandermonde system). The website findiff automates this. For production code, scipy's findiff.coefficients() or numpy's numpy.gradient() implement this reliably.

Complex-Step Differentiation

The complex-step trick is arguably the most under-appreciated numerical differentiation technique:

// Evaluate f at x + ih (imaginary step) f(x + ih) = f(x) + ih\cdotf'(x) - (h²/2)\cdotf''(x) - (ih³/6)\cdotf'''(x) + ... // Take imaginary part and divide by h: Im[f(x + ih)] / h = f'(x) - (h²/6)\cdotf'''(x) + ... // O(h²) — NO cancellation!

The crucial advantage: because we take an imaginary part, there is no subtraction of nearly equal numbers. The round-off error is purely from evaluating f at a complex point, not from cancellation. We can use h = 10⁻²⁰ and get near-exact first derivatives limited only by the function's own evaluation accuracy. The only requirement: the function f must be implementable over complex numbers (easily done in Python, Julia, C++ with std::complex).

Rule of thumb: Use complex-step (h ≈ 1e−20) for gradient computation in optimisation and sensitivity analysis. Use central differences (h ≈ 1e−6) for quick Jacobian approximations in Newton methods. Use Richardson extrapolation when you need high-order derivatives of a black-box function.

5. Newton-Cotes Integration Rules

Newton-Cotes rules integrate by fitting a polynomial through equally-spaced points and integrating the polynomial exactly.

Rule	Points	Formula	Error
Midpoint	1	h · f((a+b)/2)	O(h³)
Trapezoid	2	(h/2)[f(a)+f(b)]	O(h³)
Simpson's 1/3	3	(h/6)[f(a)+4f(m)+f(b)]	O(h⁵)
Simpson's 3/8	4	(3h/8)[f(a)+3f(m₁)+3f(m₂)+f(b)]	O(h⁵)
Boole's rule	5	(2h/45)[7f₀+32f₁+12f₂+32f₃+7f₄]	O(h⁷)

Composite Rules

A single Simpson's rule over [a, b] uses 3 points. The composite Simpson's rule divides [a, b] into n (even) subintervals of width h = (b−a)/n and applies Simpson per pair:

∫ₐᵇ f dx ≈ (h/3)[f(x₀) + 4f(x₁) + 2f(x₂) + 4f(x₃) + ... + 4f(x_{n-1}) + f(xₙ)] // Local error: O(h⁵) per subinterval // Global error: O(h⁴) over [a,b] (n subintervals × h⁵/h = h⁴)

Why Not Higher-Order Newton-Cotes?

For n ≥ 8, Newton-Cotes weights include negative values — meaning the formula subtracts positive function values, leading to catastrophic cancellation and poor numerical properties. This is Runge's phenomenon for integration. Instead, for high accuracy one should use Gaussian quadrature (which cleverly places nodes at non-uniform positions to avoid this) or composite Simpson/trapezoid with Richardson extrapolation (Romberg).

6. Gaussian Quadrature

While Newton-Cotes fixes node positions and finds weights, Gaussian quadrature co-optimises both. An n-point Gaussian rule is exact for polynomials of degree up to 2n − 1 — twice the exactness of an n-point Newton-Cotes rule. All weights are positive (avoids cancellation), and the nodes are roots of orthogonal polynomials.

Gauss-Legendre Quadrature

For integrals over [−1, 1], the Gauss-Legendre rule uses nodes xᵢ = roots of the Legendre polynomial Pₙ(x) and weights:

wᵢ = 2 / [(1 - xᵢ²) \cdot (Pₙ'(xᵢ))²] \int₋₁¹ f(x) dx \approx Σᵢ₌₁ⁿ wᵢ f(xᵢ)

n	Nodes xᵢ	Weights wᵢ	Exact for poly degree
2	±0.577350	1.000000	≤ 3
3	0, ±0.774597	0.888889, 0.555556	≤ 5
5	0, ±0.538469, ±0.906180	0.568889, 0.478629, 0.236927	≤ 9

To integrate over [a, b] instead of [−1, 1], apply the change of variables x = (b−a)/2 · t + (a+b)/2:

\intₐᵇ f(x) dx = (b-a)/2 \cdot Σᵢ wᵢ \cdot f((b-a)/2 \cdot xᵢ + (a+b)/2)

Gauss-Kronrod Rules (G7K15)

Many adaptive integrators (QUADPACK, scipy.integrate.quad) use the 15-point Gauss-Kronrod rule. It embeds the 7-point Gauss-Legendre rule as a subset: the 7 existing nodes are reused and 8 new nodes added. This gives both a high-accuracy result (15 points) and an error estimate (|G15 − G7|) at no extra function evaluations compared to evaluating G15 from scratch. The estimated error drives adaptive subdivision.

7. Adaptive Integration & Error Control

Adaptive algorithms automatically concentrate function evaluations where the integrand varies most, dramatically improving efficiency for non-smooth or sharply-peaked functions.

Adaptive Simpson's Rule

The simplest adaptive scheme estimates the error on [a, b] by comparing the whole-interval Simpson result with the sum of two half-interval results:

S(a, b) = Simpson on [a, b] (3 evaluations) S₁ + S₂ = Simpson on [a,m] + Simpson on [m,b] (5 evaluations, m=(a+b)/2) Error estimate \approx |S(a,b) - (S₁+S₂)| / 15 // from h⁴ error expansion

If the error estimate < tolerance × (b−a) / (b−a_total), recurse on each half; otherwise, accept S₁ + S₂ as the local result. This process terminates in O(n log n) function evaluations for Lipschitz-continuous integrands.

JavaScript Implementation

// Adaptive Simpson's integration
function adaptiveSimpson(f, a, b, tol = 1e-9) {
  function simpson(f, a, b) {
    const m = (a + b) / 2;
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b));
  }
  function recurse(a, b, tol, whole, fa, fm, fb, depth) {
    const m = (a + b) / 2;
    const lm = (a + m) / 2, rm = (m + b) / 2;
    const flm = f(lm), frm = f(rm);
    const left  = (m - a) / 6 * (fa + 4 * flm + fm);
    const right = (b - m) / 6 * (fm + 4 * frm + fb);
    const delta = left + right - whole;
    if (depth >= 50 || Math.abs(delta) <= 15 * tol)
      return left + right + delta / 15;
    return recurse(a, m, tol/2, left,  fa, flm, fm, depth+1)
         + recurse(m, b, tol/2, right, fm, frm, fb, depth+1);
  }
  const fa = f(a), fb = f(b), fm = f((a+b)/2);
  const whole = (b - a) / 6 * (fa + 4 * fm + fb);
  return recurse(a, b, tol, whole, fa, fm, fb, 0);
}

// Usage examples:
console.log(adaptiveSimpson(Math.sin, 0, Math.PI));  // ≈ 2.0
console.log(adaptiveSimpson(x => 1/x, 1, 100));      // ≈ ln(100) ≈ 4.6052

Gauss-Kronrod Adaptive (scipy-style)

Production integrators like scipy.integrate.quad use the G7K15 rule with a priority-queue based subdivision strategy (largest-error subinterval subdivided first). They also handle:

Infinite limits via substitution (e.g. x = 1/t for [1, ∞))
Integrable singularities via Gauss-Jacobi or Gauss-Laguerre rules
Oscillatory integrands via Fourier-integral methods (QUADPACK QAWO/QAWS)

8. Applications in Simulation

Gradient Computation in Physics Engines

Many physics engines need gradients of energy or potential functions. For simple analytical potentials (springs, gravity) exact derivatives are preferable. For complex or black-box potentials (neural network force fields, look-up-table potentials), finite differences or complex-step derivatives are used.

In molcular dynamics, the interatomic force is F = −∇U(r). The Lennard-Jones potential:

U(r) = 4ε[(σ/r)¹² - (σ/r)⁶] // Analytical derivative (preferred): dU/dr = 4ε[-12σ¹²/r¹³ + 6σ⁶/r⁷] // Numerical alternative (5-point stencil for validation): dU/dr \approx [-U(r+2h) + 8U(r+h) - 8U(r-h) + U(r-2h)] / (12h)

Numerical Integration in ODE Solvers

The connection between integration and ODE solving: Euler's method is just the left-endpoint rectangle rule applied to the integral form of an ODE:

y(t+h) = y(t) + \intₜᵗ⁺ʰ f(y(τ)) dτ // Euler (degree 1 accuracy): \approx y(t) + h \cdot f(y(t)) // Trapezoidal (degree 2): \approx y(t) + h/2 \cdot [f(y(t)) + f(y(t+h))] (implicit) // RK4 (degree 4): \approx y(t) + h/6 \cdot [k₁ + 2k₂ + 2k₃ + k₄] // where k₁=f(t,y), k₂=f(t+h/2,y+h/2\cdotk₁), etc.

RK4 is exactly Simpson's rule applied to the integrand f(y)! This equivalence explains why RK4 achieves fourth-order accuracy (same as Simpson) and why it needs 4 function evaluations per step.

Sensitivity Analysis

In optimisation-based simulation (e.g., inverse kinematics, physics-informed neural networks) we need derivatives of outputs with respect to parameters. For small parameter counts, finite differences work. For large parameter counts (neural networks with millions of weights), backpropagation (reverse-mode automatic differentiation) is required — finite differences would be O(n) times more expensive. Complex-step occupies a sweet spot: exact to machine precision, no AD infrastructure needed, works for moderate parameter counts (up to ~1000 parameters).

Summary of recommendations:
• First derivatives of smooth functions: central difference (h ≈ 1e−6) or complex-step (h ≈ 1e−20)
• High-order derivatives: Richardson extrapolation applied to central differences
• 1D integrals over finite intervals: adaptive Simpson or Romberg
• Multi-dimensional integrals: Gauss-Legendre tensor product (low dim) or Monte Carlo (high dim)
• Integrals with singularities: Gauss-Jacobi/Laguerre or QUADPACK

Numerical Differentiation & Integration: Finite Differences to Gauss Quadrature