Learning #30 – Statistical Field Theory: From Ising Models to Renormalisation and Deep Learning Connections

Statistical mechanics describes matter at equilibrium using a single object: the partition function Z = Σ e^{−βH}. Statistical field theory takes this framework to the continuum, where the sum over microstates becomes a functional integral over field configurations. The machinery that emerges (the renormalisation group, fixed points, and universality) explains why systems as different as magnets and fluids can share identical critical behaviour, and it supplies a language that carries over to neural networks.

🎉 Milestone: This is the 30th post in the Learning series. Each Learning post develops the mathematics and physics behind the simulations on this platform.

The central observation of statistical field theory is that the critical behaviour of many-body systems is governed not by microscopic details, but by symmetry and dimensionality. A uniaxial magnet near its Curie point, a fluid near the liquid-gas critical point, and a binary alloy near its ordering transition all share the same set of critical exponents because they belong to the same universality class (percolation, by contrast, forms a class of its own). The tool that makes this precise, the renormalisation group, is one of the most beautiful constructs in theoretical physics, and it now appears in unexpected form inside modern machine learning architectures.

1. From Statistical Mechanics to Field Theory

The Ising model is the canonical starting point. Originally proposed by Lenz (1920) and solved by Ising (1925) in 1D (no transition) and by Onsager (1944) in 2D (exact solution), the model distils the competition between ferromagnetic ordering and thermal disorder into the simplest possible Hamiltonian.

Ising Model, Landau-Ginzburg & Spontaneous Symmetry Breaking

Ising Hamiltonian:
  H = −J Σ_{⟨i,j⟩} σᵢσⱼ − h Σᵢ σᵢ
  ⟨i,j⟩ = nearest-neighbour pairs; σᵢ ∈ {−1, +1}; J > 0 (ferromagnetic); h = external field
  Z = Σ_{σ} exp(−βH) with β = 1/(k_B T)

Mean-field approximation:
  Replace σⱼ → ⟨m⟩ (mean magnetisation), each site has z neighbours:
  H_eff = −(zJ⟨m⟩ + h) Σᵢ σᵢ
  Self-consistency: ⟨m⟩ = tanh[β(zJ⟨m⟩ + h)]
  Critical temperature: k_B T_c = zJ
  Near T_c: ⟨m⟩ ≈ ±√(3(T_c−T)/T_c)   (h = 0, T < T_c)
  Below T_c: Z₂ symmetry (σ → −σ) spontaneously broken
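
The self-consistency condition has no closed-form solution, but it is easy to iterate numerically. The sketch below is a minimal illustration, assuming units with k_B = 1 and a coordination number z = 4; these choices and the function name `mean_field_magnetisation` are ours, not part of the derivation. It compares the iterated solution with the square-root law near T_c.

```python
import math

def mean_field_magnetisation(T, z=4, J=1.0, h=0.0, tol=1e-12, max_iter=100_000):
    """Solve m = tanh((z*J*m + h) / T) by fixed-point iteration (k_B = 1)."""
    m = 1.0  # start fully ordered so the iteration lands on the m >= 0 branch
    for _ in range(max_iter):
        m_new = math.tanh((z * J * m + h) / T)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

z, J = 4, 1.0
T_c = z * J  # mean-field critical temperature, k_B T_c = zJ

for T in (0.5 * T_c, 0.9 * T_c, 0.99 * T_c, 1.1 * T_c):
    m = mean_field_magnetisation(T, z, J)
    approx = math.sqrt(3 * (T_c - T) / T_c) if T < T_c else 0.0
    print(f"T/T_c = {T / T_c:.2f}  m = {m:.4f}  sqrt(3(1 - T/T_c)) = {approx:.4f}")
```

The two columns should agree only close to T_c, since the square-root law is the leading term of an expansion in small m. Convergence also slows as T approaches T_c, which is a small-scale echo of critical slowing down.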

Landau-Ginzburg (LG) free energy functional:
  F[φ] = ∫d^d x [½(∇φ)² + a(T)φ² + b φ⁴ + ...]
  φ(x): continuous order parameter field (continuum limit of ⟨σᵢ⟩)
  a(T) = a₀(T − T_c): changes sign at T_c
  Saddle-point (mean-field): δF/δφ = 0 → −∇²φ + 2aφ + 4bφ³ = 0
  Uniform solution: φ = 0 (T > T_c); φ = ±√(−a/2b) (T < T_c)
  Double-well potential at T < T_c → two degenerate minima related by φ → −φ (the "Mexican hat" is the continuous-symmetry analogue)

Beyond mean-field — Gaussian fluctuations:
  F[φ] = ∫(d^d k)/(2π)^d [½k²|φ_k|² + a|φ_k|²]   (Fourier, quadratic part)
  Propagator: G(k) = ⟨φ_k φ_{-k}⟩ = 1/(k² + 2a)
  Correlation length: ξ = (2a)^{−½} ∝ |T − T_c|^{−½}  (mean-field: ν = ½)
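
To see that the pole structure of G(k) really encodes an exponential decay length, one can Fourier transform the propagator numerically. The sketch below does this in d = 1, where the exact answer is G(r) = (ξ/2) e^{−r/ξ} with ξ = (2a)^{−½}. The cutoff `k_max`, the midpoint rule, and the value a = 0.125 are illustrative assumptions.

```python
import math

def propagator_real_space(r, a, k_max=200.0, n=200_000):
    """G(r) = (1/2pi) * integral of e^{ikr} / (k^2 + 2a) dk in d = 1.

    The integrand is even in k, so we integrate cos(k r) over [0, k_max]
    with the midpoint rule and divide by pi.
    """
    dk = k_max / n
    total = 0.0
    for i in range(n):
        k = (i + 0.5) * dk
        total += math.cos(k * r) / (k * k + 2.0 * a)
    return total * dk / math.pi

a = 0.125
xi = (2.0 * a) ** -0.5  # correlation length, here xi = 2

for r in (1.0, 2.0, 4.0):
    exact = 0.5 * xi * math.exp(-r / xi)
    print(f"r = {r:.1f}  numeric = {propagator_real_space(r, a):.5f}  exact = {exact:.5f}")
```

In higher dimensions the same exponential factor appears, multiplied by a power-law prefactor that depends on d.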
          

2. Path Integrals in Field Theory

Feynman's path integral formulation replaces the sum over discrete microstates with an integral over all configurations of a field. For a scalar field φ(x) in d-dimensional Euclidean space, the partition function becomes Z = ∫Dφ exp(−S[φ]), where S is the Euclidean action. This formalism unifies quantum field theory and statistical mechanics through an analytic continuation: imaginary time τ = it maps quantum amplitudes to statistical Boltzmann weights.

Euclidean Path Integral & Quantum-Statistical Correspondence

Euclidean partition function:
  Z = ∫Dφ exp(−S_E[φ]/ħ)
  S_E[φ] = ∫d^d x [½(∂_μ φ)² + V(φ)]   (bounded below when V is → e^{−S_E} is a well-behaved weight)
  μ = 1..d; sum over Euclidean indices (no minus signs from Minkowski metric)

Quantum mechanics in imaginary time:
  Minkowski path integral: ⟨x_f|e^{−iHt/ħ}|x_i⟩ = ∫Dx exp(iS_M/ħ)
  Wick rotation t → −iτ:
    e^{−iHt/ħ} → e^{−Hτ/ħ}   (Boltzmann weight!)
  Thermal partition function: Z = Tr(e^{−βH}) = ∫Dx exp(−S_E[x]/ħ), over paths periodic in τ: x(0) = x(ħβ)
  Correspondence: inverse temperature β ↔ Euclidean time extent ħβ

Gaussian path integral (free field, V = ½m²φ²):
  Z_0 = ∫Dφ exp(−½∫φ(−∇² + m²)φ)
  = (det(−∇² + m²))^{−½}
  = exp(−½ Tr ln(−∇² + m²))
  Evaluated exactly: Z_0 = exp(−½ Σ_k ln(k² + m²))
  Free energy: F_0 = ½ Σ_k ln(k² + m²) = ½ (L/2π)^d ∫d^d k ln(k² + m²)
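
The identity ½ Tr ln = ½ ln det can be checked directly on a small lattice, where the functional determinant becomes an ordinary matrix determinant. The sketch below assumes a periodic 1D lattice with unit spacing, so k² is replaced by the lattice momentum 2 − 2cos(2πn/N). The helper names and the values N = 16, m² = 0.5 are illustrative, and field-independent normalisation constants of the measure are ignored.

```python
import math

def log_det_spd(matrix):
    """ln det of a symmetric positive-definite matrix via Cholesky factorisation."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(L[i][i])
            else:
                L[i][j] = s / L[j][j]
    return log_det

def lattice_operator(n_sites, m2):
    """Matrix of (-laplacian + m^2) on a periodic 1D lattice with unit spacing."""
    A = [[0.0] * n_sites for _ in range(n_sites)]
    for i in range(n_sites):
        A[i][i] += 2.0 + m2
        A[i][(i + 1) % n_sites] -= 1.0
        A[i][(i - 1) % n_sites] -= 1.0
    return A

n_sites, m2 = 16, 0.5

# Mode sum: F_0 = (1/2) * sum over lattice momenta of ln(k_hat^2 + m^2)
F0_modes = 0.5 * sum(
    math.log(2.0 - 2.0 * math.cos(2.0 * math.pi * n / n_sites) + m2)
    for n in range(n_sites)
)
# Determinant: F_0 = (1/2) * ln det(-laplacian + m^2)
F0_det = 0.5 * log_det_spd(lattice_operator(n_sites, m2))

print(f"mode sum: {F0_modes:.10f}   log-det: {F0_det:.10f}")
```

The two numbers agree to rounding error because plane waves diagonalise the lattice Laplacian, which is the discrete version of the statement that the free theory is solved in Fourier space.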

Perturbation theory (φ⁴ theory):
  S[φ] = ∫d^d x [½(∂φ)² + ½m²φ² + (λ/4!)φ⁴]
  Expand exp(−S_int[φ]), with S_int = (λ/4!)∫d^d x φ⁴, in powers of λ → Feynman diagrams
  4D φ⁴: renormalisable; counterterms δm², δZ, δλ absorb UV divergences
  Upper critical dimension d_c = 4 (above d_c: mean-field exponents are exact)
          

3. The Renormalisation Group

Wilson’s renormalisation group (RG) is a framework for understanding how physical descriptions change with the scale at which we observe a system. The key insight is that near a critical point, the physics becomes scale-invariant — the correlation length diverges, and the system looks the same on all length scales. Fixed points of the RG flow correspond to such scale-invariant theories.

Wilson RG, Fixed Points & Scaling Operators

Kadanoff block-spin construction (d=2 Ising):
  Divide lattice into blocks of linear size b (b^d sites each); replace each block by a single effective spin σ'
  Effective Hamiltonian H'(σ') on the coarser lattice (spacing larger by a factor b) → repeat → RG flow in coupling space
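
One common way to realise the block-spin step is the majority rule: each block takes the sign of the sum of its spins. The sketch below applies it with 3×3 blocks (odd size, so no ties). It is only an illustration of the coarse-graining map: the input is a randomly biased configuration rather than an equilibrium Ising sample, and the function name `block_spin` is our own.

```python
import random

def block_spin(config, b=3):
    """Coarse-grain a square array of +1/-1 spins by majority rule over b x b blocks."""
    size = len(config)
    assert size % b == 0 and b % 2 == 1, "need odd b dividing the lattice size"
    coarse = []
    for bi in range(0, size, b):
        row = []
        for bj in range(0, size, b):
            total = sum(config[i][j]
                        for i in range(bi, bi + b)
                        for j in range(bj, bj + b))
            row.append(1 if total > 0 else -1)
        coarse.append(row)
    return coarse

random.seed(0)
L = 27
# Illustrative input: independent spins with a 60% bias towards +1
spins = [[1 if random.random() < 0.6 else -1 for _ in range(L)] for _ in range(L)]

level = spins
for _ in range(3):  # 27 -> 9 -> 3 -> 1
    n = len(level)
    m = sum(map(sum, level)) / n ** 2
    print(f"lattice {n:2d}x{n:<2d}  magnetisation {m:+.3f}")
    level = block_spin(level)
```

For uncorrelated biased spins the magnetisation tends to grow under repeated blocking, which mimics a flow towards the ordered fixed point. Producing the effective Hamiltonian H' itself requires sampling or summing over the original spins and is not attempted here.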

Momentum-shell RG (Wilson 1971):
  Mode decomposition: φ(k) = φ_< (|k|<Λ/b) + φ_> (Λ/b<|k|<Λ)
  Step 1 — integrate out fast modes φ_>: ∫Dφ_> e^{−S[φ_<,φ_>]}
  Step 2 — rescale: k → bk, φ_< → b^{d/2−1+η/2} φ  (field rescaling)
  Result: new effective action S'[φ'_<] with shifted couplings

RG flow equations (φ⁴ theory near d=4):
  da/dl = 2a + c₁ λ   (l = ln b)
  dλ/dl = (4−d)λ − c₂ λ²
  Wilson-Fisher fixed point: λ* = (4−d)/c₂ + O(ε²)  (ε = 4−d)
  d=3: ε=1 → Wilson-Fisher fixed point at λ* > 0 is non-trivial and stable in λ; the Gaussian fixed point (λ=0) is unstable for d<4
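
The λ equation decouples from a, so its flow can be integrated on its own. The sketch below uses a plain Euler step and sets c₂ = 1 purely for illustration (the true constant depends on normalisation conventions). It shows couplings flowing to λ* = ε/c₂ for ε > 0 and to the Gaussian fixed point for ε < 0.

```python
def integrate_lambda(lam0, eps, c2=1.0, l_max=30.0, dl=1e-3):
    """Euler-integrate d(lambda)/dl = eps*lambda - c2*lambda**2 from l = 0 to l_max."""
    lam = lam0
    for _ in range(int(l_max / dl)):
        lam += dl * (eps * lam - c2 * lam * lam)
    return lam

# eps = 4 - d: eps = 1 corresponds to d = 3, eps = -1 to d = 5
for eps, lam0 in ((1.0, 0.01), (1.0, 3.0), (-1.0, 0.5)):
    lam_end = integrate_lambda(lam0, eps)
    print(f"eps = {eps:+.0f}  lambda(0) = {lam0:.2f}  ->  lambda(l=30) = {lam_end:.6f}")
```

Starting points on either side of λ* converge to it when ε > 0, which is what "stable in the λ direction" means. The coupling a remains relevant at both fixed points and has to be tuned to reach criticality.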

Fixed points and scaling:
  Gaussian: a*=0, λ*=0   (free field, mean-field exponents exact for d>4)
  Wilson-Fisher: controls 3D Ising critical point
  At fixed point H*: scaling operators O_i with dimensions Δᵢ
    Relevant: Δᵢ < d → grows under RG → leaves fixed point
    Irrelevant: Δᵢ > d → flows to zero → "wash out" microscopic details → Universality
    Marginal: Δᵢ = d → flow determined by higher-order terms

Operators at 3D Ising Wilson-Fisher fixed point:
  φ² (temperature deformation): relevant, Δ = d − 1/ν = 3 − 1/0.629 ≈ 1.41  (RG eigenvalue y_t = 1/ν ≈ 1.59)
  φ   (field deformation): relevant, Δ = β/ν = (d−2+η)/2 ≈ 0.518  (RG eigenvalue y_h = d − Δ ≈ 2.48)
  All higher Z₂-even scalar operators have Δ > d: irrelevant → universality
          

4. Critical Exponents and Universality Classes

Universality is the empirical observation that systems with different microscopic structure exhibit identical power-law behaviour as they approach a critical point. The set of critical exponents (α, β, γ, δ, ν, η) characterises a universality class, which depends only on the dimensionality of space and the symmetry of the order parameter.

Critical Exponents, Scaling Relations & Universality Classes

Definitions (t = (T−T_c)/T_c, h = external field):
  ξ ~ |t|^{−ν}                (correlation length)
  C_h ~ |t|^{−α}              (specific heat)
  ⟨m⟩ ~ |t|^β   (t < 0)       (order parameter)
  χ = ∂m/∂h ~ |t|^{−γ}        (susceptibility)
  ⟨m(h)⟩ ~ h^{1/δ} (t = 0)    (equation of state)
  G(r) ~ r^{−(d−2+η)} exp(−r/ξ) (correlation function)

Scaling relations (follow from single diverging length scale ξ):
  α + 2β + γ = 2            (Rushbrooke, follows from scaling)
  γ = ν(2−η)                (Fisher; ν and η are independent)
  dν = 2−α                  (hyperscaling; holds for d ≤ d_c)
  δ = (d+2−η)/(d−2+η)      (Widom + Fisher)
  Only 2 independent exponents needed to determine all 6

Exponent values:
  Universality class   d     ν       β       γ       α        η
  Mean-field           any   0.50    0.50    1.00    0        0        [exact for d > 4]
  2D Ising             2     1.00    0.125   1.75    0        0.25     [exact]
  3D Ising             3     0.629   0.326   1.237   0.110    0.036    [conformal bootstrap]
  3D Heisenberg        3     0.711   0.366   1.397   −0.133   0.035
  3D XY (λ-point)      3     0.671   0.348   1.316   −0.014   0.038
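
A quick consistency check is to plug the tabulated exponents into the scaling relations and look at the residuals. The sketch below copies the values from the table; the dictionary layout and function names are our own. The mean-field row is left out because hyperscaling is not expected to hold above the upper critical dimension.

```python
# (d, nu, beta, gamma, alpha, eta), copied from the exponent table
exponents = {
    "2D Ising":      (2, 1.00, 0.125, 1.75, 0.0, 0.25),
    "3D Ising":      (3, 0.629, 0.326, 1.237, 0.110, 0.036),
    "3D Heisenberg": (3, 0.711, 0.366, 1.397, -0.133, 0.035),
    "3D XY":         (3, 0.671, 0.348, 1.316, -0.014, 0.038),
}

def residuals(d, nu, beta, gamma, alpha, eta):
    """Left side minus right side of each scaling relation; zero means satisfied."""
    return {
        "Rushbrooke": alpha + 2 * beta + gamma - 2,
        "Fisher": gamma - nu * (2 - eta),
        "hyperscaling": d * nu - (2 - alpha),
    }

def delta_from(d, eta):
    """Exponent delta from delta = (d + 2 - eta) / (d - 2 + eta)."""
    return (d + 2 - eta) / (d - 2 + eta)

for name, values in exponents.items():
    res = residuals(*values)
    d, eta = values[0], values[5]
    summary = "  ".join(f"{k}: {v:+.4f}" for k, v in res.items())
    print(f"{name:14s} {summary}  delta = {delta_from(d, eta):.3f}")
```

Residuals at the level of the last quoted digit reflect rounding in the table rather than a failure of scaling.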

Examples of equivalent critical systems:
  3D Ising class: uniaxial magnets, liquid-gas critical point, binary alloys
  3D XY class:    superfluid He⁴ (λ-point at 2.17 K), superconductors
  Percolation:    own class; ν=0.876, β=0.417 (3D)

2D exact solution (Onsager 1944):
  k_B T_c = 2J/ln(1+√2) ≈ 2.269 J
  C ∝ −ln|t| (logarithmic divergence → α = 0 exactly)
  ⟨m⟩ = (1 − sinh^{−4}(2J/(k_B T)))^{1/8} for T < T_c → exponent β = 1/8 (magnetisation formula derived by Yang, 1952)
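
These closed-form results are easy to evaluate. The sketch below, assuming k_B = 1 and J = 1, computes T_c, the spontaneous magnetisation, and an effective exponent from the log-log slope just below T_c. The function names and the two reduced temperatures used for the slope are illustrative choices.

```python
import math

J = 1.0
T_c = 2.0 * J / math.log(1.0 + math.sqrt(2.0))  # k_B = 1

def magnetisation(T):
    """Spontaneous magnetisation of the square-lattice Ising model; zero for T >= T_c."""
    if T >= T_c:
        return 0.0
    return (1.0 - math.sinh(2.0 * J / T) ** -4) ** 0.125

def local_exponent(t1=1e-4, t2=1e-5):
    """Log-log slope of m versus reduced temperature |t| between two points below T_c."""
    m1 = magnetisation(T_c * (1.0 - t1))
    m2 = magnetisation(T_c * (1.0 - t2))
    return math.log(m1 / m2) / math.log(t1 / t2)

print(f"T_c = {T_c:.6f} J")
for T in (1.0, 2.0, 2.2, 2.5):
    print(f"T = {T:.1f}  m = {magnetisation(T):.4f}")
print(f"effective exponent near T_c: {local_exponent():.4f}")
```

The slope approaches 1/8 as the two temperatures move closer to T_c, which is a direct numerical reading of the order-parameter exponent.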
          

5. Conformal Field Theory in Two Dimensions

At a critical point, translation, rotation, and scale invariance typically enhance to the full conformal group, which includes local angle-preserving transformations. In two dimensions the conformal algebra is infinite-dimensional, which makes many 2D CFTs exactly solvable. This power was exploited by Belavin, Polyakov, and Zamolodchikov (BPZ) in 1984 to classify 2D critical models by their central charge c and operator content.

Conformal Group, Virasoro Algebra & Minimal Models

Conformal transformations (in R^d):
  Preserve angles: g_μν(x) → Ω(x) g_μν(x)
  d ≥ 3: finite-dimensional group SO(d+1,1) with (d+2)(d+1)/2 generators
  d = 2: local conformal maps = analytic functions z → f(z) on ℂ
         Infinite-dimensional; generators L_n, n ∈ ℤ

Virasoro algebra (2D CFT):
  [L_m, L_n] = (m−n)L_{m+n} + (c/12) m(m²−1) δ_{m+n,0}
  c = central charge: characterises the CFT
  L_{−1}, L_0, L_1: generate global SL(2,ℂ) subgroup
  L_0: dilation generator → eigenvalue h = conformal weight of operator
  Full Virasoro highest-weight state |h⟩: L_0|h⟩ = h|h⟩, L_n|h⟩ = 0 (n > 0)

Primary operators and OPE:
  T(z)O(w,w̄) = (h/(z−w)²)O(w,w̄) + (1/(z−w))∂O(w,w̄) + regular
  Operator product expansion (OPE):
    Oᵢ(z,z̄)Oⱼ(0) = Σ_k C_{ij}^k |z|^{Δ_k−Δᵢ−Δⱼ} O_k(0)   (scalar operators, Δ = h + h̄ = 2h)
    C_{ij}^k = OPE coefficients (determine all n-point functions)

Minimal models M(p,q), p>q>1, gcd(p,q)=1 (unitary only when p = q+1; M(q,p) labels the same model):
  c = 1 − 6(p−q)²/(pq)
  Primary operator dimensions:
    h_{r,s} = [(pr−qs)² − (p−q)²] / (4pq)    (1 ≤ r ≤ q−1, 1 ≤ s ≤ p−1)

Physically realised minimal models:
  M(3,4): c=1/2, Ising model (h=0, 1/16, 1/2 for I, σ, ε operators)
  M(4,5): c=7/10, Tricritical Ising (magnetisation, tricritical)
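  M(5,6): c=4/5, 3-state Potts model (non-diagonal modular invariant)
  M(6,7): c=6/7, Tricritical 3-state Potts model (non-diagonal modular invariant)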
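
The central charge and Kac-table formulas can be evaluated exactly with rational arithmetic. The sketch below uses the convention p > q from the formulas, so the Ising model is (p, q) = (4, 3) and the tricritical Ising model is (5, 4). The function names are our own.

```python
from fractions import Fraction

def central_charge(p, q):
    """c = 1 - 6 (p - q)^2 / (p q) as an exact fraction."""
    return 1 - Fraction(6 * (p - q) ** 2, p * q)

def kac_weights(p, q):
    """Sorted distinct conformal weights h_{r,s}, 1 <= r <= q-1, 1 <= s <= p-1."""
    weights = {
        Fraction((p * r - q * s) ** 2 - (p - q) ** 2, 4 * p * q)
        for r in range(1, q)
        for s in range(1, p)
    }
    return sorted(weights)

for p, q in ((4, 3), (5, 4)):
    weights = ", ".join(str(h) for h in kac_weights(p, q))
    print(f"M({p},{q}): c = {central_charge(p, q)}, h in {{{weights}}}")
```

Each weight appears twice in the Kac table because h_{r,s} = h_{q−r,p−s}; collecting the values in a set removes the duplicates.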

State-operator correspondence:
  Every operator O in CFT corresponds to a state |O⟩ in the Hilbert space on S^1
  Partition function on torus: Z = Tr q^{L_0−c/24} q̄^{L̄_0−c/24}  (q = e^{2πiτ})
  Modular invariance → constraints on spectrum (Verlinde formula for fusion)
          

6. Connections to Machine Learning

In the last decade, researchers have found that many architectures in deep learning are, at their mathematical core, instances of statistical mechanics systems. The Boltzmann machine is an Ising model at temperature T. Diffusion generative models are discrete stochastic Langevin equations run in reverse. Score-based models solve Anderson’s time-reversed SDE. These correspondences are not merely aesthetic — they yield practical algorithms via physics-derived intuitions like contrastive divergence, annealing, and score matching.

Boltzmann Machines, Diffusion Models & Energy-Based Learning

Boltzmann machine:
  Energy: E(v, h) = −Σ_{ij} W_{ij} vᵢhⱼ − Σᵢ bᵢvᵢ − Σⱼ cⱼhⱼ   (bipartite form; a general Boltzmann machine also has v-v and h-h couplings)
  Partition function: Z = Σ_{v,h} exp(−E/T)   → exactly Ising with hidden units
  Learning goal: maximise log P(v) = log Σ_h exp(−E/T) − log Z
  ∇_θ log P(v) = −⟨∂E/∂θ⟩_{data} + ⟨∂E/∂θ⟩_{model}
  Contrastive divergence (Hinton 2002): approximate model expectation via k Gibbs steps

Restricted Boltzmann Machine (RBM):
  No v-v or h-h connections → tractable:
    P(v) = Σ_h e^{−E(v,h)}/Z = e^{Σᵢ bᵢvᵢ} ∏ⱼ 2cosh(cⱼ + Σᵢ W_{ij}vᵢ) / Z   (hⱼ ∈ {−1,+1}, T = 1)
  Hidden units h marginalised analytically
  Connection to PCA: in linear limit W^T W → singular value decomposition
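
The analytic marginalisation over hidden units can be verified by brute force on a tiny model. The sketch below assumes ±1 hidden units and T = 1, for which each hidden unit contributes a factor 2cosh(cⱼ + Σᵢ W_{ij}vᵢ). The weights and biases are arbitrary illustrative numbers, and both function names are hypothetical.

```python
import itertools
import math

def energy(v, h, W, b, c):
    """E(v,h) = -sum_ij W_ij v_i h_j - sum_i b_i v_i - sum_j c_j h_j."""
    coupling = sum(W[i][j] * v[i] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return (-coupling
            - sum(bi * vi for bi, vi in zip(b, v))
            - sum(cj * hj for cj, hj in zip(c, h)))

def unnormalised_p_brute(v, W, b, c):
    """Sum exp(-E) over all 2^n_hidden hidden configurations."""
    return sum(math.exp(-energy(v, h, W, b, c))
               for h in itertools.product((-1, 1), repeat=len(c)))

def unnormalised_p_closed(v, W, b, c):
    """exp(b.v) * prod_j 2 cosh(c_j + sum_i W_ij v_i), valid for +1/-1 hidden units."""
    result = math.exp(sum(bi * vi for bi, vi in zip(b, v)))
    for j, cj in enumerate(c):
        result *= 2.0 * math.cosh(cj + sum(W[i][j] * v[i] for i in range(len(v))))
    return result

# Illustrative parameters: 3 visible units, 2 hidden units
W = [[0.5, -0.3], [0.2, 0.8], [-0.7, 0.1]]
b = [0.1, -0.2, 0.3]
c = [0.4, -0.5]

for v in itertools.product((-1, 1), repeat=3):
    print(v, f"{unnormalised_p_brute(v, W, b, c):.6f}",
          f"{unnormalised_p_closed(v, W, b, c):.6f}")
```

The brute-force sum costs 2^(number of hidden units) terms per visible state, while the closed form is linear in the number of hidden units. The remaining intractable piece is Z, the sum over visible states.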

Diffusion and score-based models (Sohl-Dickstein et al. 2015; Song & Ermon 2019; Ho et al. 2020):
  Forward process: q(xₜ|x_{t-1}) = N(xₜ; √(1−β_t)x_{t-1}, β_t I)  [noise addition]
  Reverse process: p_θ(x_{t-1}|xₜ) = N(x_{t-1}; μ_θ(xₜ,t), Σₜ)  [neural denoiser]
  Score network: s_θ(xₜ, t) ≈ ∇_{xₜ} log q(xₜ)     [Stein score]
  Physics: forward = Ornstein-Uhlenbeck; reverse = time-reversed SDE (Anderson 1982)
  Connection to Langevin MCMC: dx = −∇V(x)dt + √(2/β)dW   (target ∝ e^{−βV})
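
The Langevin equation can be simulated with a simple Euler-Maruyama step. The sketch below assumes a quadratic potential V(x) = x²/2, for which the target e^{−βV} is a Gaussian of variance 1/β. The step size, run length, seed, and the function name `langevin_samples` are illustrative choices, and the finite step introduces a small bias in the sampled variance.

```python
import math
import random

def langevin_samples(grad_V, beta, dt=0.01, n_steps=400_000, x0=0.0, seed=0):
    """Euler-Maruyama discretisation of dx = -grad V(x) dt + sqrt(2/beta) dW."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * dt / beta)
    x = x0
    samples = []
    for _ in range(n_steps):
        x += -grad_V(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

beta = 2.0
xs = langevin_samples(lambda x: x, beta)  # V(x) = x^2 / 2, so grad V = x

tail = xs[10_000:]  # discard an initial transient
mean = sum(tail) / len(tail)
var = sum((x - mean) ** 2 for x in tail) / len(tail)
print(f"sample variance = {var:.4f}   target 1/beta = {1.0 / beta:.4f}")
```

Replacing `grad_V` with the negative of a learned score, −s_θ(x), turns the same loop into a sampler for the distribution the score network has learned, which is the link to score-based generative models.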

Neural tangent kernel (NTK) and Gaussian processes:
  Wide neural network (width → ∞): kernel K_{NTK}(x,x') = ∇θ f(x)·∇θ f(x')
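  In this limit: at initialisation the network output is a Gaussian process (NNGP kernel); gradient-descent training becomes kernel regression with a fixed K_NTK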
  Connects to stat-mech: mean-field theory of infinite-width networks
  RG perspective: network depth = RG flow; hidden layer activations = block spins
    (Mehta & Schwab 2014: deep learning ↔ variational renormalisation group)

Replica method and generalisation:
  Spin glass theory (Parisi RSB): replica partition function
    Z^n = Σ_{σ^1..σ^n} exp(−βH(σ^1)−...−βH(σ^n));  then ⟨log Z⟩ = lim_{n→0} (⟨Z^n⟩ − 1)/n = ∂⟨Z^n⟩/∂n|_{n→0}   (⟨·⟩ = disorder average)
  Applied to perceptron learning: Gardner-Derrida theory 1988
  Modern revival: analysis of SAT-UNSAT transition in neural network loss landscape,
    random matrix theory for singular value spectra of weight matrices
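
The identity behind the replica method, ⟨log Z⟩ = lim_{n→0} (⟨Z^n⟩ − 1)/n with ⟨·⟩ the average over disorder, can be checked numerically on a toy system. The sketch below enumerates a 4-spin glass with Gaussian couplings; the system size, coupling variance, sample count, and seed are all illustrative assumptions. It does not attempt the analytic continuation in n that makes the real method hard.

```python
import itertools
import math
import random

def partition_function(J, beta, n_spins):
    """Exact Z for H = -sum_{i<j} J_ij s_i s_j by enumerating all spin states."""
    Z = 0.0
    for s in itertools.product((-1, 1), repeat=n_spins):
        H = -sum(J[i][j] * s[i] * s[j]
                 for i in range(n_spins) for j in range(i + 1, n_spins))
        Z += math.exp(-beta * H)
    return Z

rng = random.Random(1)
n_spins, beta, n_samples = 4, 1.0, 500

# One Z per disorder realisation; only the upper triangle of J is used
Zs = []
for _ in range(n_samples):
    J = [[rng.gauss(0.0, 1.0 / math.sqrt(n_spins)) for _ in range(n_spins)]
         for _ in range(n_spins)]
    Zs.append(partition_function(J, beta, n_spins))

quenched = sum(math.log(Z) for Z in Zs) / n_samples  # <log Z>
annealed = math.log(sum(Zs) / n_samples)             # log <Z>

print(f"<log Z> = {quenched:.5f}   log <Z> = {annealed:.5f}")
for n in (1.0, 0.1, 0.01, 0.001):
    replica = (sum(Z ** n for Z in Zs) / n_samples - 1.0) / n
    print(f"n = {n:<6}  (<Z^n> - 1)/n = {replica:.5f}")
```

As n shrinks, the replica expression approaches the quenched average ⟨log Z⟩, not the annealed value log⟨Z⟩, which is larger by Jensen's inequality.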
          

Thirty posts in: from Newtonian mechanics and orbital dynamics in Learning #1, through quantum mechanics, electrodynamics, special and general relativity, chaos theory, statistical mechanics, and now statistical field theory — the Learning series has built a continuous thread from classical physics to the frontier where theoretical physics and deep learning research overlap.

Try These Simulations