Linear Algebra for Simulations: Sparse Solvers, Decompositions & Eigenvalues
Every physics simulation eventually reduces to a linear system Ax = b (a matrix A and a known vector b, solved for the unknown vector x). Whether it is the pressure Poisson equation (the equation that enforces incompressibility) in fluid dynamics, the global stiffness matrix (which relates forces to displacements) in finite-element analysis, or the Laplacian (a matrix approximation of the second derivative) in heat diffusion, the choice of linear solver determines whether a simulation runs in milliseconds or hours. This article surveys the algorithms — from direct LU decomposition (factoring a matrix into lower- and upper-triangular parts) to iterative Krylov methods (approximation methods that avoid ever forming the full matrix) — that power modern computational physics.
1. Why Linear Systems Dominate Simulation
Physical laws — conservation of mass, momentum, and energy — become linear systems when discretised in space and time. Three archetypal examples:
FEM: Global Stiffness Matrix
The finite-element method for structural mechanics produces:
N is the number of degrees of freedom (3 per node in 3D). For a mesh with 100,000 nodes, N ≈ 300,000. K has bandwidth determined by mesh connectivity — roughly 20–30 non-zeros per row — yielding about 6 million nonzero entries out of 90 billion total: a sparsity of 99.993%.
Fluid: Pressure Poisson Equation
In incompressible Navier–Stokes simulations (e.g. FLIP fluid, SPH), enforcing ∇·u = 0 requires solving:
The discrete Laplacian on a 256³ grid has N ≈ 16.7 million unknowns. Direct factorisation would require O(N³) ≈ 4.6 × 10²¹ operations — utterly infeasible. This is why iterative solvers exist.
Common Properties
- Sparsity: O(N) nonzeros out of N² possible entries
- Symmetry: For undirected graphs or self-adjoint operators, A = Aᵀ
- Positive definiteness (SPD): xᵀAx > 0 for all x ≠ 0 — enables special algorithms
- Structured sparsity: FEM matrices often have block structure following mesh topology
2. Sparse Matrix Formats: CSR, CSC, COO
Sparse matrix formats store only nonzero values and their positions, trading memory for the overhead of index arrays.
COO — Coordinate Format
Three arrays: row indices, column indices, values. Unordered, allows duplicate (row, col) entries (summed on conversion). Best for matrix assembly.
CSR — Compressed Sparse Row
The most common format for arithmetic. Replace row array with row pointer (length N+1 where rowptr[i+1] − rowptr[i] is the number of nonzeros in row i):
CSC — Compressed Sparse Column
Row and column roles swapped. Preferred by LAPACK convention and for column-oriented operations (e.g. factorisation pivoting). MATLAB's native format is CSC.
Format Comparison
| Format | Memory | Matvec | Assembly | Best for |
|---|---|---|---|---|
| COO | 3×nnz | Slow (cache miss) | Excellent | Building from mesh |
| CSR | 2×nnz + N+1 | Fast (row cache-local) | Poor | CG, GMRES matvec |
| CSC | 2×nnz + N+1 | Fast (col access) | Poor | LU factorisation |
| BSR | 2×nnz/bs + N/bs+1 | Fastest (SIMD blocks) | Moderate | FEM block DOFs |
3. Direct Solvers: LU, Cholesky, QR
Direct methods factorise A once and then solve via triangular back-substitution in O(N) operations — but the factorisation itself can be expensive and fill-in (creation of nonzeros where A was zero) limits scalability to N ≲ 10⁵ in 3D.
LU Decomposition
For a general n×n matrix: A = P · L · U where P is a permutation matrix (partial pivoting), L is lower triangular with unit diagonal, U is upper triangular.
Sparse LU via UMFPACK or SuperLU uses fill-reducing orderings. The nested dissection ordering gives O(N^1.5) fill-in for 2D grids and O(N^2) for 3D — much better than natural ordering but still worse than iterative solvers for very large 3D problems.
Cholesky Decomposition (SPD)
For symmetric positive definite matrices: A = LLᵀ where L is lower triangular with positive diagonal. No pivoting needed (stability is guaranteed for SPD). Roughly 2× faster than LU — only the lower triangle is stored and processed.
Applications: FEM stiffness matrices, covariance matrices in probabilistic methods, SPD preconditioner factors (incomplete Cholesky IC(0)).
QR Decomposition
Factorises A = QR where Q is orthogonal (Qᵀ = Q⁻¹) and R is upper triangular. Use cases:
- Least squares: min‖Ax − b‖₂ → Rx = Qᵀb (n columns, m rows, m > n)
- Eigenvalues: QR iteration for dense matrices
- Gram-Schmidt orthogonalisation: basis for Krylov subspace methods
Computed via Householder reflectors or Givens rotations. Householder QR is numerically superior to classical Gram-Schmidt — the orthogonality error is O(ε·κ(A)) vs O(ε·κ(A)²) for CGS.
4. Iterative Solvers: CG, GMRES, BiCGSTAB
Iterative methods start from an initial guess x₀ and improve it one "step" at a time. Each step costs O(nnz) for a sparse matrix-vector multiply. They never fully factorise A — making them applicable to problems where direct methods are infeasible.
Conjugate Gradient (CG) — SPD Only
The CG method (Hestenes & Stiefel, 1952) minimises the quadratic form f(x) = ½ xᵀAx − bᵀx over a sequence of A-conjugate search directions. For an SPD matrix, it converges in at most N steps in exact arithmetic (N = matrix size), but in practice converges to machine precision far sooner.
GMRES — General Matrices
The Generalised Minimal Residual method (Saad & Schultz, 1986) handles non-symmetric and indefinite systems. It builds an orthonormal Krylov basis {v₁, v₂, ..., vₖ} via Arnoldi iteration and minimises ‖b − Axₖ‖₂ over the Krylov subspace Kₖ = span{v₁, ..., vₖ}.
Restarted GMRES(k) restarts the Krylov basis every k steps to control memory. The tradeoff: smaller k uses less memory but may stagnate on difficult systems. Flexible GMRES (FGMRES) allows the preconditioner to change between iterations.
BiCGSTAB
Biconjugate Gradient Stabilised (van der Vorst, 1992) is a variant for non-symmetric systems that avoids GMRES's memory growth. It uses two sequences of residuals to achieve smoother convergence than BiCG (which can show erratic behaviour). Memory: O(N) — only a fixed number of vectors regardless of iteration count. Preferred over GMRES when memory is tight.
5. Preconditioners: Jacobi, ILU, Multigrid
A preconditioner M approximates A and transforms the system to M⁻¹Ax = M⁻¹b. Convergence of CG/GMRES depends on the condition number κ(M⁻¹A) — a good preconditioner reduces κ at low cost.
Jacobi (Diagonal) Preconditioner
Incomplete LU: ILU(0) and ILU(k)
Incomplete LU factorises A ≈ L̃Ũ where L̃ and Ũ share the sparsity pattern of A (ILU(0)) or allow k levels of additional fill-in (ILU(k)). The approximate factors are cheap to apply (sparse triangular solves) and substantially reduce κ.
ILU(k) allows fill-in up to distance k from existing nonzeros. ILU(1) or ILU(2) often reduces iteration count by 5–10× vs. Jacobi at the cost of 3–5× more memory per factor.
Multigrid: Optimal O(N) Convergence
Multigrid preconditioners (or standalone solvers) achieve O(N) total work by solving the problem on a hierarchy of coarser grids and using coarse-grid solutions to correct fine-grid residuals.
- Smoothing: a few Gauss-Seidel or Jacobi iterations damp high-frequency error components
- Restriction: project residual to coarser grid (full weighting or injection)
- Coarse solve: exact solve at coarsest level (trivially small)
- Prolongation: interpolate correction back to fine grid
- V-cycle: one down-pass + one up-pass; W-cycle: two recursive down-passes
Geometric multigrid requires a structured mesh hierarchy. Algebraic multigrid (AMG) constructs the hierarchy automatically from the matrix — HYPRE's BoomerAMG and ML/MueLu (Trilinos) are standard libraries in scientific computing. For Poisson equations on uniform grids, multigrid converges in O(1) iterations regardless of grid size.
| Preconditioner | Setup cost | Apply cost | Typical κ reduction | Best for |
|---|---|---|---|---|
| None | O(1) | O(1) | 1× | Well-conditioned A |
| Jacobi | O(N) | O(N) | 2–5× | Diagonally dominant |
| SSOR | O(nnz) | O(nnz) | 5–20× | General SPD |
| IC(0) | O(nnz) | O(nnz) | 10–50× | SPD sparse |
| ILU(0) | O(nnz) | O(nnz) | 10–50× | Non-symmetric |
| AMG | O(N log N) | O(N) | O(N)→O(1)× | Elliptic PDE |
6. Eigenvalue Algorithms: Power, QR, Lanczos
Eigenvalues appear wherever a simulation has natural frequencies, stability analysis, or modal decomposition. Finding all eigenvalues of an N×N matrix costs O(N³) — only feasible for N ≲ 10³. Large sparse eigenproblems require methods that compute a few eigenvalues cheaply.
Power Iteration
Finds the largest-magnitude eigenvalue λ_max and its eigenvector:
Inverse iteration (replace A by (A − σI)⁻¹) finds the eigenvalue nearest σ by using a linear solve per step — ideal when a direct solver is cheap and the target eigenvalue is known approximately. Shift-invert with CG/GMRES replace the dense solve with sparse iteration.
Dense QR Algorithm
For dense matrices (N ≲ 10³), all eigenvalues can be found via the QR iteration:
Lanczos Algorithm — Large Sparse
The Lanczos algorithm (1950) builds an orthonormal Krylov basis and reduces A to a symmetric tridiagonal matrix T of size k×k (k ≪ N). The eigenvalues of T (Ritz values) approximate the extreme eigenvalues of A. Only k SpMVs are needed — ideal for finding the 10–100 largest or smallest eigenvalues of sparse A.
In practice, loss of orthogonality accumulates — requiring either full re-orthogonalisation (ARPACK's IRLM approach) or selective orthogonalisation (Partial Lanczos). ARPACK (Arnoldi Package) is the standard library: its Python binding is SciPy's `scipy.sparse.linalg.eigsh` and `eigs`.
7. SVD, PCA & Mode Decomposition
The Singular Value Decomposition of an m×n matrix A:
The Eckart-Young theorem states that the best rank-k approximation to A (in any orthogonally invariant norm) is:
Proper Orthogonal Decomposition (POD)
In fluid dynamics and structural dynamics, POD (the SVD of a snapshot matrix) extracts the dominant spatial modes of a simulation. If U = [u(t₁) | u(t₂) | ... | u(tₙ)] contains N velocity snapshots, then:
- Columns of U (left singular vectors) are the POD spatial modes, ordered by energy content σᵢ²
- k modes capture fraction (σ₁² + ... + σₖ²) / ‖U‖_F² of total variance
- Projecting the PDE onto r POD modes gives a reduced-order model (ROM) of size r ≪ N
Dynamic Mode Decomposition (DMD)
DMD (Schmid, 2010) finds a best-fit linear operator that maps snapshot Uₖ to Uₖ₊₁:
DMD extracts spatiotemporal coherent structures from simulation data even without access to the governing equations — making it a data-driven reduced modeling tool applicable to turbulence, financial time series, and climate data.
8. JavaScript Sparse Conjugate Gradient
Below is a complete, self-contained implementation of a CSR sparse matrix and Preconditioned Conjugate Gradient solver in JavaScript. This is exactly the precision Poisson solver used in grid-based fluid simulations:
// ============================================================
// Sparse CSR matrix + Preconditioned Conjugate Gradient (PCG)
// Solves A·x = b where A is SPD and sparse (CSR format)
// ============================================================
class SparseCSR {
constructor(n) {
this.n = n;
this.rowptr = new Int32Array(n + 1);
this.cols = []; // will be converted to Int32Array
this.vals = []; // Float64Array after build
}
// Assemble from COO triplets [{row, col, val}]
static fromCOO(n, coo) {
// Sort by row then col
coo.sort((a, b) => a.row !== b.row ? a.row - b.row : a.col - b.col);
const mat = new SparseCSR(n);
const cols = [], vals = [];
const rowptr = new Int32Array(n + 1);
for (const {row, col, val} of coo) {
rowptr[row + 1]++;
cols.push(col);
vals.push(val);
}
for (let i = 1; i <= n; i++) rowptr[i] += rowptr[i - 1];
mat.rowptr = rowptr;
mat.cols = new Int32Array(cols);
mat.vals = new Float64Array(vals);
return mat;
}
// y = A·x (sparse matrix-vector product)
matvec(x, y) {
const {n, rowptr, cols, vals} = this;
for (let i = 0; i < n; i++) {
let s = 0;
for (let j = rowptr[i]; j < rowptr[i + 1]; j++) s += vals[j] * x[cols[j]];
y[i] = s;
}
}
// Diagonal (Jacobi) preconditioner
diagInv() {
const d = new Float64Array(this.n);
const {n, rowptr, cols, vals} = this;
for (let i = 0; i < n; i++) {
for (let j = rowptr[i]; j < rowptr[i + 1]; j++) {
if (cols[j] === i) { d[i] = 1 / vals[j]; break; }
}
}
return d;
}
}
// Solves A·x = b using Preconditioned CG
// Returns {x, iter, residual}
function pcg(A, b, opts = {}) {
const {tol = 1e-6, maxIter = 1000} = opts;
const n = A.n;
const diag = A.diagInv(); // Jacobi preconditioner M⁻¹
const x = new Float64Array(n);
const r = new Float64Array(n);
const z = new Float64Array(n);
const p = new Float64Array(n);
const Ap = new Float64Array(n);
// r₀ = b − A·x₀ (x₀ = 0 → r₀ = b)
for (let i = 0; i < n; i++) r[i] = b[i];
// z₀ = M⁻¹·r₀ (Jacobi: componentwise multiply)
let rz = 0;
for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; p[i] = z[i]; rz += r[i] * z[i]; }
let iter = 0;
while (iter < maxIter) {
A.matvec(p, Ap);
let pAp = 0;
for (let i = 0; i < n; i++) pAp += p[i] * Ap[i];
const alpha = rz / pAp;
let rnorm2 = 0;
for (let i = 0; i < n; i++) {
x[i] += alpha * p[i];
r[i] -= alpha * Ap[i];
rnorm2 += r[i] * r[i];
}
if (Math.sqrt(rnorm2) < tol) break;
let rzNew = 0;
for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; rzNew += r[i] * z[i]; }
const beta = rzNew / rz;
rz = rzNew;
for (let i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
iter++;
}
return {x, iter, residual: Math.sqrt(rz)};
}
// ---- Example: 1D Poisson on uniform grid, 5 interior points ----
// −u'' = f, u(0)=u(1)=0, f=1 → u_i = xi(1−xi)/2
const N = 5;
const h = 1 / (N + 1);
const h2 = h * h;
// Build tridiagonal Laplacian matrix in COO
const coo = [];
for (let i = 0; i < N; i++) {
coo.push({row: i, col: i, val: 2 / h2});
if (i > 0) coo.push({row: i, col: i-1, val: -1 / h2});
if (i < N - 1) coo.push({row: i, col: i+1, val: -1 / h2});
}
const A = SparseCSR.fromCOO(N, coo);
const b = new Float64Array(N).fill(1); // f=1
const {x, iter} = pcg(A, b, {tol: 1e-10});
console.log("PCG converged in", iter, "iterations");
x.forEach((u, i) => {
const xi = (i + 1) * h;
const ref = xi * (1 - xi) / 2;
console.log(`u[${i}] = ${u.toFixed(8)} (ref: ${ref.toFixed(8)})`);
});