Physics & Simulation
April 2026 · 18 min read · Numerical Methods · Computational Physics · JavaScript · Last updated: 22 June 2026

Linear Algebra for Simulations: Sparse Solvers, Decompositions & Eigenvalues

Written by MySimulator Team · Reviewed by MySimulator Editorial Review

Every physics simulation eventually reduces to a linear system Ax = b (a matrix A and a known vector b, solved for the unknown vector x). Whether it is the pressure Poisson equation (the equation that enforces incompressibility) in fluid dynamics, the global stiffness matrix (which relates forces to displacements) in finite-element analysis, or the Laplacian (a matrix approximation of the second derivative) in heat diffusion, the choice of linear solver determines whether a simulation runs in milliseconds or hours. This article surveys the algorithms — from direct LU decomposition (factoring a matrix into lower- and upper-triangular parts) to iterative Krylov methods (approximation methods that avoid ever forming the full matrix) — that power modern computational physics.

TL;DR: Every physics simulation eventually reduces to solving a linear system Ax = b, and the solver chosen determines whether it runs in milliseconds or hours. Sparse matrices from finite-element meshes favor iterative Krylov methods like conjugate gradient, while small dense systems favor direct LU or Cholesky decomposition — the choice is the single biggest performance lever.

1. Why Linear Systems Dominate Simulation

Physical laws — conservation of mass, momentum, and energy — become linear systems when discretised in space and time. Three archetypal examples:

FEM: Global Stiffness Matrix

The finite-element method for structural mechanics produces:

K · u = f K ∈ ℝ^(N×N) : global stiffness matrix (sparse, symmetric positive definite) u ∈ ℝ^N : displacement vector (unknown) f ∈ ℝ^N : external force vector

N is the number of degrees of freedom (3 per node in 3D). For a mesh with 100,000 nodes, N ≈ 300,000. K has bandwidth determined by mesh connectivity — roughly 20–30 non-zeros per row — yielding about 6 million nonzero entries out of 90 billion total: a sparsity of 99.993%.

Fluid: Pressure Poisson Equation

In incompressible Navier–Stokes simulations (e.g. FLIP fluid, SPH), enforcing ∇·u = 0 requires solving:

∇²p = (ρ/Δt) · ∇·u* (Poisson equation for pressure) // Discretised on a grid with 5-point stencil (2D) or 7-point (3D) A · p = b where A is the discrete Laplacian matrix

The discrete Laplacian on a 256³ grid has N ≈ 16.7 million unknowns. Direct factorisation would require O(N³) ≈ 4.6 × 10²¹ operations — utterly infeasible. This is why iterative solvers exist.

Common Properties

2. Sparse Matrix Formats: CSR, CSC, COO

Sparse matrix formats store only nonzero values and their positions, trading memory for the overhead of index arrays.

COO — Coordinate Format

Three arrays: row indices, column indices, values. Unordered, allows duplicate (row, col) entries (summed on conversion). Best for matrix assembly.

Matrix: [ 5 0 0 ] COO format: [ 0 8 2 ] row = [0, 1, 1, 2, 2] [ 0 3 7 ] col = [0, 1, 2, 1, 2] val = [5, 8, 2, 3, 7]

CSR — Compressed Sparse Row

The most common format for arithmetic. Replace row array with row pointer (length N+1 where rowptr[i+1] − rowptr[i] is the number of nonzeros in row i):

rowptr = [0, 1, 3, 5] // row 0 has 1 nz, row 1 has 2, row 2 has 2 col = [0, 1, 2, 1, 2] val = [5, 8, 2, 3, 7] // Matrix-vector multiply (CSR): for i in 0..N: y[i] = 0 for j in rowptr[i]..rowptr[i+1]: y[i] += val[j] * x[col[j]]

CSC — Compressed Sparse Column

Row and column roles swapped. Preferred by LAPACK convention and for column-oriented operations (e.g. factorisation pivoting). MATLAB's native format is CSC.

Format Comparison

Format Memory Matvec Assembly Best for
COO 3×nnz Slow (cache miss) Excellent Building from mesh
CSR 2×nnz + N+1 Fast (row cache-local) Poor CG, GMRES matvec
CSC 2×nnz + N+1 Fast (col access) Poor LU factorisation
BSR 2×nnz/bs + N/bs+1 Fastest (SIMD blocks) Moderate FEM block DOFs

3. Direct Solvers: LU, Cholesky, QR

Direct methods factorise A once and then solve via triangular back-substitution in O(N) operations — but the factorisation itself can be expensive and fill-in (creation of nonzeros where A was zero) limits scalability to N ≲ 10⁵ in 3D.

LU Decomposition

For a general n×n matrix: A = P · L · U where P is a permutation matrix (partial pivoting), L is lower triangular with unit diagonal, U is upper triangular.

// Solve Ax = b: PA = LU → LUx = Pb 1. Forward sub: Ly = Pb O(n²) 2. Back sub: Ux = y O(n²) Factorisation cost: O(n³/3) flops (dense) Sparse fill-in: depends on elimination ordering (AMD, nested dissection)

Sparse LU via UMFPACK or SuperLU uses fill-reducing orderings. The nested dissection ordering gives O(N^1.5) fill-in for 2D grids and O(N^2) for 3D — much better than natural ordering but still worse than iterative solvers for very large 3D problems.

Cholesky Decomposition (SPD)

For symmetric positive definite matrices: A = LLᵀ where L is lower triangular with positive diagonal. No pivoting needed (stability is guaranteed for SPD). Roughly 2× faster than LU — only the lower triangle is stored and processed.

L[i][i] = sqrt(A[i][i] - sum(L[i][k]^2 for k < i)) L[j][i] = (A[j][i] - sum(L[j][k]*L[i][k] for k < i)) / L[i][i] (j > i)

Applications: FEM stiffness matrices, covariance matrices in probabilistic methods, SPD preconditioner factors (incomplete Cholesky IC(0)).

QR Decomposition

Factorises A = QR where Q is orthogonal (Qᵀ = Q⁻¹) and R is upper triangular. Use cases:

Computed via Householder reflectors or Givens rotations. Householder QR is numerically superior to classical Gram-Schmidt — the orthogonality error is O(ε·κ(A)) vs O(ε·κ(A)²) for CGS.

4. Iterative Solvers: CG, GMRES, BiCGSTAB

Iterative methods start from an initial guess x₀ and improve it one "step" at a time. Each step costs O(nnz) for a sparse matrix-vector multiply. They never fully factorise A — making them applicable to problems where direct methods are infeasible.

Conjugate Gradient (CG) — SPD Only

The CG method (Hestenes & Stiefel, 1952) minimises the quadratic form f(x) = ½ xᵀAx − bᵀx over a sequence of A-conjugate search directions. For an SPD matrix, it converges in at most N steps in exact arithmetic (N = matrix size), but in practice converges to machine precision far sooner.

Algorithm CG: r₀ = b − A·x₀; p₀ = r₀; k = 0 while ‖rₖ‖ > tol: αₖ = (rₖᵀrₖ) / (pₖᵀApₖ) // step length x_{k+1} = xₖ + αₖ·pₖ r_{k+1} = rₖ − αₖ·A·pₖ βₖ = (r_{k+1}ᵀr_{k+1}) / (rₖᵀrₖ) // update direction p_{k+1} = r_{k+1} + βₖ·pₖ k = k + 1 Cost per iteration: 1 SpMV (A·pₖ) + 4 dot products + 3 axpy operations Convergence: ‖eₖ‖_A ≤ 2·((√κ−1)/(√κ+1))^k · ‖e₀‖_A where κ = λmax/λmin

GMRES — General Matrices

The Generalised Minimal Residual method (Saad & Schultz, 1986) handles non-symmetric and indefinite systems. It builds an orthonormal Krylov basis {v₁, v₂, ..., vₖ} via Arnoldi iteration and minimises ‖b − Axₖ‖₂ over the Krylov subspace Kₖ = span{v₁, ..., vₖ}.

Krylov subspace: K_k(A, r₀) = span{r₀, Ar₀, A²r₀, ..., A^(k-1)r₀} Minimise: ‖b − A·xₖ‖₂ over x ∈ x₀ + K_k cost: k SpMVs + O(k²) Gram-Schmidt per cycle memory: O(k·N) — limits restart parameter k (typically k = 30–200)

Restarted GMRES(k) restarts the Krylov basis every k steps to control memory. The tradeoff: smaller k uses less memory but may stagnate on difficult systems. Flexible GMRES (FGMRES) allows the preconditioner to change between iterations.

BiCGSTAB

Biconjugate Gradient Stabilised (van der Vorst, 1992) is a variant for non-symmetric systems that avoids GMRES's memory growth. It uses two sequences of residuals to achieve smoother convergence than BiCG (which can show erratic behaviour). Memory: O(N) — only a fixed number of vectors regardless of iteration count. Preferred over GMRES when memory is tight.

5. Preconditioners: Jacobi, ILU, Multigrid

A preconditioner M approximates A and transforms the system to M⁻¹Ax = M⁻¹b. Convergence of CG/GMRES depends on the condition number κ(M⁻¹A) — a good preconditioner reduces κ at low cost.

Jacobi (Diagonal) Preconditioner

M = diag(A) → M⁻¹x = x / diag(A) Cost: O(N) — just divide each component by the diagonal entry Effect: normalises rows; removes easily-corrected diagonal dominance Works well for: diagonally dominant systems, first pass on poorly scaled A

Incomplete LU: ILU(0) and ILU(k)

Incomplete LU factorises A ≈ L̃Ũ where L̃ and Ũ share the sparsity pattern of A (ILU(0)) or allow k levels of additional fill-in (ILU(k)). The approximate factors are cheap to apply (sparse triangular solves) and substantially reduce κ.

ILU(0): compute LU but drop any fill-in at positions where A_{ij} = 0 → L̃ and Ũ have the same sparsity pattern as the lower/upper triangles of A Apply M⁻¹r: forward solve with L̃, then back solve with Ũ

ILU(k) allows fill-in up to distance k from existing nonzeros. ILU(1) or ILU(2) often reduces iteration count by 5–10× vs. Jacobi at the cost of 3–5× more memory per factor.

Multigrid: Optimal O(N) Convergence

Multigrid preconditioners (or standalone solvers) achieve O(N) total work by solving the problem on a hierarchy of coarser grids and using coarse-grid solutions to correct fine-grid residuals.

Geometric multigrid requires a structured mesh hierarchy. Algebraic multigrid (AMG) constructs the hierarchy automatically from the matrix — HYPRE's BoomerAMG and ML/MueLu (Trilinos) are standard libraries in scientific computing. For Poisson equations on uniform grids, multigrid converges in O(1) iterations regardless of grid size.

Preconditioner Setup cost Apply cost Typical κ reduction Best for
None O(1) O(1) Well-conditioned A
Jacobi O(N) O(N) 2–5× Diagonally dominant
SSOR O(nnz) O(nnz) 5–20× General SPD
IC(0) O(nnz) O(nnz) 10–50× SPD sparse
ILU(0) O(nnz) O(nnz) 10–50× Non-symmetric
AMG O(N log N) O(N) O(N)→O(1)× Elliptic PDE

6. Eigenvalue Algorithms: Power, QR, Lanczos

Eigenvalues appear wherever a simulation has natural frequencies, stability analysis, or modal decomposition. Finding all eigenvalues of an N×N matrix costs O(N³) — only feasible for N ≲ 10³. Large sparse eigenproblems require methods that compute a few eigenvalues cheaply.

Power Iteration

Finds the largest-magnitude eigenvalue λ_max and its eigenvector:

// Power iteration: v₀ = random unit vector repeat: w = A · vₖ vₖ₊₁ = w / ‖w‖ λ_k = vₖᵀ · A · vₖ (Rayleigh quotient) until convergence Rate: ‖error‖ decreases by |λ₂/λ₁| per step

Inverse iteration (replace A by (A − σI)⁻¹) finds the eigenvalue nearest σ by using a linear solve per step — ideal when a direct solver is cheap and the target eigenvalue is known approximately. Shift-invert with CG/GMRES replace the dense solve with sparse iteration.

Dense QR Algorithm

For dense matrices (N ≲ 10³), all eigenvalues can be found via the QR iteration:

A₀ = A Aₖ₊₁ = Rₖ · Qₖ where Aₖ = Qₖ · Rₖ (QR decomposition) Aₖ converges to (quasi-)upper triangular — eigenvalues on diagonal Practical: reduce to Hessenberg form first (O(N³)/3 flops, then O(N²) per step)

Lanczos Algorithm — Large Sparse

The Lanczos algorithm (1950) builds an orthonormal Krylov basis and reduces A to a symmetric tridiagonal matrix T of size k×k (k ≪ N). The eigenvalues of T (Ritz values) approximate the extreme eigenvalues of A. Only k SpMVs are needed — ideal for finding the 10–100 largest or smallest eigenvalues of sparse A.

Lanczos iteration (symmetric A): v₁ = random unit vector for j = 1, 2, ..., k: w = A · vⱼ αⱼ = vⱼᵀ · w w = w − αⱼ·vⱼ − βⱼ₋₁·vⱼ₋₁ βⱼ = ‖w‖; vⱼ₊₁ = w / βⱼ T = tridiag(α₁,...,αₖ; β₁,...,βₖ₋₁) eigvals(T) → approximate extreme eigvals of A

In practice, loss of orthogonality accumulates — requiring either full re-orthogonalisation (ARPACK's IRLM approach) or selective orthogonalisation (Partial Lanczos). ARPACK (Arnoldi Package) is the standard library: its Python binding is SciPy's `scipy.sparse.linalg.eigsh` and `eigs`.

7. SVD, PCA & Mode Decomposition

The Singular Value Decomposition of an m×n matrix A:

A = U · Σ · Vᵀ U ∈ ℝ^(m×m): left singular vectors (orthogonal) Σ ∈ ℝ^(m×n): diagonal, entries σ₁ ≥ σ₂ ≥ ... ≥ σₚ ≥ 0 V ∈ ℝ^(n×n): right singular vectors (orthogonal) p = min(m, n)

The Eckart-Young theorem states that the best rank-k approximation to A (in any orthogonally invariant norm) is:

Aₖ = Σᵢ₌₁ᵏ σᵢ · uᵢ · vᵢᵀ ‖A − Aₖ‖₂ = σₖ₊₁ (spectral norm error = next singular value)

Proper Orthogonal Decomposition (POD)

In fluid dynamics and structural dynamics, POD (the SVD of a snapshot matrix) extracts the dominant spatial modes of a simulation. If U = [u(t₁) | u(t₂) | ... | u(tₙ)] contains N velocity snapshots, then:

Dynamic Mode Decomposition (DMD)

DMD (Schmid, 2010) finds a best-fit linear operator that maps snapshot Uₖ to Uₖ₊₁:

Ũ₂ ≈ A · Ũ₁ → A = Ũ₂ · Ũ₁⁺ (Moore-Penrose pseudoinverse via SVD) // In practice: project A onto r POD modes (cheap), compute eigvals of // r×r projected operator, then recover approximate eigenfunctions of A

DMD extracts spatiotemporal coherent structures from simulation data even without access to the governing equations — making it a data-driven reduced modeling tool applicable to turbulence, financial time series, and climate data.

8. JavaScript Sparse Conjugate Gradient

Below is a complete, self-contained implementation of a CSR sparse matrix and Preconditioned Conjugate Gradient solver in JavaScript. This is exactly the precision Poisson solver used in grid-based fluid simulations:

// ============================================================
// Sparse CSR matrix + Preconditioned Conjugate Gradient (PCG)
// Solves A·x = b  where A is SPD and sparse (CSR format)
// ============================================================

class SparseCSR {
  constructor(n) {
    this.n = n;
    this.rowptr = new Int32Array(n + 1);
    this.cols   = [];    // will be converted to Int32Array
    this.vals   = [];    // Float64Array after build
  }

  // Assemble from COO triplets [{row, col, val}]
  static fromCOO(n, coo) {
    // Sort by row then col
    coo.sort((a, b) => a.row !== b.row ? a.row - b.row : a.col - b.col);
    const mat   = new SparseCSR(n);
    const cols  = [], vals = [];
    const rowptr = new Int32Array(n + 1);

    for (const {row, col, val} of coo) {
      rowptr[row + 1]++;
      cols.push(col);
      vals.push(val);
    }
    for (let i = 1; i <= n; i++) rowptr[i] += rowptr[i - 1];
    mat.rowptr = rowptr;
    mat.cols   = new Int32Array(cols);
    mat.vals   = new Float64Array(vals);
    return mat;
  }

  // y = A·x (sparse matrix-vector product)
  matvec(x, y) {
    const {n, rowptr, cols, vals} = this;
    for (let i = 0; i < n; i++) {
      let s = 0;
      for (let j = rowptr[i]; j < rowptr[i + 1]; j++) s += vals[j] * x[cols[j]];
      y[i] = s;
    }
  }

  // Diagonal (Jacobi) preconditioner
  diagInv() {
    const d = new Float64Array(this.n);
    const {n, rowptr, cols, vals} = this;
    for (let i = 0; i < n; i++) {
      for (let j = rowptr[i]; j < rowptr[i + 1]; j++) {
        if (cols[j] === i) { d[i] = 1 / vals[j]; break; }
      }
    }
    return d;
  }
}

// Solves A·x = b using Preconditioned CG
// Returns {x, iter, residual}
function pcg(A, b, opts = {}) {
  const {tol = 1e-6, maxIter = 1000} = opts;
  const n = A.n;
  const diag  = A.diagInv();          // Jacobi preconditioner M⁻¹
  const x     = new Float64Array(n);
  const r     = new Float64Array(n);
  const z     = new Float64Array(n);
  const p     = new Float64Array(n);
  const Ap    = new Float64Array(n);

  // r₀ = b − A·x₀  (x₀ = 0 → r₀ = b)
  for (let i = 0; i < n; i++) r[i] = b[i];

  // z₀ = M⁻¹·r₀  (Jacobi: componentwise multiply)
  let rz = 0;
  for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; p[i] = z[i]; rz += r[i] * z[i]; }

  let iter = 0;
  while (iter < maxIter) {
    A.matvec(p, Ap);
    let pAp = 0;
    for (let i = 0; i < n; i++) pAp += p[i] * Ap[i];

    const alpha = rz / pAp;
    let rnorm2 = 0;
    for (let i = 0; i < n; i++) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
      rnorm2 += r[i] * r[i];
    }

    if (Math.sqrt(rnorm2) < tol) break;

    let rzNew = 0;
    for (let i = 0; i < n; i++) { z[i] = r[i] * diag[i]; rzNew += r[i] * z[i]; }
    const beta = rzNew / rz;
    rz = rzNew;
    for (let i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
    iter++;
  }
  return {x, iter, residual: Math.sqrt(rz)};
}

// ---- Example: 1D Poisson on uniform grid, 5 interior points ----
// −u'' = f,  u(0)=u(1)=0,  f=1   → u_i = xi(1−xi)/2
const N  = 5;
const h  = 1 / (N + 1);
const h2 = h * h;

// Build tridiagonal Laplacian matrix in COO
const coo = [];
for (let i = 0; i < N; i++) {
  coo.push({row: i, col: i, val:  2 / h2});
  if (i > 0)     coo.push({row: i, col: i-1, val: -1 / h2});
  if (i < N - 1) coo.push({row: i, col: i+1, val: -1 / h2});
}
const A = SparseCSR.fromCOO(N, coo);
const b = new Float64Array(N).fill(1);         // f=1

const {x, iter} = pcg(A, b, {tol: 1e-10});

console.log("PCG converged in", iter, "iterations");
x.forEach((u, i) => {
  const xi  = (i + 1) * h;
  const ref = xi * (1 - xi) / 2;
  console.log(`u[${i}] = ${u.toFixed(8)}  (ref: ${ref.toFixed(8)})`);
});
    
Solver selection guide: For N < 5,000 and dense matrices → Cholesky or LU. For N < 100,000 and sparse SPD matrices → sparse Cholesky (CHOLMOD). For N > 100,000 and SPD elliptic PDEs → PCG + AMG preconditioner. For non-symmetric N > 10,000 → GMRES(30) or BiCGSTAB + ILU(1). For eigenvalues (a few, sparse) → Lanczos / ARPACK. In practice: call a library (LAPACK, ARPACK, HYPRE) rather than implementing from scratch — but understanding the algorithm is essential for choosing the right one.

Sources