Most scientists encounter linear algebra as a compulsory course before they need it. The result is that many learn the mechanics (row reduction, matrix multiplication) without the geometry. This post tries to do the opposite: start from geometric meaning and derive the algebra as the natural language for describing it.
1. Matrices as Linear Maps
A matrix A of size m×n defines a linear function f: ℝⁿ → ℝᵐ by f(x) = Ax. Linear means two properties hold for all vectors x, y and every scalar α:
- f(x + y) = f(x) + f(y)
- f(αx) = α f(x)
Geometrically, a linear map sends straight lines to straight lines (or collapses a line to a point when its direction lies in the null space of A), and sends the origin to itself. The columns of A tell you exactly where the standard basis vectors go: column j of A is f(eⱼ).
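The two defining properties and the "columns are images of basis vectors" statement can be checked numerically. The following is a minimal sketch in plain Python; the matrix, the test vectors and the helper name matvec are illustrative choices, not part of the original post.

```python
def matvec(A, x):
    """Apply matrix A (stored as a list of rows) to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Illustrative 2x2 matrix; any real entries would do.
A = [[2.0, 1.0],
     [0.0, 3.0]]

# Standard basis vectors and the columns of A.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
col1 = [row[0] for row in A]
col2 = [row[1] for row in A]

# Column j of A is the image of the j-th basis vector.
images_match_columns = matvec(A, e1) == col1 and matvec(A, e2) == col2

# Linearity: additivity and homogeneity on sample vectors.
# (The values are chosen so the floating-point arithmetic is exact.)
x, y, alpha = [1.0, 2.0], [-3.0, 0.5], 4.0
additive = matvec(A, [xi + yi for xi, yi in zip(x, y)]) == [
    p + q for p, q in zip(matvec(A, x), matvec(A, y))
]
homogeneous = matvec(A, [alpha * xi for xi in x]) == [
    alpha * p for p in matvec(A, x)
]

print(images_match_columns, additive, homogeneous)
```

With less convenient numbers the exact == comparisons should be replaced by a tolerance check such as math.isclose.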
Composition and Basis Change
Composition: (A ∘ B) x = A(Bx) → matrix product AB
Change of basis from B to C:
[v]_C = M_{C←B} [v]_B where M_{C←B} = C⁻¹ B
(B, C = matrices whose columns are the basis vectors in standard coords)
Similarity transform: A in the old basis → A' = P⁻¹ A P in the new basis
(P = change-of-basis matrix, columns = new basis vectors in old coords)
For orthonormal bases: P⁻¹ = Pᵀ (rotation/reflection matrices)
This is why the choice of coordinate system matters so much in physics: expressing a symmetric tensor in its principal axes (the basis of eigenvectors) makes its action diagonal and interpretable. The inertia tensor of a rigid body becomes diag(I₁, I₂, I₃) along the principal axes; the stress tensor at a point becomes three principal stresses without shear.
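As a small worked instance of the similarity transform, the sketch below rotates an illustrative symmetric 2×2 matrix into its principal axes. The matrix A, the basis P and the helper names are assumptions chosen so the result is easy to verify by hand; they do not come from the original post.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

# Illustrative symmetric matrix (think of a 2-D stress or inertia tensor).
A = [[2.0, 1.0],
     [1.0, 2.0]]

# Columns of P are the new basis vectors written in the old coordinates.
# Here they are the unit eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2) of A.
s = 1.0 / math.sqrt(2.0)
P = [[s,  s],
     [s, -s]]

# P is orthogonal, so its inverse is its transpose.
P_inv = transpose(P)
identity = matmul(P_inv, P)          # should be the identity up to rounding

# Similarity transform: A expressed in the principal-axis basis.
A_new = matmul(P_inv, matmul(A, P))  # approximately diag(3, 1)

for row in A_new:
    print([round(v, 12) for v in row])
```

The off-diagonal entries vanish because the new basis vectors are eigenvectors of A; the diagonal entries are the corresponding eigenvalues.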
2. Determinants as Signed Volume Scaling
The determinant of a square matrix A equals the signed volume of the parallelotope spanned by its column vectors. For 2×2 and 3×3 matrices it has simple closed forms:
Determinant — Geometric and Algebraic Forms
2×2, A = [a b; c d]: det(A) = ad − bc
(signed area of parallelogram spanned by columns)
3×3, A = [a b c; d e f; g h i]: det(A) = a(ei−fh) − b(di−fg) + c(dh−eg)
(cofactor expansion along row 1; equivalent to the rule of Sarrus)
Properties:
det(AB) = det(A) det(B)
det(Aᵀ) = det(A) (transpose preserves volume)
det(A⁻¹) = 1/det(A)
det(αA) = αⁿ det(A) (each of the n rows scaled by α contributes one factor of α)
det(A) = 0 ⇔ A is singular ⇔ columns are linearly dependent
In the interactive matrix transforms visualiser, the determinant determines how a shape’s area changes under the transformation: a unit square becomes a parallelogram with area |det A|. A negative determinant indicates a reflection (orientation reversal). When det = 0, the entire plane collapses onto a line or a point.
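The area interpretation can be verified without the visualiser. The sketch below maps the unit square through two illustrative matrices and measures the signed area of the image with the shoelace formula; the matrices and helper names are assumptions made for this example.

```python
def det2(A):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    (a, b), (c, d) = A
    return a * d - b * c

def apply(A, p):
    """Image of the point p = (x, y) under the matrix A."""
    (a, b), (c, d) = A
    x, y = p
    return (a * x + b * y, c * x + d * y)

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def signed_area(polygon):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0

# Unit square, vertices listed counter-clockwise, so its signed area is +1.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

A = [[2.0, 1.0],
     [0.5, 3.0]]   # illustrative stretch-and-shear
F = [[0.0, 1.0],
     [1.0, 0.0]]   # reflection across the line y = x

area_A = signed_area([apply(A, p) for p in unit_square])
area_F = signed_area([apply(F, p) for p in unit_square])

print(area_A, det2(A))                          # image area vs determinant
print(area_F, det2(F))                          # negative: orientation reversed
print(det2(matmul2(A, F)), det2(A) * det2(F))   # product rule
```

The sign of the shoelace area flips for F because the reflection reverses the order in which the vertices are traversed.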
3. Eigenvectors and Eigenvalues
An eigenvector of a square matrix A is a non-zero vector v satisfying Av = λv: the transformation only scales v (reversing it if λ < 0) and never rotates it off its own line. The scalar λ is the corresponding eigenvalue.
Characteristic Polynomial and Diagonalisation
Eigenvalue equation: Av = λv ⇔ (A − λI)v = 0
Characteristic poly: det(A − λI) = 0
For 2×2: λ² − tr(A)λ + det(A) = 0
λ₁,₂ = [tr(A) ± √(tr(A)² − 4 det(A))] / 2
Diagonalisation (if A has n independent eigenvectors):
A = P D P⁻¹
D = diag(λ₁, …, λₙ), P = [v₁ | v₂ | … | vₙ]
Powers: Aᵏ = P Dᵏ P⁻¹ (cheap: just raise each λᵢ to the power k)
Exponential: e^(At) = P e^(Dt) P⁻¹ (useful for linear ODE systems)
Eigenvalues govern the long-run behaviour of linear dynamical systems xₙ₊₁ = Axₙ: the state grows for generic initial conditions if any |λ| > 1 and contracts to zero if all |λ| < 1. For continuous systems dx/dt = Ax, asymptotic stability requires all eigenvalues to have negative real parts.
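The 2×2 trace and determinant formula and the |λ| < 1 criterion can be tried out directly. This is a minimal sketch using Python's cmath so that complex eigenvalues are handled too; the matrices and the function names eigenvalues_2x2 and step are illustrative assumptions.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial λ² − tr(A)λ + det(A) = 0."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)   # complex if the discriminant < 0
    return (tr + root) / 2.0, (tr - root) / 2.0

def step(A, x):
    """One iteration of the discrete system x_{n+1} = A x_n."""
    (a, b), (c, d) = A
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

# Illustrative contracting system: eigenvalues close to 0.6 and 0.3.
A = [[0.5, 0.2],
     [0.1, 0.4]]
lam1, lam2 = eigenvalues_2x2(A)
spectral_radius = max(abs(lam1), abs(lam2))

# Iterate from an arbitrary starting state and watch the norm shrink.
x = (1.0, 1.0)
for _ in range(50):
    x = step(A, x)
norm_after_50 = (x[0] ** 2 + x[1] ** 2) ** 0.5

# A 90-degree rotation has no real eigenvectors: its eigenvalues are ±i.
R = [[0.0, -1.0],
     [1.0,  0.0]]
rot_eigs = eigenvalues_2x2(R)

print(lam1, lam2, spectral_radius, norm_after_50)
print(rot_eigs)
```

Because the spectral radius is below 1, every starting state decays; the rotation example sits exactly on the boundary |λ| = 1, so its iterates neither grow nor shrink.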
4. The Spectral Theorem and Its Applications
The spectral theorem is the central result of linear algebra for physics:
Spectral Theorem (Real Symmetric Case)
Let A = Aᵀ (real symmetric, n×n).
Then:
1. All eigenvalues of A are real.
2. Eigenvectors for distinct eigenvalues are orthogonal.
3. A is orthogonally diagonalisable: A = Q Λ Qᵀ
Q orthogonal (QᵀQ = I), Λ = diag(λ₁, …, λₙ)
For Hermitian matrices (A = A†, complex):
Same conclusions hold over ℂ, with Q unitary: A = Q Λ Q†.
Eigenvalues real ⇒ quantum observables give real measurement values.
The spectral theorem has direct, foundational interpretations in multiple fields:
- Quantum mechanics: every observable (energy, momentum, position) is represented by a Hermitian operator. Its eigenvalues are the possible measurement outcomes; its eigenvectors are the quantum states with definite values of the observable (e.g., energy eigenstates for the Hamiltonian).
- Normal modes: the mass-weighted stiffness matrix K̃ = M^(−1/2) K M^(−1/2) is symmetric. Its eigenvectors are the normal mode shapes; its eigenvalues give ωᵢ². Diagonalisation decouples N coupled ODEs into N independent harmonic oscillators.
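- Finite element analysis: the global stiffness matrix K is symmetric and, once the structure is fully constrained, positive-definite. Together with the mass matrix M it determines the natural frequencies through the generalised eigenproblem Kφ = ω²Mφ; solving Ku = f is well-posed when K has no zero eigenvalues (no unconstrained rigid-body modes).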
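To make the normal-mode bullet concrete, the sketch below treats a hypothetical system of two equal masses joined by three identical springs (wall, mass, mass, wall). With M = mI the mass-weighted stiffness matrix is simply K/m, and a closed-form symmetric 2×2 eigen-solver is enough. The system, the values m = k = 1 and the helper name are assumptions for illustration only.

```python
import math

def symmetric_eig_2x2(S):
    """Eigenvalues (ascending) and unit eigenvectors of [[a, b], [b, d]]."""
    (a, b), (_, d) = S
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lams = [mean - radius, mean + radius]
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a <= d else [(0.0, 1.0), (1.0, 0.0)]
    else:
        vecs = []
        for lam in lams:
            vx, vy = b, lam - a              # solves (a − λ) vx + b vy = 0
            n = math.hypot(vx, vy)
            vecs.append((vx / n, vy / n))
    return lams, vecs

m, k = 1.0, 1.0                              # illustrative mass and spring constant

# Stiffness matrix of the two-mass, three-spring chain, divided by m.
K_tilde = [[2.0 * k / m, -k / m],
           [-k / m, 2.0 * k / m]]

omega_sq, modes = symmetric_eig_2x2(K_tilde)
frequencies = [math.sqrt(w) for w in omega_sq]

# Spectral theorem check: the two mode shapes are orthogonal.
dot = modes[0][0] * modes[1][0] + modes[0][1] * modes[1][1]

print(omega_sq)       # approximately [k/m, 3k/m]
print(modes, dot)
```

The lower mode has both masses moving together (components of equal sign), the higher mode has them moving against each other, which is the decoupling into independent oscillators described above.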
5. Singular Value Decomposition (SVD)
Eigenvalue decomposition requires a square matrix. SVD generalises it to any m×n matrix and is more numerically stable:
SVD and the Pseudoinverse
A = U Σ Vᵀ (any real m×n matrix)
U m×m orthogonal (left singular vectors = columns)
Σ m×n diagonal (singular values σ₁ ≥ σ₂ ≥ … ≥ 0)
V n×n orthogonal (right singular vectors = columns)
Relationship to eigenvalues:
AᵀA = V ΣᵀΣ Vᵀ, singular values σᵢ = √(eigenvalues of AᵀA)
Truncated SVD (rank-k approximation):
A ≈ Uₖ Σₖ Vₖᵀ (best rank-k approximation, Eckart-Young theorem)
Moore-Penrose pseudoinverse:
A⁺ = V Σ⁺ Uᵀ where Σ⁺ = diag(1/σ₁, …, 1/σᵣ, 0, …)
Least-squares solution:
x* = A⁺ b (minimises ‖Ax−b‖²)
SVD is the workhorse of numerical linear algebra: it solves least-squares problems (data fitting, tomographic reconstruction), computes low-rank approximations (image compression, latent semantic analysis), and provides the condition number κ(A) = σ_max/σ_min, which quantifies how sensitive Ax = b is to perturbations.
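For a 2×2 matrix the singular values and the condition number can be obtained in closed form from the eigenvalues of AᵀA. The sketch below does this for an illustrative matrix and cross-checks the result geometrically by sampling the image of the unit circle. Note the assumption: forming AᵀA explicitly is fine for a small demonstration but squares the condition number, so numerical libraries use dedicated SVD routines instead.

```python
import math

def singular_values_2x2(A):
    """Singular values of a real 2x2 matrix via the eigenvalues of AᵀA."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix AᵀA = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))

A = [[3.0, 0.0],
     [4.0, 5.0]]   # illustrative matrix with det(A) = 15

sigma_max, sigma_min = singular_values_2x2(A)
condition_number = sigma_max / sigma_min

# Geometric check: the unit circle maps to an ellipse whose longest and
# shortest radii are the singular values.
radii = []
for i in range(3600):
    t = 2.0 * math.pi * i / 3600
    x, y = math.cos(t), math.sin(t)
    radii.append(math.hypot(A[0][0] * x + A[0][1] * y,
                            A[1][0] * x + A[1][1] * y))

print(sigma_max, sigma_min, condition_number)
print(max(radii), min(radii))
```

The product σ_max · σ_min equals |det A|, which ties the singular values back to the area-scaling picture of the determinant.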
6. Principal Component Analysis (PCA)
PCA finds the directions of maximum variance in a dataset. Given n data points in ℝᵈ (rows of matrix X, mean-centred), PCA diagonalises the sample covariance matrix:
PCA via Covariance Eigendecomposition
Sample covariance: C = (1/(n−1)) Xᵀ X (d×d, symmetric PSD)
Eigendecomposition: C = Q Λ Qᵀ
λ₁ ≥ λ₂ ≥ … ≥ λ_d ≥ 0 (principal variances)
q₁, q₂, …, q_d (principal components)
Projection onto first k PCs:
X̃ = X Qₖ (n×k, low-dimensional representation)
Variance retained by k components:
R_k = (λ₁ + … + λₖ) / (λ₁ + … + λ_d)
Connection to SVD:
If X = U Σ Vᵀ then eigenvectors of C = columns of V, λᵢ = σᵢ²/(n−1)
PCA appears throughout science: it is used to identify the dominant modes of climate variability (EOF analysis), to compress gene expression profiles (bioinformatics), to reduce dimensionality before separating signal sources in EEG/MEG with independent component analysis, and to whiten input features before neural network training.
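The covariance route to PCA fits in a few lines for two-dimensional data. The sketch below uses a made-up five-point dataset lying close to the line y = x and the closed-form eigen-decomposition of a symmetric 2×2 matrix; the data and all variable names are illustrative assumptions, and real datasets would normally go through a library routine.

```python
import math

# Illustrative data: five 2-D points lying close to the line y = x.
points = [(2.0, 1.9), (0.0, 0.2), (-1.0, -1.1), (1.0, 0.8), (-2.0, -1.8)]
n = len(points)

# Mean-centre the data (rows of X).
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
X = [(x - mean_x, y - mean_y) for x, y in points]

# Sample covariance C = XᵀX / (n − 1), a symmetric 2x2 matrix.
c_xx = sum(x * x for x, _ in X) / (n - 1)
c_xy = sum(x * y for x, y in X) / (n - 1)
c_yy = sum(y * y for _, y in X) / (n - 1)

# Closed-form eigenvalues of [[c_xx, c_xy], [c_xy, c_yy]], largest first.
mean = (c_xx + c_yy) / 2.0
radius = math.hypot((c_xx - c_yy) / 2.0, c_xy)
lam1, lam2 = mean + radius, mean - radius

# Unit eigenvector for lam1 (assumes c_xy != 0, which holds for this data).
vx, vy = c_xy, lam1 - c_xx
norm = math.hypot(vx, vy)
q1 = (vx / norm, vy / norm)

# Project onto the first principal component (k = 1).
scores = [x * q1[0] + y * q1[1] for x, y in X]
retained = lam1 / (lam1 + lam2)

# The sample variance of the scores should reproduce lam1.
score_variance = sum(s * s for s in scores) / (n - 1)

print(q1, retained)
print(lam1, score_variance)
```

Here one component retains almost all of the variance, which is what "the data are essentially one-dimensional" means in PCA terms.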
Interactive Visualisations
The matrix transforms simulation lets you build geometric intuition for all the concepts above. Use the 2×2 sliders to construct rotations (det = 1), reflections (det = −1), shears (det = 1, repeated eigenvalue λ = 1 with a single eigenvector direction), scalings, and projections (det = 0). The eigenvector overlay shows the invariant directions when real eigenvectors exist; the unit circle overlay shows where the circle maps under the transformation (the semi-axis lengths of the resulting ellipse are the singular values).
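The quantities quoted for each transformation type can be reproduced offline. The sketch below builds four example matrices and reports the determinant together with what the discriminant tr² − 4·det says about real eigenvectors. The dictionary keys and the specific matrices are hypothetical and are not the visualiser's actual presets.

```python
import math

def trace_det_disc(A):
    """Trace, determinant and discriminant tr² − 4·det of a 2x2 matrix."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    return tr, det, tr * tr - 4.0 * det

def describe(A, tol=1e-12):
    """Determinant plus a note on the real eigen-structure."""
    _, det, disc = trace_det_disc(A)
    if disc < -tol:
        eig = "no real eigenvectors"
    elif disc <= tol:
        eig = "repeated real eigenvalue"
    else:
        eig = "two distinct real eigenvalues"
    return det, eig

angle = math.radians(30.0)
examples = {   # hypothetical examples, not the tool's preset list
    "rotation":   [[math.cos(angle), -math.sin(angle)],
                   [math.sin(angle),  math.cos(angle)]],
    "reflection": [[1.0, 0.0], [0.0, -1.0]],
    "shear":      [[1.0, 1.5], [0.0, 1.0]],
    "projection": [[1.0, 0.0], [0.0, 0.0]],
}

summary = {name: describe(A) for name, A in examples.items()}
for name, (det, eig) in summary.items():
    print(f"{name:10s} det = {det:+.3f}  {eig}")
```

A repeated eigenvalue does not by itself say how many eigenvector directions exist: the shear has one, whereas a uniform scaling (a multiple of the identity) leaves every direction invariant.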
Matrix Transforms Visualiser
2×2 matrix sliders, unit grid / basis vectors / eigenvectors / unit circle layers, 8 presets, continuous animation, trace / det / λ₁/λ₂ / matrix-type panel.
Linear Regression (OLS)
Click-to-add scatter plot, ordinary least-squares line, slope / intercept / R² / Pearson r / SSE, residuals, 5 data presets, undo/clear.
Why linear algebra everywhere? Because the real world is rarely linear, but it is often approximately linear locally. Linearisation (Taylor expansion around an equilibrium, Jacobian matrix of a dynamical system) reduces a smooth non-linear problem to a linear one at small amplitude. The eigenvalues of that linear approximation determine local stability and the characteristic timescales; the eigenvectors determine the normal-mode shapes.
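As one possible illustration of linearisation, the sketch below takes a damped pendulum (an example chosen here, not taken from the post), approximates the Jacobian at its two equilibria by central differences, and reads local stability off the eigenvalues. The parameter values, step size and function names are assumptions.

```python
import cmath
import math

G_OVER_L = 9.81   # assumed g/L for a 1 m pendulum, in s^-2
DAMPING = 0.5     # assumed linear damping coefficient, in s^-1

def f(state):
    """Pendulum vector field: state = (theta, omega)."""
    theta, omega = state
    return (omega, -G_OVER_L * math.sin(theta) - DAMPING * omega)

def jacobian(func, state, h=1e-6):
    """Central-difference Jacobian of a 2-D vector field at `state`."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus, minus = list(state), list(state)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(2):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J

def eigenvalues_2x2(A):
    """Eigenvalues from the trace/determinant formula (possibly complex)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0

def is_locally_stable(equilibrium):
    """True if every eigenvalue of the Jacobian has negative real part."""
    J = jacobian(f, equilibrium)
    return all(lam.real < 0.0 for lam in eigenvalues_2x2(J))

hanging = is_locally_stable((0.0, 0.0))        # pendulum pointing down
inverted = is_locally_stable((math.pi, 0.0))   # pendulum balanced upright

print(hanging, inverted)
```

For the hanging position the eigenvalues form a complex pair with negative real part (a decaying oscillation), while the inverted position has one positive real eigenvalue, so small disturbances grow.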
Further Reading
The following topics build naturally from this post:
- Least squares and regression: the normal equations AᵀAx = Aᵀb and their connection to SVD; ridge regression as Tikhonov regularisation.
- Iterative eigensolvers: the power iteration, QR algorithm, Lanczos method for large sparse matrices (finite-element, graph Laplacian).
- Tensor algebra: generalisation of matrices to higher-order arrays; contraction, trace, Einstein summation; stress and strain tensors in continuum mechanics.
- Random matrices: Wigner semicircle law, Marchenko–Pastur distribution for sample covariance matrices of random data.