June 2026 · 12 min read · Competitive Learning · BMU · U-matrix · Applications

Self-Organizing Maps: Topology-Preserving Neural Networks

Most neural networks learn from labelled examples. The self-organizing map (SOM), introduced by Teuvo Kohonen in 1982, learns with no labels at all. It takes high-dimensional data and lays it out on a low-dimensional grid — usually 2D — in such a way that similar inputs end up near each other. The result is a map you can literally look at: a flat picture in which the geometry of a complex dataset becomes visible. This article explains how a SOM learns through competition, why it preserves topology, and where it is used.

1. The Core Idea and Architecture

A SOM is a single layer of neurons arranged on a fixed grid — typically a rectangular or hexagonal lattice. Each neuron i carries a weight vector w_i that lives in the same space as the input data. If the inputs are 50-dimensional, every neuron's weight vector is also 50-dimensional, even though the neurons themselves sit on a 2D grid.
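To make the two spaces concrete, here is a minimal NumPy sketch of the data structures involved. The grid size, the variable names (`grid`, `weights`) and the random initialisation are illustrative assumptions, not part of any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 10x10 grid of neurons and 50-dimensional inputs.
rows, cols, dim = 10, 10, 50

# Fixed grid coordinates: one (row, col) pair per neuron. These never change.
grid = np.stack(
    np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
)

# Learned weights: one dim-dimensional vector per neuron, living in input space.
weights = rng.random((rows, cols, dim))

print(grid.shape)     # (10, 10, 2)
print(weights.shape)  # (10, 10, 50)
```

The neuron at grid position (3, 7) has coordinates `grid[3, 7]` in the 2D map and a weight vector `weights[3, 7]` in the 50-dimensional input space.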

The grid position never changes; only the weight vectors are learned. The genius of the method is that two things are happening at once: the weight vectors move to cover the data (like clustering), while the fixed grid imposes a topology that forces neighbouring neurons to learn similar things. The trained grid becomes a low-dimensional, topology-preserving picture of a high-dimensional dataset.

Biological inspiration: the SOM is loosely modelled on cortical maps in the brain, such as the somatosensory homunculus or tonotopic maps in the auditory cortex, where nearby neurons respond to nearby stimuli. Kohonen's algorithm reproduces this ordered mapping computationally.

2. Competitive Learning and the BMU

Training is driven by competition. For each input vector x, every neuron measures how close its weight vector is to the input, and the closest neuron wins. That winner is called the Best Matching Unit (BMU).

For input x and neuron weights w_i, the BMU is the neuron c with minimum distance:

$$ c = \arg\min_i \lVert x - w_i \rVert $$

where ‖·‖ is usually the Euclidean distance:

$$ \lVert x - w_i \rVert = \sqrt{\sum_j (x_j - w_{ij})^2} $$
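The BMU search can be sketched in a few lines of NumPy. The function name `find_bmu` and the `(rows, cols, dim)` weight layout are assumptions made for this example.

```python
import numpy as np

def find_bmu(weights, x):
    """Return the (row, col) grid index of the neuron closest to x."""
    # Euclidean distance from x to every weight vector: shape (rows, cols).
    dists = np.linalg.norm(weights - x, axis=-1)
    # argmin works on the flattened array, so convert back to a grid index.
    flat_index = np.argmin(dists)
    return tuple(int(i) for i in np.unravel_index(flat_index, dists.shape))

# Toy 3x3 map with 2-dimensional weights, all zero except one neuron.
weights = np.zeros((3, 3, 2))
weights[1, 2] = [0.9, 0.1]
x = np.array([1.0, 0.0])

bmu = find_bmu(weights, x)
print(bmu)  # (1, 2)
```

If several neurons are equally close, `np.argmin` returns the first one in row-major order, which is one simple way to break ties.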

In plain competitive learning ("winner takes all"), only the BMU would update. The SOM's crucial addition is that the BMU and its neighbours on the grid all move toward the input. This neighbourhood update is what stitches the map together so that adjacent neurons end up representing adjacent regions of the data.

3. The Neighbourhood Function

The neighbourhood function h_ci controls how strongly a neuron i is pulled toward the input when neuron c is the BMU. It depends on the grid distance between c and i, and its width shrinks over time.

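Gaussian neighbourhood:

$$ h_{ci}(t) = \exp\left( -\frac{d(c,i)^2}{2\,\sigma(t)^2} \right) $$

where d(c,i) is the distance on the grid between BMU c and neuron i, and σ(t) is the neighbourhood radius, which shrinks with time t.

Both the radius and the learning rate decay, e.g.:

$$ \sigma(t) = \sigma_0 \exp(-t/\lambda), \qquad \alpha(t) = \alpha_0 \exp(-t/\lambda) $$

Here σ₀ and α₀ are the initial values and λ is a time constant that sets how fast they decay.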

Early in training the radius σ is large, so a single input nudges a wide swath of the grid. This is the ordering phase, where the map unfolds and untangles itself to match the broad shape of the data. As σ shrinks, the updates become local. This is the convergence (fine-tuning) phase, which lets each neuron specialise. The schedule from global to local is essential: if the neighbourhood starts out small, distant parts of the grid never coordinate and the map can settle with twists or folds, and if it never shrinks, neighbouring neurons keep being dragged together and cannot specialise.
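The following sketch shows how a Gaussian neighbourhood narrows as the radius decays. It assumes a rectangular lattice with Euclidean distance between grid coordinates, and the helper names (`decay`, `gaussian_neighbourhood`) and the values of σ₀ and λ are chosen only for illustration.

```python
import numpy as np

def decay(initial, t, time_constant):
    """Exponential decay: initial * exp(-t / time_constant)."""
    return initial * np.exp(-t / time_constant)

def gaussian_neighbourhood(grid, bmu, sigma):
    """Gaussian weight for every neuron, based on its grid distance to the BMU."""
    # Squared Euclidean distance on the grid (an assumption; other lattices differ).
    d2 = np.sum((grid - np.asarray(bmu)) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rows, cols = 5, 5
grid = np.stack(
    np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
)

sigma0, lam = 3.0, 100.0  # illustrative initial radius and time constant
for t in (0, 100, 300):
    sigma = decay(sigma0, t, lam)
    h = gaussian_neighbourhood(grid, (2, 2), sigma)
    print(f"t={t:3d}  sigma={sigma:.3f}  h(BMU)={h[2, 2]:.3f}  h(corner)={h[0, 0]:.3f}")
```

The value at the BMU stays at 1 throughout, while the value at the far corner falls toward 0 as σ shrinks, which is the shift from global to local updates described above.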

4. The Training Algorithm Step by Step

The full update rule combines the BMU search, the neighbourhood function, and the decaying learning rate α(t):

Weight update for every neuron i:

$$ w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t)\, \bigl( x(t) - w_i(t) \bigr) $$

The BMU itself (h_ci = 1, because d(c,c) = 0) moves most; distant neurons (h_ci ≈ 0) barely move.
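A single application of this update rule might look like the sketch below. The function name `som_update` is hypothetical, and the Gaussian neighbourhood on a rectangular grid is an assumption carried over from the previous section.

```python
import numpy as np

def som_update(weights, grid, x, alpha, sigma):
    """Apply one SOM update for input x and return (new_weights, bmu)."""
    # Competition: find the neuron whose weight vector is closest to x.
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Cooperation: Gaussian neighbourhood around the BMU on the grid.
    d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # Adaptation: w_i <- w_i + alpha * h_ci * (x - w_i), broadcast over features.
    new_weights = weights + alpha * h[..., None] * (x - weights)
    return new_weights, bmu

rng = np.random.default_rng(0)
grid = np.stack(np.meshgrid(np.arange(4), np.arange(4), indexing="ij"), axis=-1)
weights = rng.random((4, 4, 3))
x = rng.random(3)

new_weights, bmu = som_update(weights, grid, x, alpha=0.5, sigma=1.0)
before = np.linalg.norm(weights[bmu] - x)
after = np.linalg.norm(new_weights[bmu] - x)
print(f"BMU distance to x: {before:.4f} -> {after:.4f}")
```

Because h is 1 at the BMU, its distance to x shrinks by exactly the factor (1 - α); every other neuron moves by a smaller fraction of its own distance.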

The procedure, repeated for many iterations:

Initialise each neuron's weight vector (randomly, by sampling data points, or along the leading principal components for faster convergence).
Present a randomly chosen input vector x.
Find the BMU — the neuron whose weights are closest to x.
Update the BMU and its neighbours toward x using the rule above.
Decay the learning rate α and neighbourhood radius σ.
Repeat the last four steps until the weights stabilise or a preset number of iterations is reached.
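Put together, the steps above can be sketched as a short training loop. This is a minimal illustration under stated assumptions, not a reference implementation: it runs for a fixed number of iterations rather than testing for convergence, initialises by sampling data points, uses a rectangular grid with a Gaussian neighbourhood, and picks the time constant as `n_iter / 3` purely for convenience. The function name `train_som` and all default values are hypothetical.

```python
import numpy as np

def train_som(data, rows=8, cols=8, n_iter=2000, alpha0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM on data of shape (n_samples, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    # Initialise weights by sampling data points.
    picks = rng.integers(0, n, size=rows * cols)
    weights = data[picks].reshape(rows, cols, dim).astype(float)
    sigma0 = max(rows, cols) / 2.0 if sigma0 is None else sigma0
    lam = n_iter / 3.0  # assumed time constant shared by alpha and sigma

    for t in range(n_iter):
        # Present a randomly chosen input.
        x = data[rng.integers(n)]
        # Find the BMU.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Decay the learning rate and the neighbourhood radius.
        alpha = alpha0 * np.exp(-t / lam)
        sigma = sigma0 * np.exp(-t / lam)
        # Update the BMU and its grid neighbours toward x.
        d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        weights += alpha * h[..., None] * (x - weights)
    return weights

# Synthetic data: 500 points drawn uniformly from the unit cube.
data = np.random.default_rng(1).random((500, 3))
weights = train_som(data, rows=6, cols=6, n_iter=1500)

# Mean distance from each sample to its BMU (the quantisation error).
qe = np.mean([np.min(np.linalg.norm(weights - x, axis=-1)) for x in data])
print(weights.shape, f"mean quantisation error = {qe:.3f}")
```

Each update is a convex combination of the old weight and the input, so the weights always stay inside the range spanned by the data.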

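Each step needs only one distance computation per neuron and one weight update, so a training step costs O(N·D) for N neurons and D input dimensions. That modest cost, together with how easy the algorithm is to implement, is part of why the SOM remained popular long after its introduction.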

5. Reading a Trained Map: the U-matrix

Once trained, the SOM is a layout — but how do you see clusters in it? The standard tool is the U-matrix (unified distance matrix). For each neuron it computes the average distance between that neuron's weight vector and the weight vectors of its immediate grid neighbours.

$$ U(i) = \frac{1}{|N(i)|} \sum_{j \in N(i)} \lVert w_i - w_j \rVert $$

where N(i) is the set of immediate grid neighbours of neuron i.

Low U-value → neuron is similar to its neighbours → inside a cluster
High U-value → big jump in the data → boundary between clusters
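As a rough sketch, the U-matrix for a rectangular grid can be computed as below. It assumes 4-connected neighbours (up, down, left, right); hexagonal lattices and 8-connected variants need a different neighbour set. The function name `u_matrix` is illustrative.

```python
import numpy as np

def u_matrix(weights):
    """Average distance from each neuron's weights to its 4-connected neighbours."""
    rows, cols, _ = weights.shape
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))

    # Distances between vertically adjacent neurons: shape (rows - 1, cols).
    dv = np.linalg.norm(weights[1:] - weights[:-1], axis=-1)
    total[1:] += dv
    count[1:] += 1
    total[:-1] += dv
    count[:-1] += 1

    # Distances between horizontally adjacent neurons: shape (rows, cols - 1).
    dh = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=-1)
    total[:, 1:] += dh
    count[:, 1:] += 1
    total[:, :-1] += dh
    count[:, :-1] += 1

    # Guard against a 1x1 map, where no neuron has a neighbour.
    return total / np.maximum(count, 1)

# Toy map: the left half has weight 0 and the right half has weight 1.
weights = np.zeros((4, 4, 1))
weights[:, 2:, 0] = 1.0
U = u_matrix(weights)
print(np.round(U, 2))
```

In this toy map the two outer columns have U = 0 (inside a cluster), while the two middle columns form a ridge of higher values where the weights jump from 0 to 1.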

Rendered as a heat map, the U-matrix shows clusters as low "valleys" separated by high "ridges." This turns an abstract high-dimensional dataset into a readable terrain map, which is exactly why analysts reach for SOMs when they want to see structure rather than just compute it. Other useful views include component planes (one heat map per input feature) and hit histograms showing how many data points land on each neuron.
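The two other views mentioned here are simple to compute from a trained weight array. The sketch below uses a tiny hand-made 2x2 map in place of a trained one, and the function names `hit_histogram` and `component_plane` are assumptions for this example.

```python
import numpy as np

def hit_histogram(weights, data):
    """Count how many samples have each neuron as their BMU."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    # Pairwise distances between samples and neurons: (n_samples, n_neurons).
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    bmu = np.argmin(d, axis=1)
    return np.bincount(bmu, minlength=rows * cols).reshape(rows, cols)

def component_plane(weights, feature):
    """One value per neuron: that neuron's weight for a single input feature."""
    return weights[:, :, feature]

# Hand-made 2x2 map whose weights sit on the corners of the unit square.
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
data = np.array([[0.1, 0.0], [0.0, 0.1], [0.05, 0.05], [0.9, 1.0]])

print(hit_histogram(weights, data).tolist())    # [[3, 0], [0, 1]]
print(component_plane(weights, 0).tolist())     # [[0.0, 0.0], [1.0, 1.0]]
```

Either array can be passed to a plotting library as a heat map; neurons with zero hits often mark the empty space between clusters.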

6. Applications

SOMs are valued wherever exploratory visualisation of complex data matters:

Document and text mining: Kohonen's own WEBSOM project organised millions of documents into a browsable 2D map where related topics sit together.
Bioinformatics: clustering gene-expression profiles and grouping samples by molecular similarity.
Finance and economics: the "poverty map of the world," which arranges countries by dozens of welfare indicators onto a single readable grid.
Fault detection and process monitoring: normal operating states map to one region; deviations show up as inputs landing in unexpected parts of the map.
Image and colour quantisation: compressing a palette while preserving perceptual relationships between colours.
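As one illustration of the fault-detection idea, the sketch below flags an input when its BMU was never hit by normal training data. The weights are a hand-made stand-in for a trained map, and the helper name `bmu_indices` is hypothetical; practical systems often combine this check with a threshold on the distance to the BMU.

```python
import numpy as np

def bmu_indices(weights, data):
    """Flat index of the BMU for every row of data."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

# Stand-in for a trained 1x4 map with one-dimensional weights.
weights = np.array([[[0.0], [1.0], [2.0], [3.0]]])

# Record which neurons are hit by data from normal operation.
normal = np.array([[0.1], [0.2], [0.9], [1.1]])
seen = np.zeros(weights.shape[0] * weights.shape[1], dtype=bool)
seen[bmu_indices(weights, normal)] = True

# A new input is suspicious if it lands on a neuron that normal data never hit.
new = np.array([[0.95], [2.8]])
flags = ~seen[bmu_indices(weights, new)]
print(flags.tolist())  # [False, True]
```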

The SOM occupies a distinctive niche: it is simultaneously a clustering method, a dimensionality-reduction method, and a visualisation method. That combination — turning opaque high-dimensional data into a map you can read at a glance — is why it remains a teaching staple and a practical tool decades after Kohonen first described it.
