Self-Organizing Maps: Topology-Preserving Neural Networks
Most neural networks learn from labelled examples. The self-organizing map (SOM), introduced by Teuvo Kohonen in 1982, learns with no labels at all. It takes high-dimensional data and lays it out on a low-dimensional grid — usually 2D — in such a way that similar inputs end up near each other. The result is a map you can literally look at: a flat picture in which the geometry of a complex dataset becomes visible. This article explains how a SOM learns through competition, why it preserves topology, and where it is used.
1. The Core Idea and Architecture
A SOM is a single layer of neurons arranged on a fixed grid, typically a rectangular or hexagonal lattice. Each neuron i carries a weight vector w_i that lives in the same space as the input data. If the inputs are 50-dimensional, every neuron's weight vector is also 50-dimensional, even though the neurons themselves sit on a 2D grid.
Grid positions never change; only the weight vectors are learned. Two things happen at once: the weight vectors move to cover the data (as in clustering), while the fixed grid imposes a topology that forces neighbouring neurons to learn similar things. The trained grid becomes a low-dimensional, topology-preserving picture of a high-dimensional dataset.
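As a minimal sketch of this layout, the grid can be held as a nested list indexed by row and column, with one weight vector per cell. The function name `init_som`, the 10 x 10 grid size, and the uniform random initialisation are illustrative assumptions, not something the method prescribes.

```python
import random

def init_som(rows, cols, dim, seed=0):
    """Return a rows x cols grid of weight vectors, each of length dim."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]

# Hypothetical sizes: a 10 x 10 map over 50-dimensional inputs.
weights = init_som(rows=10, cols=10, dim=50)

# The grid position (r, c) is just the index and is never updated.
# weights[r][c] is the 50-dimensional vector that training will move.
print(len(weights), len(weights[0]), len(weights[0][0]))
```

The point of the structure is that the two geometries stay separate: the index pair (r, c) lives on the 2D grid, while the vector stored there lives in the input space.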
2. Competitive Learning and the BMU
Training is driven by competition. For each input vector x, every neuron measures how close its weight vector is to the input, and the closest neuron wins. That winner is called the Best Matching Unit (BMU).
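The BMU search can be sketched as a plain scan over the grid. This example assumes Euclidean distance, which is the usual choice but not the only possible one, and the helper name `find_bmu` and the tiny hand-written map are made up for illustration.

```python
import math

def find_bmu(weights, x):
    """Return the (row, col) of the neuron whose weight vector is nearest to x."""
    best_pos, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)  # Euclidean distance (an assumed metric)
            if d < best_dist:
                best_pos, best_dist = (r, c), d
    return best_pos

# A 2 x 2 map with 2-dimensional weights, chosen by hand for illustration.
weights = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 1.0]]]

bmu = find_bmu(weights, [0.9, 0.2])
print(bmu)  # (1, 0): the neuron with weights [1.0, 0.0] is closest
```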
In plain competitive learning ("winner takes all"), only the BMU would update. The SOM's crucial addition is that the BMU and its neighbours on the grid all move toward the input. This neighbourhood update is what stitches the map together so that adjacent neurons end up representing adjacent regions of the data.
3. The Neighbourhood Function
The neighbourhood function h_ci controls how strongly neuron i is pulled toward the input when neuron c is the BMU. It depends on the distance between c and i on the grid (not on the distance between their weight vectors), and its radius σ shrinks as training proceeds.
Early in training the radius σ is large, so a single input nudges a wide swath of the grid. This is the ordering phase, where the map unfolds and untangles itself to match the broad shape of the data. As σ shrinks, the updates become local. This is the convergence (fine-tuning) phase, in which each neuron specialises. The schedule from global to local is essential: if σ starts too small, the map can stay twisted or folded, and if it never shrinks, neighbouring neurons keep being averaged together and cannot specialise.
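The effect of the shrinking radius can be illustrated with a short sketch. It assumes a Gaussian neighbourhood and an exponential decay schedule; both are common choices, but the text above does not fix either, and the start and end values of σ are arbitrary.

```python
import math

def decayed(start, end, t, n_steps):
    """Exponential decay from start (at t = 0) to end (at t = n_steps)."""
    return start * (end / start) ** (t / n_steps)

def neighbourhood(bmu, pos, sigma):
    """Gaussian neighbourhood: 1.0 at the BMU, falling off with grid distance."""
    d2 = (bmu[0] - pos[0]) ** 2 + (bmu[1] - pos[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

n_steps = 1000
for t in (0, 500, 1000):
    sigma = decayed(5.0, 0.5, t, n_steps)       # assumed schedule: 5.0 down to 0.5
    h = neighbourhood((5, 5), (5, 8), sigma)    # a neuron 3 cells from the BMU
    print(f"t={t:4d}  sigma={sigma:.2f}  h={h:.4f}")
```

Under these assumptions the neuron three cells away receives a strong pull at the start of training and essentially none by the end, which is the global-to-local transition described above.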
4. The Training Algorithm Step by Step
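The full update rule combines the BMU search, the neighbourhood function, and the decaying learning rate α(t):
$$w_i(t+1) = w_i(t) + \alpha(t)\, h_{ci}(t)\, \bigl[x(t) - w_i(t)\bigr]$$
Here c is the BMU for the input x(t). Neurons with h_ci near 1 take nearly the full step α(t) toward x, while neurons far from c on the grid barely move.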
The procedure, repeated for many iterations:
- Initialise each neuron's weight vector (randomly, or by sampling the data / principal components for faster convergence).
- Present a randomly chosen input vector x.
- Find the BMU — the neuron whose weights are closest to x.
- Update the BMU and its neighbours toward x using the rule above.
- Decay the learning rate α and neighbourhood radius σ.
- Repeat until the weights stabilise.
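The procedure above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a reference implementation: it uses a rectangular grid, Euclidean distance, a Gaussian neighbourhood, exponential decay for both α and σ, and a fixed number of iterations in place of a convergence test. The function name `train_som` and all default parameter values are hypothetical.

```python
import math
import random

def train_som(data, rows, cols, n_steps, alpha0=0.5, alpha1=0.01,
              sigma0=None, sigma1=0.5, seed=0):
    """Train a rows x cols SOM on data (a list of equal-length float lists)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed starting radius: half the grid
    # Initialise each weight vector by copying a randomly sampled data point.
    weights = [[list(rng.choice(data)) for _ in range(cols)]
               for _ in range(rows)]
    for t in range(n_steps):
        # Decay the learning rate and the neighbourhood radius.
        frac = t / n_steps
        alpha = alpha0 * (alpha1 / alpha0) ** frac
        sigma = sigma0 * (sigma1 / sigma0) ** frac
        # Present a randomly chosen input vector.
        x = rng.choice(data)
        # Find the BMU: the neuron with the smallest Euclidean distance to x.
        br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda p: math.dist(weights[p[0]][p[1]], x))
        # Pull every neuron toward x, scaled by alpha times the neighbourhood.
        for r in range(rows):
            for c in range(cols):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                w = weights[r][c]
                for k in range(dim):
                    w[k] += alpha * h * (x[k] - w[k])
    return weights

# Usage: map 200 random points in the unit square onto a 5 x 5 grid.
rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(200)]
som = train_som(data, rows=5, cols=5, n_steps=2000)
```

The update loops over the whole grid for clarity; because the Gaussian is effectively zero far from the BMU, a practical implementation could restrict it to a window around the winner or vectorise it with an array library.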
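Each step needs only a distance computation against every neuron followed by a neighbourhood-weighted update, so the cost per input grows linearly with the number of neurons and the input dimension. That simplicity makes the SOM easy to implement, which is part of why it remained popular long after its introduction.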
5. Reading a Trained Map: the U-matrix
Once trained, the SOM is a layout — but how do you see clusters in it? The standard tool is the U-matrix (unified distance matrix). For each neuron it computes the average distance between that neuron's weight vector and the weight vectors of its immediate grid neighbours.
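A minimal sketch of that computation, assuming a rectangular grid where each neuron's immediate neighbours are the up to four cells above, below, left, and right (a hexagonal lattice would have six). The function name `u_matrix` and the tiny one-row map are invented for the example.

```python
import math

def u_matrix(weights):
    """Mean distance from each neuron's weights to its 4-connected neighbours."""
    rows, cols = len(weights), len(weights[0])
    um = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(weights[r][c], weights[nr][nc])
                     for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= nr < rows and 0 <= nc < cols]
            # A 1 x 1 map has no neighbours; report 0.0 in that case.
            um[r][c] = sum(dists) / len(dists) if dists else 0.0
    return um

# A 1 x 4 map with 1-dimensional weights forming two tight groups.
weights = [[[0.0], [0.1], [0.9], [1.0]]]
um = u_matrix(weights)
print([round(v, 2) for v in um[0]])
```

In this toy map the two outer neurons get small values and the two inner ones get large values, so the boundary between the groups shows up as the high region in the middle.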
Rendered as a heat map, the U-matrix shows clusters as low "valleys" separated by high "ridges." This turns an abstract high-dimensional dataset into a readable terrain map, which is exactly why analysts reach for SOMs when they want to see structure rather than just compute it. Other useful views include component planes (one heat map per input feature) and hit histograms showing how many data points land on each neuron.
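The two other views are simple to derive from a trained map. The sketch below assumes Euclidean distance for assigning data points to neurons; `hit_histogram` and `component_plane` are hypothetical helper names, and the map and data are hand-written toy values.

```python
import math

def hit_histogram(weights, data):
    """Count how many data points have each neuron as their BMU."""
    rows, cols = len(weights), len(weights[0])
    hits = [[0] * cols for _ in range(rows)]
    for x in data:
        r, c = min(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda p: math.dist(weights[p[0]][p[1]], x))
        hits[r][c] += 1
    return hits

def component_plane(weights, k):
    """Extract feature k of every weight vector as a rows x cols grid."""
    return [[w[k] for w in row] for row in weights]

# A 2 x 2 map with 2-dimensional weights, chosen by hand for illustration.
weights = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 1.0]]]
data = [[0.1, 0.1], [0.0, 0.2], [0.9, 0.9]]

hits = hit_histogram(weights, data)    # [[2, 0], [0, 1]]
plane0 = component_plane(weights, 0)   # [[0.0, 0.0], [1.0, 1.0]]
```

Each result is a grid of the same shape as the map, so it can be drawn as a heat map alongside the U-matrix.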
6. Applications
SOMs are valued wherever exploratory visualisation of complex data matters:
- Document and text mining: Kohonen's own WEBSOM project organised millions of documents into a browsable 2D map where related topics sit together.
- Bioinformatics: clustering gene-expression profiles and grouping samples by molecular similarity.
- Finance and economics: the "poverty map of the world," which arranges countries by dozens of welfare indicators onto a single readable grid.
- Fault detection and process monitoring: normal operating states map to one region; deviations show up as inputs landing in unexpected parts of the map.
- Image and colour quantisation: compressing a palette while preserving perceptual relationships between colours.
The SOM occupies a distinctive niche: it is simultaneously a clustering method, a dimensionality-reduction method, and a visualisation method. That combination — turning opaque high-dimensional data into a map you can read at a glance — is why it remains a teaching staple and a practical tool decades after Kohonen first described it.