Info & Theory
DBSCAN (Density-Based Spatial Clustering of Applications
with Noise) finds clusters as dense regions separated by sparse
ones. It needs two parameters: a radius ε and a
count minPts.
Three kinds of point
- Core: has at least minPts points within radius ε (counting itself).
- Border: within ε of a core point but not itself dense enough.
- Noise: neither; a sparse outlier, drawn as a grey ✕.
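The three definitions can be sketched in a few lines of Python. This is a minimal illustration, not the demo's own code: the function name `classify_points`, the sample coordinates, and the use of Euclidean distance are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (hypothetical helper)."""
    # Each neighbourhood includes the point itself, since its distance is 0.
    neighbours = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Made-up sample data: a tight group, one straggler, one far outlier.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0), (5.0, 5.0)]
labels = classify_points(points, eps=0.8, min_pts=3)
print(labels)
```

With these sample values the first three points come out as core, (1.2, 0.0) as border (it is within ε of (0.5, 0.0) but has only two points in its own neighbourhood), and (5.0, 5.0) as noise.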
Growing a cluster
Pick an unvisited core point and start a cluster. Add every point in its ε-neighbourhood to a queue; for each dequeued point that is itself a core point, add its neighbours too. A dequeued point that is not a core point joins the cluster as a border point but does not extend it. The cluster keeps expanding through chains of dense points until the frontier is exhausted.
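The queue-based expansion described above might look like the following sketch. It assumes Euclidean distance and a brute-force neighbour search; the function name `dbscan`, the `NOISE` constant, and the sample points are illustrative choices, not taken from the demo.

```python
from collections import deque
from math import dist

NOISE = -1  # label used here for points that end up in no cluster


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    n = len(points)
    # Brute-force neighbourhoods; each one includes the point itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = [None] * n  # None means "not yet assigned"
    cluster = 0
    for seed in range(n):
        # Only an unassigned core point can start a new cluster.
        if labels[seed] is not None or not is_core[seed]:
            continue
        labels[seed] = cluster
        queue = deque(neighbours[seed])
        while queue:
            j = queue.popleft()
            if labels[j] is not None:
                continue  # already claimed by this or an earlier cluster
            labels[j] = cluster
            if is_core[j]:
                # Core points extend the frontier; border points do not.
                queue.extend(neighbours[j])
        cluster += 1

    # Anything never reached from a core point is noise.
    return [NOISE if lab is None else lab for lab in labels]


# Made-up sample data: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0),
    (5.0, 5.0), (5.5, 5.0), (5.0, 5.5),
    (10.0, 0.0),
]
labels = dbscan(points, eps=0.8, min_pts=3)
print(labels)
```

In this sketch a border point that lies within ε of two clusters is kept by whichever cluster reaches it first, so the result can depend on the order of the points.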
Versus k-means
Unlike k-means, DBSCAN does not need you to pick the number of clusters: it discovers however many dense regions exist. It also follows arbitrary shapes (the Moons and Rings presets are non-convex, and k-means fails on them because it assigns each point to the nearest centroid) and reports outliers explicitly as noise rather than forcing every point into a cluster.
Choosing ε and minPts
Too small an ε (or too large a minPts)
labels everything noise; too large an ε merges
everything into one blob. Drag the sliders to feel the trade-off.
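The same trade-off can be seen numerically by sweeping the parameters over a small data set. The sketch below is only an illustration: `summarise` is a hypothetical helper that counts clusters as connected groups of core points, which should give the same counts as the queue-based procedure, and the sample points are made up.

```python
from math import dist


def summarise(points, eps, min_pts):
    """Return (number of clusters, number of noise points) for one setting."""
    n = len(points)
    near = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in near]

    # Each connected group of core points forms one cluster.
    seen = set()
    clusters = 0
    for i in range(n):
        if not core[i] or i in seen:
            continue
        clusters += 1
        seen.add(i)
        stack = [i]
        while stack:
            k = stack.pop()
            for j in near[k]:
                if core[j] and j not in seen:
                    seen.add(j)
                    stack.append(j)

    # Noise: not core, and no core point within eps.
    noise = sum(
        1 for i in range(n)
        if not core[i] and not any(core[j] for j in near[i])
    )
    return clusters, noise


# Made-up sample data: two dense groups and one isolated point.
points = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5), (10, 0)]
for eps in (0.1, 0.8, 20.0):
    print(eps, summarise(points, eps, min_pts=3))
```

For this data, ε = 0.1 gives no clusters and seven noise points, ε = 0.8 gives two clusters and one noise point, and ε = 20 gives a single cluster with no noise. Raising minPts to 4 at ε = 0.8 also turns every point into noise, because no neighbourhood holds four points.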