Info & Theory
Dynamic time warping (DTW) measures how similar two time series are even when they run at different speeds or are shifted in time. It stretches and squeezes the time axis to line up matching shapes.
The cost matrix
For series x (length n) and y (length
m) define the local cost d(i,j) = (x_i − y_j)². The
accumulated cost obeys the dynamic-programming recurrence
D(i,j) = d(i,j) + min( D(i−1,j), D(i,j−1), D(i−1,j−1) )
with D(0,0)=d(0,0). The heatmap on the right shows
D; the bright line is the cheapest
warping path from corner to corner, recovered by tracing
the minimum predecessors backwards.
DTW vs Euclidean
Plain Euclidean distance compares x_i with
y_i at the same index, so a small time shift looks
like a huge difference. DTW instead lets index i
match several indices j, so two copies of the same
shape at different speeds score a small distance — exactly what
you need for speech, handwriting and gesture matching.
Sakoe–Chiba band
The optional band forces the path to stay within
|i − j| ≤ r. This speeds DTW up and prevents
pathological warps, at the cost of disallowing very large time
shifts.