Info & Theory
The edit distance (Levenshtein distance) between two strings is the minimum number of single-character insertions, deletions and substitutions that turn one into the other.
The DP recurrence
Let d[i][j] be the distance between the first
i characters of A and the first j of
B. Then:
d[0][j] = jandd[i][0] = i.-
If
A[i] = B[j],d[i][j] = d[i−1][j−1](free copy). -
Otherwise
d[i][j] = 1 + min(deleted[i−1][j], insertd[i][j−1], substituted[i−1][j−1]).
Backtracking the path
Start at the bottom-right cell and walk back to
d[0][0]. A diagonal step is a match or
substitution, a step up is a deletion, a step left is an
insertion. The reversed steps give the optimal alignment.
Complexity
The table has (m+1)×(n+1) cells, so filling it
takes O(m·n) time and space. Only the distance? You
can keep two rows for O(min(m,n)) space.
Where it is used
Spell checkers, fuzzy search, DNA alignment, plagiarism detection and diff tools all build on this exact recurrence.
Frequently asked questions
What is edit distance?
Edit distance, or Levenshtein distance, is the minimum number of single-character edits — insertions, deletions or substitutions — needed to change one string into another.
How does the dynamic-programming table work?
A grid of size (m+1)×(n+1) stores the edit distance between every prefix of the two strings. Each cell is the minimum of its left, top and diagonal neighbours plus the cost of the edit, so the bottom-right cell holds the full distance.
Why is the diagonal move special?
A diagonal move aligns one character of each string. If the characters match, the cost is zero (a copy); if they differ, it is a substitution costing one.
What is backtracking in this algorithm?
After filling the table you start at the bottom-right cell and walk back to the top-left, each step choosing the neighbour that produced the current value. This reconstructs the actual sequence of edits.
What is the time complexity?
Filling the table takes O(m·n) time and O(m·n) space, where m and n are the string lengths. Space can be reduced to O(min(m,n)) if only the distance is needed.
Where is edit distance used?
Spell checkers, fuzzy search, DNA and protein alignment, plagiarism detection, OCR correction and diff tools all rely on edit-distance computations.
Is the answer always unique?
The distance value is unique, but there can be several optimal edit paths of the same total cost. The backtracking here picks one consistent path.
What is the difference between Levenshtein and Hamming distance?
Hamming distance only counts substitutions and requires equal-length strings. Levenshtein distance also allows insertions and deletions, so it handles strings of different lengths.
Can the operations have different costs?
Yes. Weighted edit distance assigns different costs to insertion, deletion and substitution, which is common in spell-checking where some mistakes are more likely than others. This simulation uses unit costs.
Why does the path stay near the diagonal for similar words?
When two strings are alike, most characters align directly, so the optimal path runs close to the main diagonal with only a few off-diagonal insertion or deletion steps.