🧬 Longest Common Subsequence
DP alignment grid
Cell 0 / 0
LCS:
Strings
Controls
Stats
Cells filled
0
LCS length
Phase
Fill
Status
Ready
Matches
Info & Theory

The longest common subsequence (LCS) of two strings is the longest run of characters appearing in both in the same order, but not necessarily next to each other.

The DP recurrence

Let L[i][j] be the LCS length of the first i characters of A and first j of B:

  • L[0][j] = L[i][0] = 0.
  • If A[i] = B[j]: L[i][j] = L[i−1][j−1] + 1 (a diagonal match).
  • Otherwise L[i][j] = max(L[i−1][j], L[i][j−1]).

Recovering the subsequence

From the bottom-right cell, walk back: on a match move diagonally and keep the character; otherwise step toward the larger neighbour. Reverse the collected characters to get the LCS.

Complexity

The grid has (m+1)×(n+1) cells, so the algorithm runs in O(m·n) time and space.

Where it is used

The diff utility, version control, DNA sequence comparison and plagiarism detection are all powered by LCS.

Frequently asked questions

What is the longest common subsequence?

The longest common subsequence (LCS) of two strings is the longest sequence of characters that appears in both, in the same order, but not necessarily contiguously.

How is a subsequence different from a substring?

A substring is a contiguous run of characters, while a subsequence keeps the original order but may skip characters. ACE is a subsequence of ABCDE but not a substring.

How does the DP grid solve LCS?

Cell L[i][j] holds the LCS length of the first i characters of one string and the first j of the other. If the characters match, it is the diagonal value plus one; otherwise it is the larger of the cell above or to the left.

Why does a diagonal step mean a match?

A diagonal step consumes one character from each string at once. The DP only takes that step when those two characters are equal, so each diagonal on the path is a matched character of the LCS.

How is the subsequence recovered?

Starting at the bottom-right cell, you walk back: on a match you move diagonally and prepend the character, otherwise you move toward the larger neighbour. The collected characters, reversed, form the LCS.

What is the time complexity?

Filling the grid takes O(m·n) time and O(m·n) space for two strings of length m and n. Reconstruction adds only O(m+n).

Where is LCS used in practice?

Diff and version-control tools, DNA and protein sequence comparison, plagiarism detection and data merging all rely on LCS to find shared structure.

Is the LCS always unique?

The length is unique, but several different subsequences can reach that maximum length. The backtracking shown here recovers one valid LCS.

How does LCS relate to edit distance?

When only insertions and deletions are allowed, the edit distance equals m + n − 2·LCS. LCS and Levenshtein distance are closely related alignment problems.

Can LCS handle more than two sequences?

Yes, but the DP table gains a dimension per sequence, so the problem becomes NP-hard for an arbitrary number of sequences. This simulation handles the classic two-string case.