Longest Common Subsequence
Description
The Longest Common Subsequence (LCS) algorithm finds the longest subsequence common to two sequences. Unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. It's a classic dynamic programming problem with applications in bioinformatics and file comparison.
History

The Longest Common Subsequence problem emerged in the early days of computer science, particularly in the context of computational biology and text comparison. The dynamic programming solution was first formalized in the 1970s. The algorithm gained significant importance with the rise of bioinformatics, where it's used to compare DNA sequences and identify evolutionary relationships. It also became fundamental in the development of diff tools for version control systems. Today, LCS remains a cornerstone algorithm in sequence comparison, file differencing, and various applications in computational biology.
How It Works
- Initialize DP Table:
- Create a table of size (m+1) × (n+1)
- Initialize first row and column with zeros
- Fill DP Table:
- Compare characters from both sequences
- If characters match, add 1 to diagonal value
- If not, take maximum of left and top cells
- Build Solution:
- Start from bottom-right cell
- Trace back to construct the subsequence
Visualization
Click Start to begin visualization
Implementation
def lcs(X, Y):
m, n = len(X), len(Y)
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Fill dp table
for i in range(1, m + 1):
for j in range(1, n + 1):
if X[i-1] == Y[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
# Reconstruct the LCS
lcs = []
i, j = m, n
while i > 0 and j > 0:
if X[i-1] == Y[j-1]:
lcs.append(X[i-1])
i -= 1
j -= 1
elif dp[i-1][j] > dp[i][j-1]:
i -= 1
else:
j -= 1
return ''.join(reversed(lcs))
# Example usage
if __name__ == "__main__":
X = "ABCDGH"
Y = "AEDFHR"
print(f"LCS of {X} and {Y} is {lcs(X, Y)}")
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>
std::string lcs(const std::string& X, const std::string& Y) {
int m = X.length();
int n = Y.length();
// Create DP table
std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
// Fill dp table
for(int i = 1; i <= m; i++) {
for(int j = 1; j <= n; j++) {
if(X[i-1] == Y[j-1]) {
dp[i][j] = dp[i-1][j-1] + 1;
} else {
dp[i][j] = std::max(dp[i-1][j], dp[i][j-1]);
}
}
}
// Reconstruct the LCS
std::string lcs;
int i = m, j = n;
while(i > 0 && j > 0) {
if(X[i-1] == Y[j-1]) {
lcs = X[i-1] + lcs;
i--; j--;
} else if(dp[i-1][j] > dp[i][j-1]) {
i--;
} else {
j--;
}
}
return lcs;
}
int main() {
std::string X = "ABCDGH";
std::string Y = "AEDFHR";
std::cout << "String 1: " << X << std::endl;
std::cout << "String 2: " << Y << std::endl;
std::cout << "LCS: " << lcs(X, Y) << std::endl;
return 0;
}
using System;
public class LCS {
public static string FindLCS(string X, string Y) {
int m = X.Length;
int n = Y.Length;
// Create DP table
int[,] dp = new int[m + 1, n + 1];
// Fill dp table
for(int i = 1; i <= m; i++) {
for(int j = 1; j <= n; j++) {
if(X[i-1] == Y[j-1]) {
dp[i,j] = dp[i-1,j-1] + 1;
} else {
dp[i,j] = Math.Max(dp[i-1,j], dp[i,j-1]);
}
}
}
// Reconstruct the LCS
char[] lcs = new char[dp[m,n]];
int index = dp[m,n];
int i = m, j = n;
while(i > 0 && j > 0) {
if(X[i-1] == Y[j-1]) {
lcs[--index] = X[i-1];
i--; j--;
} else if(dp[i-1,j] > dp[i,j-1]) {
i--;
} else {
j--;
}
}
return new string(lcs);
}
public static void Main(string[] args) {
string X = "ABCDGH";
string Y = "AEDFHR";
Console.WriteLine($"String 1: {X}");
Console.WriteLine($"String 2: {Y}");
Console.WriteLine($"LCS: {FindLCS(X, Y)}");
}
}
Applications
- DNA sequence alignment in bioinformatics
- File difference comparison (diff tools)
- Version control systems
- Speech recognition
- Pattern recognition in time series data
Complexity Analysis
Case | Time Complexity | Description |
---|---|---|
Best Case | Ω(mn) | Always needs to fill entire DP table |
Average Case | Θ(mn) | Requires comparing all character pairs |
Worst Case | O(mn) | Same as other cases due to DP approach |
Space Complexity | O(mn) | Requires DP table of size m×n |
Notes:
- m - Length of first sequence
- n - Length of second sequence
- Space can be optimized to O(min(m,n))
- Time complexity is optimal for the general case
- Performance is independent of sequence content
Advantages and Disadvantages
Advantages
- Optimal solution guaranteed
- Works with any sequence type
- Basis for many string algorithms
- Easy to modify for variations
Disadvantages
- Quadratic space complexity
- Not suitable for very long sequences
- Cannot handle multiple sequences efficiently
- No early termination possible