Quick Info

Category: Dynamic Programming
Time Complexity: O(mn)
Space Complexity: O(mn)
Optimal: Yes
In-Place: No

Longest Common Subsequence

Description

The Longest Common Subsequence (LCS) algorithm finds the longest subsequence common to two sequences. Unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. It's a classic dynamic programming problem with applications in bioinformatics and file comparison.

History

LCS Concept

The Longest Common Subsequence problem emerged in the early days of computer science, particularly in the context of computational biology and text comparison. The dynamic programming solution was first formalized in the 1970s. The algorithm gained significant importance with the rise of bioinformatics, where it's used to compare DNA sequences and identify evolutionary relationships. It also became fundamental in the development of diff tools for version control systems. Today, LCS remains a cornerstone algorithm in sequence comparison, file differencing, and various applications in computational biology.

How It Works

  1. Initialize DP Table:
    • Create a table of size (m+1) × (n+1)
    • Initialize first row and column with zeros
  2. Fill DP Table:
    • Compare characters from both sequences
    • If characters match, add 1 to diagonal value
    • If not, take maximum of left and top cells
  3. Build Solution:
    • Start from bottom-right cell
    • Trace back to construct the subsequence

Visualization

Click Start to begin visualization

Implementation


def lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Fill dp table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    
    # Reconstruct the LCS
    lcs = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:
            lcs.append(X[i-1])
            i -= 1
            j -= 1
        elif dp[i-1][j] > dp[i][j-1]:
            i -= 1
        else:
            j -= 1
            
    return ''.join(reversed(lcs))

# Example usage
if __name__ == "__main__":
    X = "ABCDGH"
    Y = "AEDFHR"
    print(f"LCS of {X} and {Y} is {lcs(X, Y)}")

#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

std::string lcs(const std::string& X, const std::string& Y) {
    int m = X.length();
    int n = Y.length();
    
    // Create DP table
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
    
    // Fill dp table
    for(int i = 1; i <= m; i++) {
        for(int j = 1; j <= n; j++) {
            if(X[i-1] == Y[j-1]) {
                dp[i][j] = dp[i-1][j-1] + 1;
            } else {
                dp[i][j] = std::max(dp[i-1][j], dp[i][j-1]);
            }
        }
    }
    
    // Reconstruct the LCS
    std::string lcs;
    int i = m, j = n;
    while(i > 0 && j > 0) {
        if(X[i-1] == Y[j-1]) {
            lcs = X[i-1] + lcs;
            i--; j--;
        } else if(dp[i-1][j] > dp[i][j-1]) {
            i--;
        } else {
            j--;
        }
    }
    
    return lcs;
}

int main() {
    std::string X = "ABCDGH";
    std::string Y = "AEDFHR";
    
    std::cout << "String 1: " << X << std::endl;
    std::cout << "String 2: " << Y << std::endl;
    std::cout << "LCS: " << lcs(X, Y) << std::endl;
    
    return 0;
}

using System;

public class LCS {
    public static string FindLCS(string X, string Y) {
        int m = X.Length;
        int n = Y.Length;
        
        // Create DP table
        int[,] dp = new int[m + 1, n + 1];
        
        // Fill dp table
        for(int i = 1; i <= m; i++) {
            for(int j = 1; j <= n; j++) {
                if(X[i-1] == Y[j-1]) {
                    dp[i,j] = dp[i-1,j-1] + 1;
                } else {
                    dp[i,j] = Math.Max(dp[i-1,j], dp[i,j-1]);
                }
            }
        }
        
        // Reconstruct the LCS
        char[] lcs = new char[dp[m,n]];
        int index = dp[m,n];
        int i = m, j = n;
        
        while(i > 0 && j > 0) {
            if(X[i-1] == Y[j-1]) {
                lcs[--index] = X[i-1];
                i--; j--;
            } else if(dp[i-1,j] > dp[i,j-1]) {
                i--;
            } else {
                j--;
            }
        }
        
        return new string(lcs);
    }
    
    public static void Main(string[] args) {
        string X = "ABCDGH";
        string Y = "AEDFHR";
        
        Console.WriteLine($"String 1: {X}");
        Console.WriteLine($"String 2: {Y}");
        Console.WriteLine($"LCS: {FindLCS(X, Y)}");
    }
}

Applications

  • DNA sequence alignment in bioinformatics
  • File difference comparison (diff tools)
  • Version control systems
  • Speech recognition
  • Pattern recognition in time series data

Complexity Analysis

Case Time Complexity Description
Best Case Ω(mn) Always needs to fill entire DP table
Average Case Θ(mn) Requires comparing all character pairs
Worst Case O(mn) Same as other cases due to DP approach
Space Complexity O(mn) Requires DP table of size m×n

Notes:

  • m - Length of first sequence
  • n - Length of second sequence
  • Space can be optimized to O(min(m,n))
  • Time complexity is optimal for the general case
  • Performance is independent of sequence content

Advantages and Disadvantages

Advantages

  • Optimal solution guaranteed
  • Works with any sequence type
  • Basis for many string algorithms
  • Easy to modify for variations

Disadvantages

  • Quadratic space complexity
  • Not suitable for very long sequences
  • Cannot handle multiple sequences efficiently
  • No early termination possible