Quick Info

Category: Dynamic Programming
Time Complexity: O(mn)
Space Complexity: O(mn)
Optimal: Yes
In-Place: No

Edit Distance

Description

Edit Distance (also known as Levenshtein Distance) is a measure of similarity between two strings. It is defined as the minimum number of single-character operations required to transform one string into another. The allowed operations are:

  • Insert a character
  • Delete a character
  • Replace a character

History

Edit Distance Concept

The Edit Distance algorithm was developed by Vladimir Levenshtein in 1965. It was initially created to measure the difference between binary codes, but its applications quickly expanded to string comparison and natural language processing. The algorithm gained prominence in computational biology for DNA sequence alignment and in information theory for error detection. Today, it's widely used in spell checkers, DNA analysis, and fuzzy string matching applications.

Visualization

Click Start to begin visualization

Implementation


def edit_distance(str1, str2):
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Fill first row and column
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    
    # Fill dp table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(
                    dp[i-1][j],    # deletion
                    dp[i][j-1],    # insertion
                    dp[i-1][j-1]   # replacement
                )
    
    return dp[m][n]

# Example usage
str1 = "INTENTION"
str2 = "EXECUTION"
distance = edit_distance(str1, str2)
print(f"Edit distance between {str1} and {str2} is {distance}")

#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

int editDistance(const std::string& str1, const std::string& str2) {
    int m = str1.length();
    int n = str2.length();
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1));
    
    // Fill first row and column
    for(int i = 0; i <= m; i++)
        dp[i][0] = i;
    for(int j = 0; j <= n; j++)
        dp[0][j] = j;
    
    // Fill dp table
    for(int i = 1; i <= m; i++) {
        for(int j = 1; j <= n; j++) {
            if(str1[i-1] == str2[j-1])
                dp[i][j] = dp[i-1][j-1];
            else
                dp[i][j] = 1 + std::min({
                    dp[i-1][j],    // deletion
                    dp[i][j-1],    // insertion
                    dp[i-1][j-1]   // replacement
                });
        }
    }
    
    return dp[m][n];
}

int main() {
    std::string str1 = "INTENTION";
    std::string str2 = "EXECUTION";
    
    int distance = editDistance(str1, str2);
    std::cout << "Edit distance between " << str1 
              << " and " << str2 
              << " is " << distance << std::endl;
    
    return 0;
}

public class EditDistance {
    public static int Calculate(string str1, string str2) {
        int m = str1.Length;
        int n = str2.Length;
        int[,] dp = new int[m + 1, n + 1];
        
        // Fill first row and column
        for(int i = 0; i <= m; i++)
            dp[i,0] = i;
        for(int j = 0; j <= n; j++)
            dp[0,j] = j;
        
        // Fill dp table
        for(int i = 1; i <= m; i++) {
            for(int j = 1; j <= n; j++) {
                if(str1[i-1] == str2[j-1])
                    dp[i,j] = dp[i-1,j-1];
                else
                    dp[i,j] = 1 + Math.Min(
                        Math.Min(dp[i-1,j], dp[i,j-1]), // deletion, insertion
                        dp[i-1,j-1]                     // replacement
                    );
            }
        }
        
        return dp[m,n];
    }
    
    public static void Main(string[] args) {
        string str1 = "INTENTION";
        string str2 = "EXECUTION";
        
        int distance = Calculate(str1, str2);
        Console.WriteLine($"Edit distance between {str1} and {str2} is {distance}");
    }
}

Applications

  • Spell checking and autocorrect systems
  • DNA sequence alignment in bioinformatics
  • Natural Language Processing (NLP)
  • Fuzzy string matching and search
  • Plagiarism detection
  • Speech recognition systems
  • Machine translation evaluation
  • Record linkage and data deduplication

Complexity Analysis

Case Time Complexity Description
Best Case Ω(mn) When strings are identical or very similar
Average Case Θ(mn) For typical string pairs
Worst Case O(mn) When strings are completely different
Space Complexity O(mn) For the DP table storage

Notes:

  • m - Length of first string
  • n - Length of second string
  • Space can be optimized to O(min(m,n))
  • Each cell computation takes O(1) time
  • All cases have same complexity due to DP table filling

Advantages and Disadvantages

Advantages

  • Optimal solution guaranteed
  • Handles all types of string operations
  • Can be modified for different operation costs
  • Stable and predictable performance
  • Easy to implement and understand
  • Can handle strings of different lengths

Disadvantages

  • Quadratic time and space complexity
  • Not suitable for very long strings
  • Memory intensive for large inputs
  • Cannot handle multiple strings efficiently
  • No early termination possible
  • Limited to character-level operations