Edit Distance (Levenshtein): Interactive String Comparison

Description

Edit Distance (also known as Levenshtein Distance) is a measure of similarity between two strings. It is defined as the minimum number of single-character operations required to transform one string into another. The allowed operations are:

Insert a character
Delete a character
Replace a character

History

The Edit Distance algorithm was developed by Vladimir Levenshtein in 1965. It was initially created to measure the difference between binary codes, but its applications quickly expanded to string comparison and natural language processing. The algorithm gained prominence in computational biology for DNA sequence alignment and in information theory for error detection. Today, it's widely used in spell checkers, DNA analysis, and fuzzy string matching applications.

Visualization

Speed:

Click Start to begin visualization

Implementation


def edit_distance(str1, str2):
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Fill first row and column
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    
    # Fill dp table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i-1] == str2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(
                    dp[i-1][j],    # deletion
                    dp[i][j-1],    # insertion
                    dp[i-1][j-1]   # replacement
                )
    
    return dp[m][n]

# Example usage
str1 = "INTENTION"
str2 = "EXECUTION"
distance = edit_distance(str1, str2)
print(f"Edit distance between {str1} and {str2} is {distance}")


#include <string>
#include <vector>
#include <algorithm>
#include <iostream>

int editDistance(const std::string& str1, const std::string& str2) {
    int m = str1.length();
    int n = str2.length();
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1));
    
    // Fill first row and column
    for(int i = 0; i <= m; i++)
        dp[i][0] = i;
    for(int j = 0; j <= n; j++)
        dp[0][j] = j;
    
    // Fill dp table
    for(int i = 1; i <= m; i++) {
        for(int j = 1; j <= n; j++) {
            if(str1[i-1] == str2[j-1])
                dp[i][j] = dp[i-1][j-1];
            else
                dp[i][j] = 1 + std::min({
                    dp[i-1][j],    // deletion
                    dp[i][j-1],    // insertion
                    dp[i-1][j-1]   // replacement
                });
        }
    }
    
    return dp[m][n];
}

int main() {
    std::string str1 = "INTENTION";
    std::string str2 = "EXECUTION";
    
    int distance = editDistance(str1, str2);
    std::cout << "Edit distance between " << str1 
              << " and " << str2 
              << " is " << distance << std::endl;
    
    return 0;
}


public class EditDistance {
    public static int Calculate(string str1, string str2) {
        int m = str1.Length;
        int n = str2.Length;
        int[,] dp = new int[m + 1, n + 1];
        
        // Fill first row and column
        for(int i = 0; i <= m; i++)
            dp[i,0] = i;
        for(int j = 0; j <= n; j++)
            dp[0,j] = j;
        
        // Fill dp table
        for(int i = 1; i <= m; i++) {
            for(int j = 1; j <= n; j++) {
                if(str1[i-1] == str2[j-1])
                    dp[i,j] = dp[i-1,j-1];
                else
                    dp[i,j] = 1 + Math.Min(
                        Math.Min(dp[i-1,j], dp[i,j-1]), // deletion, insertion
                        dp[i-1,j-1]                     // replacement
                    );
            }
        }
        
        return dp[m,n];
    }
    
    public static void Main(string[] args) {
        string str1 = "INTENTION";
        string str2 = "EXECUTION";
        
        int distance = Calculate(str1, str2);
        Console.WriteLine($"Edit distance between {str1} and {str2} is {distance}");
    }
}

Applications

Spell checking and autocorrect systems
DNA sequence alignment in bioinformatics
Natural Language Processing (NLP)
Fuzzy string matching and search
Plagiarism detection
Speech recognition systems
Machine translation evaluation
Record linkage and data deduplication

Complexity Analysis

Case	Time Complexity	Description
Best Case	Ω(mn)	When strings are identical or very similar
Average Case	Θ(mn)	For typical string pairs
Worst Case	O(mn)	When strings are completely different
Space Complexity	O(mn)	For the DP table storage

Notes:

m - Length of first string
n - Length of second string
Space can be optimized to O(min(m,n))
Each cell computation takes O(1) time
All cases have same complexity due to DP table filling

Advantages and Disadvantages

Advantages

Optimal solution guaranteed
Handles all types of string operations
Can be modified for different operation costs
Stable and predictable performance
Easy to implement and understand
Can handle strings of different lengths

Disadvantages

Quadratic time and space complexity
Not suitable for very long strings
Memory intensive for large inputs
Cannot handle multiple strings efficiently
No early termination possible
Limited to character-level operations

Quick Info

Edit Distance