Edit Distance
Description
Edit Distance (also known as Levenshtein Distance) is a measure of similarity between two strings. It is defined as the minimum number of single-character operations required to transform one string into another. The allowed operations are:
- Insert a character
- Delete a character
- Replace a character
History

The Edit Distance algorithm was developed by Vladimir Levenshtein in 1965. It was initially created to measure the difference between binary codes, but its applications quickly expanded to string comparison and natural language processing. The algorithm gained prominence in computational biology for DNA sequence alignment and in information theory for error detection. Today, it's widely used in spell checkers, DNA analysis, and fuzzy string matching applications.
Visualization
Click Start to begin visualization
Implementation
def edit_distance(str1, str2):
m, n = len(str1), len(str2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Fill first row and column
for i in range(m + 1):
dp[i][0] = i
for j in range(n + 1):
dp[0][j] = j
# Fill dp table
for i in range(1, m + 1):
for j in range(1, n + 1):
if str1[i-1] == str2[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = 1 + min(
dp[i-1][j], # deletion
dp[i][j-1], # insertion
dp[i-1][j-1] # replacement
)
return dp[m][n]
# Example usage
str1 = "INTENTION"
str2 = "EXECUTION"
distance = edit_distance(str1, str2)
print(f"Edit distance between {str1} and {str2} is {distance}")
#include <string>
#include <vector>
#include <algorithm>
#include <iostream>
int editDistance(const std::string& str1, const std::string& str2) {
int m = str1.length();
int n = str2.length();
std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1));
// Fill first row and column
for(int i = 0; i <= m; i++)
dp[i][0] = i;
for(int j = 0; j <= n; j++)
dp[0][j] = j;
// Fill dp table
for(int i = 1; i <= m; i++) {
for(int j = 1; j <= n; j++) {
if(str1[i-1] == str2[j-1])
dp[i][j] = dp[i-1][j-1];
else
dp[i][j] = 1 + std::min({
dp[i-1][j], // deletion
dp[i][j-1], // insertion
dp[i-1][j-1] // replacement
});
}
}
return dp[m][n];
}
int main() {
std::string str1 = "INTENTION";
std::string str2 = "EXECUTION";
int distance = editDistance(str1, str2);
std::cout << "Edit distance between " << str1
<< " and " << str2
<< " is " << distance << std::endl;
return 0;
}
public class EditDistance {
public static int Calculate(string str1, string str2) {
int m = str1.Length;
int n = str2.Length;
int[,] dp = new int[m + 1, n + 1];
// Fill first row and column
for(int i = 0; i <= m; i++)
dp[i,0] = i;
for(int j = 0; j <= n; j++)
dp[0,j] = j;
// Fill dp table
for(int i = 1; i <= m; i++) {
for(int j = 1; j <= n; j++) {
if(str1[i-1] == str2[j-1])
dp[i,j] = dp[i-1,j-1];
else
dp[i,j] = 1 + Math.Min(
Math.Min(dp[i-1,j], dp[i,j-1]), // deletion, insertion
dp[i-1,j-1] // replacement
);
}
}
return dp[m,n];
}
public static void Main(string[] args) {
string str1 = "INTENTION";
string str2 = "EXECUTION";
int distance = Calculate(str1, str2);
Console.WriteLine($"Edit distance between {str1} and {str2} is {distance}");
}
}
Applications
- Spell checking and autocorrect systems
- DNA sequence alignment in bioinformatics
- Natural Language Processing (NLP)
- Fuzzy string matching and search
- Plagiarism detection
- Speech recognition systems
- Machine translation evaluation
- Record linkage and data deduplication
Complexity Analysis
Case | Time Complexity | Description |
---|---|---|
Best Case | Ω(mn) | When strings are identical or very similar |
Average Case | Θ(mn) | For typical string pairs |
Worst Case | O(mn) | When strings are completely different |
Space Complexity | O(mn) | For the DP table storage |
Notes:
- m - Length of first string
- n - Length of second string
- Space can be optimized to O(min(m,n))
- Each cell computation takes O(1) time
- All cases have same complexity due to DP table filling
Advantages and Disadvantages
Advantages
- Optimal solution guaranteed
- Handles all types of string operations
- Can be modified for different operation costs
- Stable and predictable performance
- Easy to implement and understand
- Can handle strings of different lengths
Disadvantages
- Quadratic time and space complexity
- Not suitable for very long strings
- Memory intensive for large inputs
- Cannot handle multiple strings efficiently
- No early termination possible
- Limited to character-level operations