Approach 1: bottom up
It is simliar to LCS and we need to define DP first
DP[i,j]: the minimum edit distance needed between word1 and word2.base case: including the consideratio of \(\varnothing\), then any word modified to \(\varnothing\) would belen(word)- case 1: when
word1[i] == word2[j], then informatin is from top left corner - case 2: when
word1[i] != word2[j], the information is coming from the minimum out of top cell, left cell or top left cell. And we need do one most operation to replaceword1[i]andword2[j]
\[
DP[i,j] = \begin{cases}
i&if\,j=0\\
j&if\,i=0\\
DP[i-1,j-1]&if\,word1[i] = word2[j]\\
min(DP[i-1,j-1],DP[i-1,j],DP[i,j-1])&if\,word1[i] \neq word2[j]\\
\end{cases}
\]
I would do an example so you could visualize it by first initializing it.
| - | \(\varnothing\) | r | o | s |
|---|---|---|---|---|
| \(\varnothing\) | 0 | 1 | 2 | 3 |
| h | 1 | - | - | - |
| o | 2 | - | - | - |
| r | 3 | - | - | - |
| s | 4 | - | - | - |
| e | 5 | - | - | - |
The rest of it is extremely similar to the LCS example so i won't bother .
Code implementaton
from collections import defaultdict
class Solution:
def minDistance(self, word1: str, word2: str) -> int:
# DP[i,j]: the minimum number of operations needed to edit from word1 to word2;
# base case: 如果任何一个substring为空集,# of ops needed to reach 空集 always will be len(other_words);
# case 1: if word1[i] == word2[j], then no editing needed from upper left corner
# case 2: if word1[i] != word2[j], min(top cell, topleft cell, left cell)
# 由于定义是min # of operations needed, 那么信息可以从任一previous step流动过来
x = len(word1)
y = len(word2)
# 加入空集的考量,就不用treat边界 and mid nodes different了
DP = defaultdict(int)
for i in range(x+1):
DP[(i,0)] = i
for j in range(y+1):
DP[(0,j)] = j
# horizontal scanning, left to right, top to bottom
for i in range(1,x + 1):
for j in range(1,y + 1):
# 判断是否相等
if word1[i-1] == word2[j-1]:
# info is propogating from top left
DP[(i,j)] = DP[(i-1,j-1)]
else:
# info is propogating from top or left
DP[(i,j)] = 1 + min(DP[(i-1,j)],DP[(i,j-1)],DP[(i-1,j-1)])
return DP[(x,y)]
food for thought, what if you can only do delete and insert?
If only delete and insert allowed
In only delete and insert allowed, you can reduce the problem to the classics LCS problem.
from collections import defaultdict
class Solution:
def minDistance(self, word1: str, word2: str) -> int:
# similar to LCS, find the largest common substring between word1 and word2,
# then
# example: "horse", "ros", --> LCS ="os". len(word1) - len(LCS) = 5 - 2 = 3
# example: "intention", "execution", --> LCS ="tion". len(word1) - len(LCS) = 9 - 4 = 5
# initialize to be zeros
x = len(word1)
y = len(word2)
# 加入空集的考量,就不用treat边界 and mid nodes different了
LCS = defaultdict(int)
# horizontal scanning, left to right, top to bottom
for i in range(1,x + 1):
for j in range(1,y + 1):
# 判断是否相等
if word1[i-1] == word2[j-1]:
# info is propogating from top left
LCS[(i,j)] = LCS[(i-1,j-1)] + 1
else:
# info is propogating from top or left
LCS[(i,j)] = max(LCS[(i-1,j)],LCS[(i,j-1)])
# replace operation: 最划算,相当于delete 1 char + add 1 char
# insert operation: 增加一个数字
# delete operation: 减少一个数字
# 怎么样计算replace a character的number of opetaion呢?
edit_distance = len(word1) - LCS[(x,y)] + len(word2) - LCS[(x,y)]
return edit_distance