Recursive formula for minimal editing distance - check my answer

317 Views Asked by At

Given a word $X=x_1x_2x_3...x_i$ and $Y=y_1y_2y_3...y_j$, the minimal editing distance is defined to be the minimal number of actions needed to transform $X$ to $Y$ where the legal actions are:

1) Delete a letter

2) Add a letter

For example, the minimal editing distance from $X=Sport$ to $Y=Sort$ is $1$. We just need to remove the $p$. The minimal editing distance from $X=Computer$ to $Y=Commuter$ is $2$. First we need to remove the $p$, and then we need to add $m$. Pretty straightforward.

The question was to find a recursive formula for $d(i,j)$ where $d(i,j)$ is the minimal editing distance from $X=x_1...x_i$ to $Y=y_1...y_j$.

What I did:

Firstly the edge cases are pretty clear, $d(i,0)=i$ and $d(0,j)=j$ and $d(0,0)=0$.

Now comes the trickier part:

Suppose $i>j$. then at least we need $i-j$ actions so $X$ and $Y$ will be same length, length $j$. Now they are both same length so we have the optimal solution $d(j,j)$.

and if $j \geq i$, then the opposite happens, we need $j-i$ so they are same length, and from there we need $d(i,i)$ steps.

So $d(i,j)=i-j+d(j,j)$ if $i>j$, and $d(i,j)=j-i+d(j,j)$ if $j\geq i$.

Is this the correct result? This doesn't seem correct because well, won't $d(i,i)$ depend on what the letters actually are? What letters are we deleting / adding? It seems too simple.

Edit: Also this is wrong because for example the case where $i=j$. I define $d(i,i)$ using $d(i,i)$ so this is clearly wrong.

1

There are 1 best solutions below

0
On

Managed to find an answer that fits and seems to work:

if $x_i \neq y_j$ then $d(i,j)=min(d(i-1,j)+1,d(i,j-1)+1,d(i-1,j-1)+2)$

and if $x_i=y_j$ then $d(i,j)=min(d(i-1,j)+1,d(i,j-1)+1,d(i-1,j-1))$