Substring matching: minimum search distance after first successful match

12 Views Asked by At

Consider a string $x$ over some alphabet. The elements of the string are noted $x[0]$, $x[1]$, etc. $x[p,q]$ denotes the substring starting at index $p$ and ending at index $q$.

Assume the substring $x[0,L-1]$ is equal to $x[D,D+L-1]$, but $x[L] \neq x[D+L]$, i.e., we have found a maximal match of $L$ elements from the start of the string to the substring at distance $D$.

We are interested in the next match with a larger length $L+1$, i.e., lowest $D' > D$ such that $x[0,L]$ is equal to $x[D',D'+L]$.

Without any other information about $x$, there are values of $D'$ that can be immediately excluded, depending just on $D$ and $L$.

For example, if $D=1$ then $x[0]=x[1]=...=x[L]$. Assume now $D' \leq L+1$, then also $x[0]=x[D']=x[D'+1]=...=x[D'+L-1]$ and one of this terms is $x[L+1]$ and so $x[L+1]=x[L]$, which contradicts the hypothese that $L$ is maximal. So, necessarily, $D' > L+1$.

My question is: given $L$ and $D$, what is an algorithm to calculate the minimum possible $D'$, independently of $x$ ?