Representing a String of n characters, as a unique integer

575 Views Asked by At

So I have been following the lecture series on algorithms from MIT. I got to the part on Rolling Hashes and Karp-Rabin algorithm

Karp Rabin Notes

The way I understand it is in order to represent the string as a unique integer. It looks like this

sum = 0
for character in string
  sum = sum * R + ord(character) mod p

Where R is the size of the alphabet and ord() gives the position of the character in the alphabet. And p is some large prime.

From that its pretty trivial that if you appended a new character to the string. You would just run another iteration of that loop and

sum = (sum * R + ord(newCharacter)) mod p

What I am not understanding is how you get the formula for removing a character from the beginning of the string.

sum = sum - ord(characterToRemove) * ((R^len(s) - 1)) mod p)) mod p

where len(s) is the length of the string.

I intuitively see why the ((R^len(s) - 1) mod p)) term is needed.

However could someone help show me how that term is derived. I think its just some gap in my algebra knowledge, that is preventing me from seeing it.

1

There are 1 best solutions below

4
On BEST ANSWER

It's easiest to think of this like a number represented in base $R$. Treating each character as its value, you end up with $$\mathrm{ord}(a_1)R^{n-1}+\mathrm{ord}(a_2)R^{n-2}+\cdots+\mathrm{ord}(a_{n-1})R+\mathrm{ord}(a_n)$$ You should convince yourself that this is what the non recursive formula turns out to look like. It's less efficient to calculate it that way, but it's easier to see how to remove the first character.