So I have been following the lecture series on algorithms from MIT. I got to the part on Rolling Hashes and Karp-Rabin algorithm
The way I understand it is in order to represent the string as a unique integer. It looks like this
sum = 0
for character in string
sum = sum * R + ord(character) mod p
Where R is the size of the alphabet and ord() gives the position of the character in the alphabet. And p is some large prime.
From that its pretty trivial that if you appended a new character to the string. You would just run another iteration of that loop and
sum = (sum * R + ord(newCharacter)) mod p
What I am not understanding is how you get the formula for removing a character from the beginning of the string.
sum = sum - ord(characterToRemove) * ((R^len(s) - 1)) mod p)) mod p
where len(s) is the length of the string.
I intuitively see why the ((R^len(s) - 1) mod p)) term is needed.
However could someone help show me how that term is derived. I think its just some gap in my algebra knowledge, that is preventing me from seeing it.
It's easiest to think of this like a number represented in base $R$. Treating each character as its value, you end up with $$\mathrm{ord}(a_1)R^{n-1}+\mathrm{ord}(a_2)R^{n-2}+\cdots+\mathrm{ord}(a_{n-1})R+\mathrm{ord}(a_n)$$ You should convince yourself that this is what the non recursive formula turns out to look like. It's less efficient to calculate it that way, but it's easier to see how to remove the first character.