In the Time Complexity section of this Wikipedia article, it states
In the algorithm as written above, there are two expensive operations during each iteration: the multiplication s × s, and the mod M operation. The mod M operation can be made particularly efficient on standard binary computers by observing that $k ≡ ( k \mod 2^n ) + ⌊ k / 2^n ⌋ ( \mod 2^n − 1 ) $
This says that the least significant n bits of k plus the remaining bits of k are equivalent to k modulo $2^n−1$. This equivalence can be used repeatedly until at most n bits remain. In this way, the remainder after dividing k by the Mersenne number $2^n−1$ is computed without using division.
I am having difficulty understanding what this is saying. Is it presenting a method for calculating the residue that is faster than long division? What is $k$? Is there two different types of "mod" where the first is a binary operator and the second means instead of being equivalent the equation is congruent? I find it hard to understand how one formula can have two "mods".
What do bits have anything to do with this?
This algorithm computes $k \mod M$ where $k$ is any integer (the input) and $M$ is a Mersenne number, ie there is an integer $n$ such that $M = 2^n - 1$.
The bits are just the digits of $k$ expressed in basis $2$. The example in the Wikipedia article you quote illustrates this very well.
Say $M = 2^n -1$. Remember that any int $k$ is stored by the computer as its binary expression $k_2$, and the computer performs all arithmetic computations in basis 2. Hence :
computing $k \mod 2^n$ is very easy : just take the last $n$ digits of $k_2$.
finding $\lfloor \frac{k}{2^n}\rfloor$ is also very quick : just remove the last $n$ digits of $k_2$.
let $x$ be a number whose binary expression $x_2$ has $\leqslant n$ digits, then computing $x \mod (2^n - 1)$ is a piece of cake :
$$ x \mod (2^n - 1) = \left\{ \begin{array}{ll} 0 & \textrm{if } x_2 = \underbrace{1 \dots 1}_{n \textrm{ times}} \\ x & \textrm{otherwise} \end{array}\right.$$