When implementing multiplication in binary finite fields, Barrett reduction allows to perform division with two carryless multiplications and two XOR additions (and some one-time precomputation). However, because it is always possible to find irreducible polynomials that have a lot of zeros after the first bit (e.g. https://www.hpl.hp.com/techreports/98/HPL-98-135.pdf), there is a very simple way to group long division also into two carryless multiplications and two XOR additions and a bitshift. It is so obvious that I am sure it exists already and I want to know if if has a name.
The algorithm is as follows.
Let us compute a mod m with:
a = 1101 1101 1011 1001
m = 1 0001 1101
where m is the Conway polynomial of degree 8 and a is 16 bits long (carryless product before reduction of 8-bit bitstrings will always be shorter than 16 bits).
Long division would have us shift m to the front of a and use it to XOR away the leading bit of a and do this 8 times.
However, we can group these steps into two groups, for this let's call t (for "tail") the bitstring obtained by removing the leading bit of m:
t = 1 1101
then let's also give names to the four groups of four bits of a:
a = 1101 1101 1011 1001 = x y w z
then, with * denoting carryless multiplication and + XOR addition, we obtain a mod m in two steps by first XORing t*x to y w z:
(y w z) + (t*x 0000) = y' w' z
(1101 1011 1001) + (1000 0001 0000) = 0101 1010 1001
(the 0000 padding is the bitshift) and then doing the same with y':
a mod m = (w' z) + t*y'
= 1010 1001 + 0110 1001
= 1100 0000
In general, the two steps might have different lengths, but the main point is that because it is always possible to find irreducible polynomials with very short "tails",
it is always possible to choose the irreducible polynomial so that the reduction is done in two steps (I believe even with Conway polynomials instead of polynomials with the shortest tails). For the general case of a mod m with arbitrary bitstrings more steps may be needed; in particular if m = 11... then this method is just long division.
Does this algorithm have a name? And any reason why Barrett reduction may be favored over this algorithm? (Barrett reduction seems to just have less bitshifts)