I've seen a lot of algorithms reducing the result of a multiplication in a Binary Field by using only bit-shifts and XOR. The number of positions to shift seems to be derived from the polynomial, but I wasn't able to find any proper explanation.
Example 1: Algorithm 2.41 in http://www.springer.com/cda/content/document/cda_downloaddocument/9780387952734-c1.pdf
Example 2: Algorithm 4 in http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/carry-less-multiplication-instruction-in-gcm-mode-paper.pdf
Going through a few examples, I finally understood it. The explanation under 2.3.5 in Example 1 actually explains the basics very well. The main idea is that is that you can reduce every coefficient on it's own. Instead of doing division you simply use the fact that the reduction of every power over your maximal degree $m$ can be computed by $z^a \equiv z^{a-m} * r(z)$; where $r(z)$ is your modulus without the highest power. Since this can lead to degrees higher than $m$, words besides the ones in the result can be effected. So you have to start with the highest word and continue from there.
Regarding the bit-shifts: