Why should it be that the greatest common divisor of a polynomial and its derivative yields the product of the repeated factors?
What about the case where the derivative is zero in $\mathbb{F}_p$? Why does the approach continue to work in this case?
Most algorithm descriptions treat this as a trivial step, and perhaps it is, but I'm having trouble finding a proof of correctness for the basic method described here.
A proof that $f / \gcd(f,f')$ eliminates repeated factors would be appreciated.
If $f = q^k g$ where $q \nmid g$ and $q$ is irreducible then
$$ f' = k q' q^{k-1} g+ q^k g'$$
so the highest power of $q$ dividing $f'$ is $k-1$, provided $k$ is not divisible by the characteristic (it is not difficult to show that if $q$ is irreducible then $\gcd(q,q') = 1$).
So dividing by $\gcd(f,f')$ divides out every irreducible factor to one power less than what appears in $f$, leaving each factor exactly once. One caveat in characteristic $p$: if $p \mid k$ then $k q' q^{k-1} g = 0$, so $q^k$ divides $f'$ and the factor $q$ disappears from $f/\gcd(f,f')$ entirely rather than surviving to the first power. This is why squarefree factorization algorithms over $\mathbb{F}_p$ handle multiplicities divisible by $p$ as a separate step.
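As a sanity check of the characteristic-zero case, here is the computation in sympy (the library and the example polynomial are my choices, not from the original):

```python
from sympy import symbols, expand, factor, gcd, quo

x = symbols('x')

# f has repeated factors: (x - 1)^2 (x - 2)^3 (x - 3)
f = expand((x - 1)**2 * (x - 2)**3 * (x - 3))

# gcd(f, f') collects each repeated factor to one power less
g = gcd(f, f.diff(x))

# f / gcd(f, f') is the squarefree part: each factor exactly once
squarefree = quo(f, g)

print(factor(g))           # (x - 1)*(x - 2)**2
print(factor(squarefree))  # (x - 1)*(x - 2)*(x - 3)
```

Each factor of multiplicity $k$ appears in the gcd with multiplicity $k-1$, exactly as the derivation above predicts.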
EDIT: proof of $\gcd(q,q') = 1$ in the finite field case
If $q' \neq 0$ we are done: $\deg q' < \deg q$, so the irreducible $q$ cannot divide $q'$, hence $\gcd(q,q') = 1$.
If $q' = 0$, then every exponent appearing in $q$ is divisible by the characteristic $p$, so $q = \sum_i a_i x^{pi} = h(x^p)$ for some polynomial $h$.
Since the Frobenius map $a \mapsto a^p$ is surjective on a finite field, we can write each coefficient as $a_i = b_i^p$, and then
$$ q = \sum_i b_i^p x^{pi} = \Big( \sum_i b_i x^i \Big)^p $$
by the freshman's dream $(u+v)^p = u^p + v^p$ in characteristic $p$. So $q$ would be a $p$-th power, contradicting irreducibility of $q$. Hence $q' \neq 0$ for irreducible $q$, and the previous case applies.
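Both characteristic-$p$ phenomena can be checked directly with sympy's `Poly` over $\mathrm{GF}(3)$ (the example polynomials are my choices, not from the original): a polynomial with zero derivative is a $p$-th power, and a factor whose multiplicity is divisible by $p$ vanishes from $f/\gcd(f,f')$ altogether.

```python
from sympy import symbols, Poly

x = symbols('x')

# Over GF(3): q = x^6 + x^3 + 1 has q' = 6x^5 + 3x^2 = 0,
# and q is a perfect cube, so it cannot be irreducible.
q = Poly(x**6 + x**3 + 1, x, modulus=3)
assert q.diff(x) == Poly(0, x, modulus=3)
assert q == Poly((x**2 + x + 1)**3, x, modulus=3)

# The multiplicity-divisible-by-p pitfall: f = x^3 (x + 1) over GF(3).
# f' = 4x^3 + 3x^2 = x^3 mod 3, so gcd(f, f') = x^3 and the
# factor x disappears entirely from f / gcd(f, f').
f = Poly(x**3 * (x + 1), x, modulus=3)
g = f.gcd(f.diff(x))
print(g)           # Poly(x**3, x, modulus=3)
print(f.exquo(g))  # Poly(x + 1, x, modulus=3)
```

Note that in the second example $f/\gcd(f,f') = x + 1$ is squarefree but is no longer divisible by every irreducible factor of $f$, which is the extra wrinkle the question about zero derivatives is pointing at.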