Lemma: Let $x\in\mathfrak{gl}(V)$ be a nilpotent endomorphism. Then $\operatorname{ad}x$ is also nilpotent.
Proof: We may associate to $x$ two endomorphisms of $\operatorname{End}V$, left and right translation: $\lambda_x(y)=xy, \rho_x(y)=yx$, which are nilpotent because $x$ is. Moreover $\lambda_x$ and $\rho_x$ obviously commute. In any ring (here $\operatorname{End}(\operatorname{End}V)$) the sum or difference of two commuting nilpotents is again nilpotent so $\operatorname{ad}x=\lambda_x-\rho_x$ is nilpotent.
1) Why is the ring there $\operatorname{End}(\operatorname{End}V)$? I think $\lambda_x$ takes $y$ to $xy$, so it takes an element of $\mathfrak{gl}(V)$ to another element of $\mathfrak{gl}(V)$. Shouldn't $\lambda_x$ belong to $\operatorname{End}V$?
2) Why is the sum or difference of two commuting nilpotents again nilpotent?
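1) An element of $\operatorname{End}V$ is a linear map taking elements of $V$ to elements of $V$. The map $\lambda_x$ instead takes $y\in\operatorname{End}V=\mathfrak{gl}(V)$ to $xy\in\operatorname{End}V$, so it is a linear map $\operatorname{End}V\to\operatorname{End}V$, that is, an element of $\operatorname{End}(\operatorname{End}V)$. (Every $x\in\operatorname{End}V$ gives such a map $\lambda_x$, but not all endomorphisms of $\operatorname{End}V$ arise this way: if $\dim V=n$, then $\dim\operatorname{End}V=n^2$ while $\dim\operatorname{End}(\operatorname{End}V)=n^4$.)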
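To make the distinction concrete, here is a small sketch in plain Python. It assumes $V$ is 2-dimensional and picks the elementary matrices $E_{ij}$ as a basis of $\operatorname{End}V$; the helper names `matmul`, `basis`, `flatten` and `matrix_of` are made up for this illustration. It writes $\lambda_x$ as a matrix in that basis, which comes out $4\times 4$ rather than $2\times 2$.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def basis(n):
    """Elementary matrices E_ij, a basis of End V when dim V = n."""
    out = []
    for i in range(n):
        for j in range(n):
            e = [[0] * n for _ in range(n)]
            e[i][j] = 1
            out.append(e)
    return out

def flatten(y):
    """Coordinates of y in the basis E_11, E_12, ..., E_nn."""
    return [entry for row in y for entry in row]

def matrix_of(op, n):
    """Matrix of a linear map op: End V -> End V in the basis E_ij."""
    cols = [flatten(op(e)) for e in basis(n)]  # images of the basis vectors
    return [[cols[c][r] for c in range(n * n)] for r in range(n * n)]

x = [[0, 1], [0, 0]]                            # a nilpotent element of End V
lam = matrix_of(lambda y: matmul(x, y), 2)      # left translation y -> xy

print(len(x), "x", len(x[0]))      # x acts on V: 2 x 2
print(len(lam), "x", len(lam[0]))  # lambda_x acts on End V: 4 x 4
```

So $x$ is a $2\times 2$ matrix, while $\lambda_x$ needs a $4\times 4$ matrix: it acts on the 4-dimensional space $\operatorname{End}V$, not on $V$.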
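2) Suppose $a$ and $b$ commute and $a^m=b^m=0$ (take $m$ to be the larger of the two nilpotency indices). Because they commute, the binomial theorem applies: $(a+b)^{2m-1}=\sum_{i=0}^{2m-1}\binom{2m-1}{i}a^i b^{2m-1-i}$. In each term the two exponents add up to $2m-1$, so (pigeonhole principle) one of them is $\ge m$ and the term vanishes. Hence $(a+b)^{2m-1}=0$. For the difference, replace $b$ by $-b$, which is still nilpotent and still commutes with $a$.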
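As a sanity check of the bound $2m-1$, the following plain-Python sketch takes $x$ to be the $3\times 3$ nilpotent Jordan block (so $x^3=0$ and $m=3$) and applies $\operatorname{ad}x$ repeatedly to the elementary matrices. The example matrix and the helper names `ad`, `power_apply` and `elementary` are choices made for this illustration only.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sub(a, b):
    """Entrywise difference a - b."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def ad(x):
    """ad x = lambda_x - rho_x, i.e. y -> xy - yx."""
    return lambda y: sub(matmul(x, y), matmul(y, x))

def power_apply(op, k, y):
    """Apply the operator op to y a total of k times."""
    for _ in range(k):
        y = op(y)
    return y

def elementary(i, j, n):
    """Matrix with a single 1 in row i, column j (0-based)."""
    e = [[0] * n for _ in range(n)]
    e[i][j] = 1
    return e

n = 3
m = 3                                   # x^m = 0 for the block below
x = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # nilpotent Jordan block
ad_x = ad(x)
zero = [[0] * n for _ in range(n)]

# (ad x)^(2m-1) kills every basis element of End V, hence is zero.
kills_all = all(power_apply(ad_x, 2 * m - 1, elementary(i, j, n)) == zero
                for i in range(n) for j in range(n))
print(kills_all)   # True

# One power lower is not zero: apply (ad x)^(2m-2) to E_31.
survivor = power_apply(ad_x, 2 * m - 2, elementary(2, 0, n))
print(survivor)    # [[0, 0, 6], [0, 0, 0], [0, 0, 0]]
```

In this example $(\operatorname{ad}x)^5=0$ but $(\operatorname{ad}x)^4\ne 0$, so the exponent $2m-1$ from the pigeonhole argument cannot be lowered in general.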