In "Methods of Nonlinear Analysis: Applications to Differential Equations" (P. Drabek, J. Milota), they present the following construction:
Let $A\in L(X)$, and choose $\lambda\in\sigma(A)$; then set $N_k = \text{ker}(\lambda I-A)^k$. It is obvious that $N_k\subset N_{k+1}$, and they cannot all be distinct. If $N_k = N_{k+1}$, then $N_k = N_i$ for all $i>k$. Denote by $n(\lambda)$ the least such $k$ and set $$ N(\lambda) = N_{n(\lambda)},\qquad R(\lambda)=\text{Im}(\lambda I - A)^{n(\lambda)} $$ Then both $N(\lambda)$ and $R(\lambda)$ are $A$-invariant and the decomposition $X = N(\lambda)\oplus R(\lambda)$ holds.
Here $X$ is a vector space, $L(X)$ is the set of linear transformations from $X$ to itself, $\sigma(A)$ is the spectrum of $A$ (the set of eigenvalues of $A$), and an $A$-invariant subspace $S$ of $X$ is one such that $A(S)\subset S$.
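To make the chain of kernels concrete, here is a small numerical sanity check (my own made-up example, not from the book): for a $3\times 3$ matrix in which the eigenvalue $2$ sits in a $2\times 2$ Jordan block, the kernel dimensions of $(\lambda I - A)^k$ grow and then stabilize, and $n(\lambda)$ is the first $k$ at which they do.

```python
import numpy as np

# A made-up 3x3 example (not from the book): eigenvalue 2 sits in a 2x2
# Jordan block, and eigenvalue 5 is simple.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
lam = 2.0
dim = A.shape[0]
B = lam * np.eye(dim) - A

# dim N_k = dim ker (lam*I - A)^k = dim - rank((lam*I - A)^k)
dims = [dim - int(np.linalg.matrix_rank(np.linalg.matrix_power(B, k)))
        for k in range(1, dim + 1)]
print(dims)        # [1, 2, 2] -> N_1 is strictly smaller than N_2 = N_3

n_lam = dims.index(max(dims)) + 1   # first k where the chain stabilizes
print(n_lam)       # n(2) = 2, so dim N(2) = 2 and dim R(2) = 3 - 2 = 1
```

Note that $\dim\text{ker}(2I - A) = 1$ (one eigenvector) while $\dim N(2) = 2$, which is exactly the gap between the two notions of multiplicity discussed below.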
What is the intuition behind this construction, if any? I thought at first that $n(\lambda)$ corresponded to the multiplicity of the eigenvalue $\lambda$, but it appears the multiplicity of $\lambda$ only corresponds to the dimension of $\text{ker}(\lambda I - A)$ and really has nothing to do with the $k$th power of $\lambda I - A$. How did they come up with this construction, and what is the intuitive idea behind $n(\lambda)$, $N(\lambda)$, and $R(\lambda)$?
Edit: Reading on in the text, $n(\lambda)$ is the multiplicity of $\lambda$, meaning my interpretation of multiplicity was completely wrong. I thought the multiplicity of an eigenvalue was the number of linearly independent eigenvectors corresponding to that eigenvalue, but apparently not: instead, it is the exponent of $(t-\lambda)$ in the characteristic polynomial of $A$. So my confusion lies in my understanding of multiplicity.
I wanted to type a really long answer, but it's getting late here in Europe and I'm falling asleep, so I'm just giving you a pointer on how to find more info.
First, take the result of the theorem for granted. Now let $\mu$ be a different eigenvalue of $A$. It is easy to see that $R(\lambda)$ is mapped to itself by all powers of $(A - \mu I)$, so we can apply the theorem again with $X$ replaced by $R(\lambda)$; repeating this over and over, we end up with a decomposition
$$X = N(\lambda) \oplus N(\mu) \oplus N(\nu) \oplus ... = \bigoplus_{\lambda' \in \sigma(A)} N(\lambda')$$
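In coordinates, the direct-sum statement just says that the dimensions of the generalized eigenspaces add up to $\dim X$. A quick check on a made-up finite-dimensional example (not from the text):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])   # eigenvalues: 2 (double, defective) and 5

def gen_eigenspace_dim(A, lam):
    # For an n x n matrix the chain of kernels stabilizes by k = n,
    # so dim N(lam) = dim ker (lam*I - A)^n.
    n = A.shape[0]
    B = np.linalg.matrix_power(lam * np.eye(n) - A, n)
    return n - int(np.linalg.matrix_rank(B))

dims = {lam: gen_eigenspace_dim(A, lam) for lam in (2.0, 5.0)}
print(dims)                               # {2.0: 2, 5.0: 1}
print(sum(dims.values()) == A.shape[0])   # True: dim N(2) + dim N(5) = dim X
```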
The case where $X$ is finite dimensional is especially pleasant to work with, because then we have no trouble interpreting the big direct sum symbol.
The space $N(\lambda)$ is called the **generalized eigenspace** at the eigenvalue $\lambda$. What is important to realize is that each of the generalized eigenspaces is mapped into itself by $A$, so we can study $A$ by studying its action on each of these spaces separately.
The decomposition of $X$ into generalized eigenspaces is called the **Jordan decomposition** (at least in the finite dimensional case). If you pick a basis of $X$ in which each basis vector lies in a generalized eigenspace (and basis vectors belonging to the same eigenspace are consecutive), then the matrix representing $A$ becomes a block matrix in which only the 'diagonal' blocks are non-zero. (This is a restatement of the fact that $A$ maps the generalized eigenspaces to themselves.) Now if you pick your basis such that these diagonal blocks look really nice, you get the **Jordan Canonical Form** of $A$.
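As a sketch of this in practice (again a made-up matrix, not from the book), SymPy's `jordan_form` does the basis-picking for us: the columns of $P$ form a basis adapted to the generalized eigenspaces, and $J = P^{-1}AP$ is the resulting block-diagonal form.

```python
from sympy import Matrix

# Made-up example: eigenvalue 2 in a 2x2 Jordan block, eigenvalue 5 simple.
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 5]])

# Columns of P form a basis adapted to the generalized eigenspaces;
# J is block diagonal, with one Jordan block per chain.
P, J = A.jordan_form()

assert P * J * P.inv() == A   # similarity: same map, nicer basis
print(sorted(J[i, i] for i in range(3)))   # [2, 2, 5]: eigenvalues on the diagonal
```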
I guess that if you google any of the bold terms you will find some explanation of how people came up with it, or what the intuition is. If that fails, I will maybe write something about it myself tomorrow.
EDITED IN AFTER READING YOUR EDIT: two more terms to google: **geometric multiplicity** and **algebraic multiplicity**. The first is what you thought multiplicity was, and the second is what you have now found it to be. The fact that people invented words to make discussing the difference easier indicates that your idea was not 'completely wrong', just incomplete.