There are several proofs of the theorem. Some of them use transfinite recursion, and some use the same argument as the following.
Proof (copied from ProofWiki):
"Let $S$ be a set.
Let $\mathcal{P}(S)$ be the power set of $S$. By the Axiom of Choice, there is a choice function $c$ defined on $\mathcal{P}(S)-\{∅\}$. We will use $c$ and the Principle of Transfinite Induction to define a bijection between $S$ and some ordinal.
Intuitively, we start by pairing $c(S)$ with $0$, and then keep extending the bijection by pairing $c(S∖X)$ with $α$, where $X$ is the set of elements already dealt with.
Base case $α=0$: Let $s_0=c(S)$.
Inductive step: Suppose $s_β$ has been defined for all $β<α$.
If $S−\{s_β:β<α\}$ is empty, we stop.
Otherwise, define: $s_α:=c(S−\{s_β:β<α\})$.
The process eventually stops, else we have defined bijections between subsets of $S$ and arbitrarily large ordinals."
My question is why "the process eventually stops".
My attempt: Assume the process doesn't stop. Then we have a one-to-one function from $Ord$ to $S$. One might say this is a contradiction because $|S|\in Ord$, but that argument is not valid here, since cardinal numbers are defined after the theorem and by using it. I want to find a contradiction using only the ZFC axioms.
($Ord$: the class of all ordinals)
The process has to stop because of Hartogs' theorem:
Or, if you have the axiom of replacement, you can replace $Y$ by an ordinal and obtain the following:
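As a sketch of how this closes the gap in the quoted proof, assume Hartogs' theorem in its usual ordinal form (written out here as an assumption, with $\gamma$ a name introduced only for this sketch): for every set $S$ there is an ordinal $\gamma$ such that no injection $\gamma\to S$ exists. This is provable without the Axiom of Choice and without cardinal numbers, so there is no circularity. The argument would then run roughly as follows:

$$
\begin{aligned}
&\text{Suppose the process never stops, so } s_\beta \text{ is defined for every ordinal } \beta.\\
&\text{For } \beta<\alpha:\quad s_\alpha=c\bigl(S\setminus\{s_\delta:\delta<\alpha\}\bigr)\in S\setminus\{s_\delta:\delta<\alpha\},\ \text{ so } s_\alpha\neq s_\beta.\\
&\text{Hence } \beta\mapsto s_\beta \text{ restricted to } \gamma \text{ is an injection } \gamma\to S,\ \text{ contradicting the choice of } \gamma.
\end{aligned}
$$

So there should be a least ordinal $\alpha$ with $S=\{s_\beta:\beta<\alpha\}$, and $\beta\mapsto s_\beta$ is then a bijection from $\alpha$ onto $S$, which transfers the well-ordering of $\alpha$ to $S$.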