I was reading about older definitions of the natural numbers on Wikipedia here (in retrospect, not the best place to learn mathematics) and came across the following definition for the natural numbers: (paraphrased)
Let $\sigma$ be a function such that for every set $A$, $\sigma(A) := \{ x \cup \{ y \} \mid x \in A \wedge y \notin x \} $. Then $\sigma(A)$ is the set obtained by adding any new element to all elements of $A$. Then define $0 := \{ \emptyset \}$, $1 := \sigma(0)$, $2 := \sigma(1)$ et cetera.
The way I understood this definition is that the natural number $n$ is "defined" as the set of all sets with exactly $n$ elements. This sounded straightforward to me, until I read the next paragraph:
This definition works in naive set theory, type theory, and in set theories that grew out of type theory, such as New Foundations and related systems. But it does not work in the axiomatic set theory ZFC and related systems, because in such systems the equivalence classes under equinumerosity are "too large" to be sets. For that matter, there is no universal set V in ZFC, under pain of the Russell paradox.
Why exactly doesn't this definition work in ZFC? I don't fully understand how the sets in this definition are "too large". Is part of the problem just that there is no "universal set" to pick the element $y$ from?
I tried to do some more reading to find my answer, but the material was way out of my depth. (I am only familiar with the basics of set theory, Russell's paradox, Cantor's diagonal argument, and not much more.) So I apologize in advance if this is a really simple question...
Notice that, under that definition, we have $$ \begin{split} 1 = \sigma(0) &= \sigma(\{\emptyset\}) = \{x \cup \{y\} : x \in \{\emptyset\} \land y \notin x\} \\ &= \{ \emptyset \cup \{y\} : y \text{ is a set}\}\\ & = \{ \{y\} : y \text{ is a set}\}\\ &= \{ z : |z| = 1\}. \end{split} $$
In ZFC, the collection of all sets ("$V$") does not form a set, so the definition breaks down already at stage $1$. If $\sigma(0)$ were a set, then by the axiom of union $V = \bigcup \sigma(0) = \{y : \{y\} \in \sigma(0)\}$ would also be a set. That is the technical difficulty. Frege and Russell proposed that the number $1$ could be defined to consist of all $1$-element sets, but that collection of sets is not itself a set in ZFC.
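To see the shape of this argument on a small scale, here is a minimal Python sketch. It assumes a tiny finite `universe` as a stand-in for "all sets" (that stand-in, and the names `sigma`, `universe`, and `recovered`, are illustrative choices, not part of the original definition). It only shows that $\sigma(0)$ is the family of all singletons and that taking its union gives back everything $y$ was drawn from.

```python
from itertools import chain

def sigma(A, universe):
    """Finite stand-in for sigma: add one new element y (drawn from
    `universe`) to each x in A, in every possible way."""
    return frozenset(
        x | frozenset([y]) for x in A for y in universe if y not in x
    )

# Hypothetical finite "universe"; in the real definition y ranges over all sets.
universe = frozenset(["a", "b", "c"])

zero = frozenset([frozenset()])   # 0 := {∅}
one = sigma(zero, universe)       # every 1-element subset of the universe
two = sigma(one, universe)        # every 2-element subset of the universe

# The union of the members of `one` recovers the whole universe,
# mirroring V = {y : {y} ∈ σ(0)}.
recovered = frozenset(chain.from_iterable(one))
```

With a finite universe nothing goes wrong, because the universe is itself a perfectly good set. The trouble in ZFC is exactly that the analogous union would be $V$.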
The usual way of describing why $V$ is not a set is that it is "too large"; this sense of "largeness" is one of the more common ways of motivating the ZFC axioms, so the Wikipedia author alluded to it.
The idea of "largeness" is really an allusion to the "cumulative hierarchy" vision of set theory. Unfortunately, the cumulative hierarchy is hard to describe in one sentence, because it depends already on the notion of ordinal. But the idea is that we can form a collection of sets stage by stage, so that the powerset of each set is formed at the next stage after the set is formed, and so that all the members of each set are formed at stages strictly before the set itself is formed.
One way to understand the ZFC axioms is that they are only trying to describe the sets that are formed via this process. But $V$ cannot be formed at any stage: every subset of $V$ is a set, so $V$ would have to already contain every element of its own powerset, yet the powerset is only formed at the next stage, while the members of a set must be formed strictly before it. So the claim that $V$ is too "large" really means that $V$ could not be formed at any stage of the process.
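The first few stages of this process can be computed directly. The following is only a sketch of the finite stages, under the assumption that stage $0$ is empty and each later stage is the powerset of the previous one; the real hierarchy continues through all the ordinals, which this code does not attempt to model. The names `powerset` and `stages` are hypothetical helpers.

```python
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of the finite set s, as frozensets."""
    items = list(s)
    return frozenset(
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )

def stages(n):
    """Return [V_0, ..., V_n] with V_0 = ∅ and V_{k+1} = P(V_k)."""
    result = [frozenset()]
    for _ in range(n):
        result.append(powerset(result[-1]))
    return result

V = stages(4)
sizes = [len(v) for v in V]   # grows as 0, 1, 2, 4, 16
```

Every set appearing at stage $k+1$ has all of its members at stage $k$, and each stage is contained in the next, so no stage ever contains its own powerset.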
Back to defining the numbers. We say that two sets have the same cardinality if there is a bijection between them. This is an equivalence relation, so it ought to have equivalence classes. And the equivalence class of $\{\emptyset\}$ will consist of every set that has exactly one element. That is the idea behind the definition above. But these equivalence classes (apart from the class of $\emptyset$, which is just $\{\emptyset\}$) are not sets in the cumulative hierarchy, so ZFC has trouble with them.
The way that we usually circumvent this kind of problem in ZFC is to select a "particular" representative from each equivalence class. Then, instead of referring to the entire equivalence class, we refer just to that representative. The most commonly used choice of representatives in ZFC is the von Neumann ordinals. So we have $$ \begin{split} 0 &= \emptyset\\ 1 &= \{\emptyset\} = \{0\}\\ 2 &= \{\emptyset, \{\emptyset\}\} = \{0,1\}\\ 3 &= \{0,1,2\} \end{split} $$ and so on. This is not really much different from the definition due to Frege and Russell, as you can see.
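To make the von Neumann construction concrete, here is a small Python sketch using nested `frozenset`s. It assumes the usual successor rule $k + 1 = k \cup \{k\}$, which matches the list above; the function name `von_neumann` is just an illustrative choice.

```python
def von_neumann(n):
    """Return the von Neumann ordinal n as nested frozensets,
    so that n = {0, 1, ..., n-1}."""
    current = frozenset()                          # 0 = ∅
    for _ in range(n):
        current = current | frozenset([current])   # successor: k + 1 = k ∪ {k}
    return current

three = von_neumann(3)
```

Each representative `von_neumann(n)` has exactly `n` elements, so it is one particular member of the Frege-Russell class of all $n$-element sets.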