Intuition behind Cantor-Bernstein-Schröder

2.8k Views Asked by At

The book I am working from (Introduction to Set Theory, Hrbacek & Jech) gives a proof of this result, which I can follow as a chain of implications, but which does not make natural, intuitive sense to me. At the end of the proof, I found myself quite confused, and had to look carefully at the build-up to see how the conclusion followed. I get the steps, now - but not the intuition.


The authors took sets $X$ and $Y$ and assumed injections $f: X \rightarrow Y$, $g: Y \rightarrow X$. Since $X \sim g[f[X]]$ and $X \supseteq g[Y] \supseteq g[f[X]]$, and $Y \sim g[Y]$, the authors went to prove the lemma that $A \sim A_1, A \supseteq B \supseteq A_1 \implies A \sim B$.

For the lemma, they defined recursive sequences $\{A_n\}_{n \in \omega}, \{B_n\}_{n \in \omega}$ by $A_0 = A, A_{n+1} = f[A],$ $ B_0 = B, B_{n+1} = f[B].$ Since $A_0 \supseteq B_0 \supseteq A_1 \implies f[A_0] \supseteq f[B_0] \supseteq f[A_1]$, we get from induction that $A_{n+1} \subseteq A_n$. Putting $\{C_n\}_{n \in \omega} = \{A_n - B_n\}_{n \in \omega},$ they noted that $C_{n+1} = f[C_n]$ (since $A_n \supseteq B_n$ inductively, $f[A_n - B_n] = f[A_n] - f[B_n]$). Putting

$$ C = \bigcup_{n=0}^\infty C_n, \text{ } D = A - C,$$

they noted that $f[C] = \bigcup_{n=1}^\infty C_n$, and that $h(x): A \rightarrow f[C] \cup D$ can be defined by sending $x \in C \rightarrow f(x),$ and $x \in D \rightarrow x$. It is clear that $h$ is one-to-one and onto.

Inductively, for $n > 0$, $C_0 \cap C_n = \varnothing$; it follows that $C_0 \cap \bigcup_{n=1}^\infty C_n = C_0 \cap f[C] = \varnothing$. We know that $C_0 \cup f[C] \cup D$ = A, and since all three sets are disjoint, we may conclude that $f[C] \cup D = A - C_0 = A - (A - B) = B$. Thus, our bijection $h$ maps $A$ to $B$, as we wanted.


What is the intuition here? What were H&J, or Cantor, Bernstein, etc. thinking when they went to prove this - the "high-level" idea? Is there a clearer proof, in the sense of thought process (not necessarily shorter)?

4

There are 4 best solutions below

5
On BEST ANSWER

$\newcommand{\ran}{\operatorname{ran}}$Here’s one way to think about it. Suppose that $y\in\ran f$; then we can pull $y$ back to $f^{-1}(y)\in X$. If $f^{-1}(y)\in\ran g$, we can pull it back to $g^{-1}(f^{-1}(y))\in Y$. If we continue this pulling back, one of two things must happen: either we reach a dead end at a point of $X$ or $Y$ that can’t be pulled back (because it’s in $Y\setminus\ran f$ or $X\setminus\ran g$), or we don’t.

Let $X_0=X\setminus\ran g$, the set of points of $X$ that cannot be pulled back at all, and let $Y_0=Y\setminus\ran f$. More generally, for each $n\in\omega$ let $X_n$ be the set of points of $X$ that can be pulled back exactly $n$ times, and let $Y_n$ be the set of points of $Y$ that can be pulled back exactly $n$ times. Finally, let $X_\omega$ and $Y_\omega$ by the subsets of $X$ and $Y$, respectively whose points can be pulled back infinitely many times.

At this point a sketch helps; it should show the partitions $\{X_n:n\le\omega\}$ of $X$ and $\{Y_n:n\le\omega\}$ of $Y$, and it should include arrows indicating what parts of $X$ get mapped to what parts of $Y$ and vice versa. To avoid having arrows crossing, I’ve taken $X$ and $Y$ apart in the following diagram.

$$\begin{array}{} X_0&\overset{f}\longrightarrow&Y_1&\overset{g}\longrightarrow&X_2&\overset{f}\longrightarrow& Y_3&\overset{g}\longrightarrow&X_4&\dots&X_\omega\\ Y_0&\overset{g}\longrightarrow&X_1&\overset{f}\longrightarrow&Y_2&\overset{g}\longrightarrow&X_3&\overset{f}\longrightarrow&Y_4&\dots&Y_\omega \end{array}$$

Each of the arrows is a bijection, so I can break up the diagram into $\omega$ self-contained parts. The first two parts are:

$$\begin{array}{} X_0&\overset{f}\longrightarrow&Y_1\\ Y_0&\overset{g}\longrightarrow&X_1 \end{array}\qquad \begin{array}{} X_2&\overset{f}\longrightarrow&Y_3\\ Y_2&\overset{g}\longrightarrow&X_3 \end{array}$$

Ignoring $X_\omega$ and $Y_\omega$ for the moment, I can rearrange the rest of the diagram to give me a bijection from $X\setminus X_\omega$ to $Y\setminus Y_\omega$:

$$\begin{array}{ccc} X_0&\overset{f}\longrightarrow&Y_1\\ X_1&\overset{g^{-1}}\longrightarrow&Y_0\\ X_2&\overset{f}\longrightarrow&Y_3\\ X_3&\overset{g^{-1}}\longrightarrow&Y_2\\ \vdots&\vdots&\vdots\\ X_{2k}&\overset{f}\longrightarrow&Y_{2k+1}\\ X_{2k+1}&\overset{g^{-1}}\longrightarrow&Y_{2k}\\ \vdots&\vdots&\vdots \end{array}$$

Finally, I claim that $f[X_\omega]=Y_\omega$: everything in $X_\omega$ can be pulled back infinitely often, so everything in $f[X_\omega]$ can be pulled back infinitely often, and therefore $f[X_\omega]\subseteq Y_\omega$. On the other hand, if $y\in Y_\omega$, then $y$ can be pulled back infinitely often, so it must be possible to pull $f^{-1}(y)$ back infinitely often, and therefore $f^{-1}(y)\in X_\omega$. Thus, $Y_\omega\subseteq f[X_\omega]$ as well. The diagram above can now be completed to show a bijection from $X$ onto $Y$:

$$\begin{array}{ccc} X_0&\overset{f}\longrightarrow&Y_1\\ X_1&\overset{g^{-1}}\longrightarrow&Y_0\\ X_2&\overset{f}\longrightarrow&Y_3\\ X_3&\overset{g^{-1}}\longrightarrow&Y_2\\ \vdots&\vdots&\vdots\\ X_{2k}&\overset{f}\longrightarrow&Y_{2k+1}\\ X_{2k+1}&\overset{g^{-1}}\longrightarrow&Y_{2k}\\ \vdots&\vdots&\vdots\\ X_\omega&\overset{f}\longrightarrow&Y_\omega \end{array}$$

The bijection is defined piecewise, but that’s no problem.

There are a few details to be filled in to make this a fully rigorous proof, but I think that it does give a reasonable idea of one possible intuition.

Added: Here’s a very rough sketch. Arrows from left to right are (parts of) $f$, and arrows from right to left are (parts of) $g$.

enter image description here

0
On

It is a reasonably long and seemingly complicated proof, but actually the idea is very simple (or at least it is in the proof I've seen). We have injections $f:X\to Y$ and $g:Y\to X$. To build a bijection $h:X\to Y$ we need to send each point in $X$ to a unique point in $Y$, making sure we hit everything.

We can start by saying $h(x) = f(x)$. But of course, this doesn't hit everything if f is not surjective. But , for each element $y$ we don't hit in $Y$ (i.e. the set ${Y\setminus{f(X)}}$) there is a unique element $g(y)$ in $X$ which we can make map to $y$. So we edit $h$ such that now $$h(x) = \begin{cases} g^{-1}(x) & \mbox{if } g^{-1}(x) \notin f(X) \\ f(x) & \mbox{otherwise} \end{cases}$$

So this time we hit everything we didn't get first time from the $g^{-1}(x) $ part. But now we're missing some other parts! (namely the values $f(x)$ where $x \in g(Y\setminus f(X))$ but $f(x) \notin Y\setminus f(X)$.) We can define the sets that we "miss" in each iteration of improving $h$ as $C_n$ and say $C = \displaystyle \bigcup_1^{\infty}C_n$. Then defining $$h(x) = \begin{cases} g^{-1}(x) &\mbox{if } g^{-1}(x) \in C \\ f(x) &\mbox{otherwise} \end{cases}$$

And by thinking of how we built it up, it kind of makes a bit more sense as to why it is a bijection.

Please note that this isn't the most efficient way to carry out the proof, but I explained it in the way that I would come up with it whilst trying to see why it works, rather than what I'd write down formally. Also please note it's been a long time since I've seen the proof and I may have gotten some of the specifics wrong (please anyone correct me if this is the case!), but this is the right general idea (I hope!)

10
On

My favorite proof is due to R. H. Cox, and I learned it from the wonderful Handbook of Analysis and its Foundations by Schechter, from which I've taken the illustration below. The basic idea is that an injection from $Y$ to $X$ identifies $Y$ with a subset of $X$. So we can reduce the problem to showing that if we have $Y\subseteq X$ and $f:X\to Y$ is an injection, then $|X|=|Y|$ and this is less confusing. let $C=X\backslash Y$. The sequence of sets $\big(f^n(C))$ is disjoint since $f(X)\subseteq Y$ and $f$ is injective. Let $S=\bigcup_{n=0}^\infty f^n(C)$. Define $g:X\to Y$ by

$$g(x) = \begin{cases} f(x)&\mbox{if } x\in S \\ x & \mbox{if }x\notin S. \end{cases}$$

It is easily verified that $g$ is a bijection.

Remark: According to Kunen's book on the foundations of mathematics, Dedekind already proved this version of the Cantor-Bernstein theorem in 1887.

enter image description here

0
On

As you pointed out, to prove Cantor-Schroeder-Bernstein theorem, one needs to prove the following lemma:

If $A_1 \subset B \subset A$ and $|A_1|=|A|$, then $|B|=|A|$.

According to the hypothesis, there exists some one-to-one mapping $f$ from $A$ onto $A_1$. So, we need a one-to-one mapping $g$ from $A$ onto $B$. How can we find that?

One may think that we should embed the set $B$ in the set $A$ by inclusion so that we find the required mapping as the inverse of the inclusion mapping, that is, the mapping$$h:A \to B \\ h(x)=x.$$This mapping is one-to-one, but it cannot be defined on the whole set $A$ because for $x \in A-B$ $h(x)=x$ is not contained in the set $B$.

One may think that we should use the mapping $f$ as our required mapping, that is,$$k:A \to B \\ k(x)=f(x).$$This mapping is one-to-one, but it cannot cover the whole set $B$ because the range of the function $f$ is the set $A_1 \subset B$ and there may be some $x\in B$ which is not in the set $A_1$.

As we see there are two extreme approaches for finding the required mapping; the first one covers the whole set $B$, and the second one covers the whole set $A$. So, it is intuitively expectable that to find the required mapping we should use both the mappings (approaches) simultaneously so that both the sets $A$ and $B$ are covered in a one-to-one manner.

But, there exists a problem. If we use both the mappings simultaneously, for example $g: A \to B \quad g(x)=\begin{cases}f(x) & \text{if } x\in C; \\ x & \text{if } x \in A-C \end{cases}$ for some subset $C \subset A$, then we may miss either the onto property or the one-to-one property, because the ranges of the pieces of $g$ may overlap.

So, our original problem is reduced to finding some subset $C \subset A$ such that the ranges of the pieces of the mapping are disjoint and the union of them is equal to the set $B$. Now, how to find such a $C$?

Here is an idea. Since the function $h(x)=x$ cannot be defined on $C_0=A-B$, as explained above, let us map this subset of $A$ by the function $k(x)=f(x)$ into the set $B$. So, we obtain the mapping$$g_0(x)= \begin{cases}f(x) & \text{if } x \in C_0; \\ x & \text{if }x \in A-C_0 \end{cases}.$$But, we have missed the one-to-one property because the ranges of the pieces overlap (In fact, $f[C_0]$ is contained in the range of the second one, since $f[C_0] \subset f[A] \subset A-C_0$).

So, we need to remove the problematic points $C_1=f[C_0]$ from the domain of the second piece (since the domain and the range of the function $h(x)=x$ are the same) to retain the one-to-one property. However, since we need to define the mapping $g$ on the whole set $A$, we need to add such points to the domain of the first piece. So, we obtain the mapping$$g_1(x)= \begin{cases}f(x) & \text{if } x \in C_0 \cup C_1; \\ x & \text{if }x \in A-(C_0 \cup C_1) \end{cases}.$$But, we have missed the one-to-one property because the ranges of the pieces overlap (In fact, $f[C_1]$ is contained in the range of the second one, since $f[C_1]=f^2[C_0] \subset f^2[A] \subset A-(C_0 \cup C_1)$).

So, we need to remove the problematic points $C_2=f[C_1]$ from the domain of the second piece (since the domain and the range of the function $h(x)=x$ are the same) to retain the one-to-one property. However, since we need to define the mapping $g$ on the whole set $A$, we need to add such points to the domain of the first piece. So, we obtain the mapping$$g_2(x)= \begin{cases}f(x) & \text{if } x \in C_0 \cup C_1 \cup C_2; \\ x & \text{if }x \in A-(C_0 \cup C_1 \cup C_2) \end{cases}.$$ $$\vdots \qquad \vdots \qquad \vdots$$ But, we have missed the one-to-one property because the ranges of the pieces overlap (In fact, $f[C_{n-1}]$ is contained in the range of the second one, since $f[C_{n-1}]=f^n[C_0] \subset f^n[A] \subset A-(C_0 \cup C_1 \cup \cdots C_{n-1})$).

So, we need to remove the problematic points $C_n=f[C_{n-1}]$ from the domain of the second piece (since the domain and the range of the function $h(x)=x$ are the same) to retain the one-to-one property. However, since we need to define the mapping $g$ on the whole set $A$, we need to add such points to the domain of the first piece. So, we obtain the mapping$$g_n(x)= \begin{cases}f(x) & \text{if } x \in C_0 \cup C_1 \cup \cdots \cup C_n; \\ x & \text{if }x \in A-(C_0 \cup C_1 \cup \cdots \cup C_n) \end{cases}.$$ $$\vdots \qquad \vdots \qquad \vdots$$ This pattern motivates us to define the mapping $g$ as follows.$$g(x)=\begin{cases}f(x) & \text{if } x \in C; \\ x & \text{if } x \in A-C \end{cases}, \qquad C= \bigcup_{n=0}^{\infty }C_n$$Noting that$f[C]=\bigcup_{n=1}^{\infty }C_n$, we can easily see that the mapping $g$ is one-to-one because each of its pieces is and the ranges of the pieces are disjoint and it is onto the set $B$.


Addendum

Looking at how the $C_n$'s are constructed, one may think that the existence of the set $C$ (and so the proof of the theorem) relies on the existence of some infinite set like $\mathbb{N}$ to be able to define the sets $C_n$'s recursively. However, in this section we show that such a view is not correct.

In fact, to obtain the bijective mapping $g$, we need some sets $C$ such that the values of the function $f$ at the points of $f[C]$ do not lie outside of $f[C]$. The existence of such a set can be guaranteed by applying some fixed-point theorem (Knaster-Tarski Theorem) to some monotone function of sets, as follows.

Let $F: \mathcal{P}(A) \to \mathcal{P}(B)$ be monotone, i.e., if $X \subset Y$, then $F(X) \subset F(Y)$ ($\mathcal{P}(A)$ is the power set of $A$). Consider the set $T= \{ X \subset A \mid F(X) \subset X \}$. It can be easily seen that $\overline{X}=\bigcap T$ is the least fixed point of $F$ (Proof: $A \in T$, so $T \neq \varnothing$ and so $\overline{X}=\bigcap T$ can be defined. Since $F$ is monotone and for any $X \in T$ we have $\bigcap T \subset X$, $F(\overline{X}) \subset F(X)$ for every $X \in T$, so $\overline{X} \in T$.

Since $F$ is monotone and $F(\overline{X}) \subset \overline{X}$, we have $F(F(\overline{X})) \subset F(\overline{X})$, so $F(\overline{X}) \in T$.

However, since $\overline{X} \subset X$ for every $X \in T$, we have $\overline{X} \subset F(\overline{X})$. Thus, $F(\overline{X})=\overline{X}$.

If $F$ has some other fixed points $X'$, i.e., $F(X')=X'$, then $X' \in T$. Since $\overline{X} \subset X$ for every $X\in T$, we conclude that $\overline{X}=\bigcap T$ is the least fixed point of $F$).

Consider the function $F(X)=(A-B)\cup f[X]$. Clearly, it is monotone, so the set $C=\overline{X}$ defined above is its least fixed point.

Now, we can easily see that the mapping $g:A \to B$ defined by$$g(x)=\begin{cases}f(x) & \text{if } x\in C; \\ x & \text{if } x \in A-C \end{cases}$$ is one-to-one and onto the set $B$ (We only need to note that $$\begin{align}f[C] \cup (A-C) & =f[C] \cup (A-((A-B) \cup f[C])) \\ & = f[C] \cup ((A-(A-B)) - f[C]) \\ & =f[C] \cup (B-f[C]) \\ & =B \end{align}$$(Please note that in the above calculation we have used the fact that $f[C] \subset A_1 \subset B \subset A$) and$$\begin{align}f[C] \cap (A-C) & =f[C] \cap (A-((A-B) \cap f[C])) \\ & = f[C] \cap ((A-(A-B)) - f[C]) \\ & =f[C] \cap (B-f[C]) \\ & = \varnothing \end{align}$$(Please note that in the above calculation we have used the fact that $f[C] \subset A_1 \subset B \subset A$)).

Now, the least fixed point of the function $F$ can be obtained recursively as follows.

Clearly the function $F$ is continuous, meaning that for any nondecreasing sequence of subsets of $A$, $\langle X_i \mid i \in \mathbb{N} \rangle$, $X_i \subset X_j$ whenever $i \le j$, we have$$F \left ( \bigcup_{i \in \mathbb{N}}X_i \right ) = \bigcup_{i \in \mathbb{N}} F \left ( X_i \right ).$$Let us define recursively $X_0=\varnothing$, $X_{i+1}=F(X_i)$ and then define $\overline{X}=\bigcup_{i \in \mathbb{N}}X_i$. Clearly, the $\langle X_i \mid n \in \mathbb{N} \rangle$ is a nondecreasing sequence of subsets of $A$. So we have$$\begin{align}F \left ( \bigcup_{i \in \mathbb{N}} X_i \right ) & = \bigcup_{i \in \mathbb{N}}F(X_i) \\ & = \varnothing \cup F(X_0) \cup F(X_1) \cup \cdots \\ & = X_0 \cup X_1 \cup X_2 \cup \cdots \\ & = \bigcup_{i \in \mathbb{N}} X_i. \end{align}$$Thus, $\overline{X}=\bigcup_{i \in \mathbb{N}}X_i$ is a fixed point of $F$.

Now, if $X'$ is another fixed point of $F$, since $F$ is monotone and $\langle X_i \mid i \in \mathbb{N} \rangle$ is a nondecreasing sequence of subsets of $A$, we have$$\varnothing \subset X' \quad \Rightarrow \quad X_1=F(\varnothing ) \subset F(X')=X' \\ X_1 \subset X' \quad \Rightarrow \quad X_2=F(X_1) \subset F(X')=X' \\ \vdots \qquad \vdots \qquad \vdots \\ X_{n-1} \subset X' \quad \Rightarrow \quad X_n =F(X_{n-1}) \subset F(X')=X' \\ \vdots \qquad \vdots \qquad \vdots$$So, $\overline{X}=\bigcup_{i \in \mathbb{N}} \subset X'$. Thus, $\overline{X}$ is the least fixed point of $F$.

Hence, we conclude that the fixed point of the function $F(X)=(A-B) \cup f[X]$, $C$, must be of the form$$\begin{align}C & =(A-B) \cup ((A-B) \cup f[A-B]) \cup ((A-B) \cup f[A-B] \cup f[f[A-B]]) \cup \cdots \\ & = C_0 \cup (C_0 \cup C_1) \cup (C_0 \cup C_1 \cup C_2) \cup \cdots \\ & = \bigcup_{i \in \mathbb{N}}C_n\end{align}$$(Please remember that $f$ is injective), which was already obtained from our original argument.

Therefore, the existence of the set $C$ can be confirmed without needing existence of some infinite set like $\mathbb{N}$. Now, if $A$ is a finite set, then $B$ must be equal to $A$ and so $C=A-B=\varnothing$. But, if $A$ is an infinite set (so the existence of some infinite set has been already assumed in our theory), then the set $C$ is constructed from an infinite chain of sets, as explained above.