Why does Rudin define $k = \frac{y^n-x}{n y^{n-1}}$ or $h < \frac{x - y^n}{n(y+1)^{n-1}}$ when he tries to prove that every positive real x has an nth root?


tl;dr (too long; didn't read):

What is the intuition/conceptual idea to why Rudin used the number:

$$ k = \frac{y^n-x}{n y^{n-1}} $$

in his proof and not some other number? It seems that that number is not random, so how could he have come up with it?


My attempt:

I was trying to prove Theorem 1.21 from Rudin's real analysis book on my own without looking at Rudin's proof. My approach was to try to prove what seemed true from the intuition about the real numbers I had collected before I started studying real analysis. So I drew lots of pictures, which led me to think of defining $E = \{ \bar y \in R_{>0} : \bar y < x^{1/n} \}$:

Thus, I knew that $E = \{ \bar{y} \in R_{>0} : \bar y^n < x \}$ is just the same as $E = \{ \bar y \in R_{>0} : \bar y < x^{1/n} \}$. So it was obvious that what I had to show was that $\alpha = \sup E = y = x^{1/n}$ (the reason for defining sets like that is that we probably need to use the Least Upper Bound (LUB) property, since it's one of the few things we are supposed to know about analysis so far). Thus, I proceeded to show that $E$ is bounded and non-empty, so that the sup is guaranteed to exist by the LUB property (a.k.a. the completeness axiom).

Then I thought: I want $y = \alpha$, so one option is to show that $y < \alpha$ and $y > \alpha$ are both false, so that $y=\alpha$ by trichotomy. Since I didn't have direct access to $y$, I decided to take the same strategy with $y^n = x$ instead of $y$ and $\alpha^n$ instead of $\alpha$. Intuitively, I thought: let's assume $\alpha^n < x$, and separately $ x < \alpha^n $. In the first case $\alpha$ is too small, so hopefully it leads to some contradiction and shows that $\alpha < y$ is false. Similarly, in the other case $ x < \alpha^n $, $\alpha$ should be too large somehow. Then we can use trichotomy to get $\alpha^n = x$, which completes the proof.

I attempted the first one, $\alpha^n < x$. The only other thing I knew about $\alpha$ was that $\forall \bar y \in E, \bar y^n < x$. Then I decided to combine both inequalities (I was hinted to use $a^n - b^n$ because I saw that identity in the solution when I checked my argument that $E$ is bounded and non-empty):

$$ \alpha^n - \bar y^n < x - \bar y ^n < 0$$

then because of the hint (which I probably wouldn't have realized I needed to use) I was extremely lucky and decided to subtract $\bar y^n$ from both sides (notice that if I had decided to subtract $\alpha^n$ it wouldn't have worked):

$$ \alpha^n - \bar y^n = (\alpha - \bar y)\left(\sum_{i+j = n-1} \bar y^i \alpha^j \right) = (\alpha - \bar y)K < 0$$

where I noticed that $K > 0$ since every element of E is greater than zero and so is its least upper bound $\alpha$. Thus $K > 0$ and getting rid of it gets me:

$$ (\alpha - \bar y) < 0 \implies \alpha < \bar y$$

which is obviously false since that would imply that $ \alpha$ is not an upper bound. Thus $\alpha^n < x$ is false.

Now assume $x < \alpha^n $. One can't use the same argument as in my previous attempt, because this time we are trying to produce an upper bound smaller than $\alpha$, and it's not clear how elements from $E$ are useful.

Its clear to me we have to choose an $h$ such that:

$$ x < (\alpha - h)^n < \alpha^n$$

I have given it a proper attempt (see at the end of my question) but I am unable to prove the desired result no matter how much I play with the algebra and the known facts I have.

Thus, my question is how did Rudin come up with the following:

$$k = \frac{y^n-x}{n y^{n-1}}$$

From his explanation it seems it just came out of a hat. I am sure that if I plugged it in I would see that "it works"; however, I wanted to see how to come up with it myself.

Similarly I don't see how/why he came up with this one:

$$h < \frac{x - y^n}{n(y+1)^{n-1}}$$

It seems that it's not even required considering my first proof/argument, but I assume it must use the same idea, since it seems to use the same identity $b^n - a^n = (b-a)(b^{n-1}+b^{n-2}a+ \cdots + b a^{n-2} + a^{n-1})$.

Anyone care to share what is the trick I missed? Is there some way to understand how one would have come up with using that? Is there some conceptual idea for the proof that he did not make explicit that I missed?

I am hoping to get a more satisfying proof than just feeling I played around with symbols until I forced the paper to tell me the truth. It seems I missed some insights, because even the hints (like using the identity) didn't yield a solution.


What I tried:

If we have:

$$ x < \alpha^n$$

then at least intuitively, that must imply that there must be some element $y_{BAD}$ smaller than our supremum $\alpha$ that is still an upper bound (this intuition is because we are going under the assumption that $\alpha^n = x \iff \alpha = x^{1/n}$). Therefore it seems reasonable to try to decrease $\alpha$ the right amount $h$ such that:

$$ x < (\alpha - h)^n < \alpha^n $$

then since $h$ is some distance that we go down from $\alpha$ we probably don't need to go down further than $\alpha$ so it seems reasonable to require $0 < h < \alpha$. With that we have using algebra:

$$ x < (\alpha - h)^n = (\alpha - h)\left( \alpha^{n-1}+\alpha^{n-2}h + \dots + \alpha h^{n-2} + h^{n-1} \right) = (\alpha - h) K < \alpha^n $$

The reason we did that factorization is so that we can hopefully get some inequality for $h$ (intuitively, we are trying to make $h$ the subject so that we can choose the right one to get the contradiction we need). Therefore let's try to remove all the nasty exponents of $h$ by assuming $ h < \alpha$ (otherwise $\alpha$ decreases by too much):

$$ K = \alpha^{n-1}+\alpha^{n-2}h + \dots + \alpha h^{n-2} + h^{n-1} < n \alpha^{n-1}$$

so lets plug it into:

$$ x < (\alpha - h) K < \alpha^n $$

after plugging the inequality and leaving $h$ alone and some algebra I skipped I got:

$$ \alpha - \frac{ \alpha^n }{K} < h < \alpha - \frac{x}{n \alpha^{n-1}}$$

Unfortunately I didn't manage to plug $K < n \alpha^{n-1}$ into both sides successfully, so I got stuck with the above, which still contains higher-order terms $h^j, j>1$ (inside $K$). It feels so close... (I've also tried more things, but it would be ridiculous to put them all in here.)

So I feel I got some of the main insights:

  1. Require the constraint $x < (\alpha - h)^n < \alpha^n$ using $x<\alpha^n$ and $h>0$.
  2. Use $(\alpha - h)^n = (\alpha - h)\left( \alpha^{n-1}+\alpha^{n-2}h + \dots + \alpha h^{n-2} + h^{n-1} \right)$, together with $\alpha^{n-1}+\alpha^{n-2}h + \dots + \alpha h^{n-2} + h^{n-1} < n \alpha^{n-1}$, to get $h$ alone.
  3. Use the upper bound $K < n \alpha^{n-1}$ to remove the annoying higher-order terms of $h$ (since we are assuming we don't know how to take roots), i.e. $(\alpha - h)K < (\alpha - h)n \alpha^{n-1}$.

Those seem to be the main ingredients, but when I tried putting them together it seemed I was still missing something, because playing with the algebra didn't lead to the answer. Does someone know what it is?


There are 4 best solutions below

2
On BEST ANSWER

First, Rudin is using $n\gt0$, and the inequality:

$$b^n-a^n\lt (b-a)nb^{n-1}$$

holds (for $0\lt a\lt b$) with strict inequality only for $n\in\mathbb{Z}, n\gt1$; for $n=1$ the two sides are equal.

But this isn't massively important here.
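The inequality can be sanity-checked with exact rational arithmetic. The following is only a minimal sketch; the helper names `power_gap` and `geometric_factor` are made up for this illustration and the sample values are arbitrary.

```python
from fractions import Fraction

def power_gap(a, b, n):
    """Return (b^n - a^n, (b - a) * n * b^(n-1)) as exact fractions."""
    a, b = Fraction(a), Fraction(b)
    return b**n - a**n, (b - a) * n * b**(n - 1)

def geometric_factor(a, b, n):
    """The sum b^(n-1) + b^(n-2) a + ... + a^(n-1) from the factorization."""
    a, b = Fraction(a), Fraction(b)
    return sum(b**(n - 1 - j) * a**j for j in range(n))

a, b = Fraction(1), Fraction(3, 2)   # arbitrary sample with 0 < a < b
for n in (2, 3, 5):
    gap, bound = power_gap(a, b, n)
    # b^n - a^n = (b - a) * (sum of n terms, each at most b^(n-1))
    assert gap == (b - a) * geometric_factor(a, b, n)
    assert gap < bound
```

For $n=1$ the same helper returns two equal values, which is why the strict version needs $n\gt1$.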

We are trying to find a contradiction to $y^n\lt x$, where $x$ is given and $y=\sup E$.

A good place to begin is with a $y+h$, $h$ arbitrarily small and positive. In which case we get:

$$(y+h)^n-y^n \lt hn(y+h)^{n-1}$$

by the above inequality (for $n\gt1$!)

As $h$ is arbitrarily small, we can say $h\lt 1$, and so we continue:

$$(y+h)^n-y^n \lt hn(y+h)^{n-1}\lt hn(y+1)^{n-1}$$

Since we are after a contradiction, we want $(y+h)^n\lt x$: then $y+h\in E$ although $y+h\gt y$, so $y$ would not be an upper bound of $E$. This is the contradiction Rudin then uses.

So we need:

$$hn(y+1)^{n-1}\lt x-y^n$$

or:

$$h\lt \frac{x-y^n}{n(y+1)^{n-1}}$$

We can see that:

$$0\lt \frac{x-y^n}{n(y+1)^{n-1}}$$

and so we are free to pick $h$ with $0\lt h\lt 1$ below this bound, as required.

$k=\dfrac{y^n-x}{ny^{n-1}}$ is obtained in a similar way, except that $k$ is a fixed value rather than any sufficiently small number, and $0\lt k\lt y$ follows from the definitions given.
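As a concrete check of the bound on $h$, here is a small sketch in exact arithmetic. The function name `step_up` and the choice of halving the bound are assumptions made for illustration only; any $h$ strictly between $0$ and $\min(1, \text{bound})$ would do.

```python
from fractions import Fraction

def step_up(x, n, y):
    """Pick one admissible h for the case y^n < x (with y > 0)."""
    x, y = Fraction(x), Fraction(y)
    assert y > 0 and y**n < x
    bound = (x - y**n) / (n * (y + 1)**(n - 1))
    # any h with 0 < h < min(1, bound) works; halving is an arbitrary choice
    return min(Fraction(1), bound) / 2

h = step_up(2, 2, 1)          # x = 2, n = 2, y = 1, so y^2 = 1 < 2
assert 0 < h < 1
assert (1 + h)**2 < 2         # y + h still lies in E although y + h > y
```

With these sample values the bound is $1/4$, so the sketch picks $h = 1/8$ and $(9/8)^2 = 81/64 \lt 2$.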

0
On

This doesn't quite deserve a whole answer, but I don't have enough reputation to comment. Your disproof of the claim that $\alpha^n < x$ isn't right. In particular, the following is wrong: $\alpha^n - \bar{y}^n < x - \bar{y}^n < 0$. You claim that this is the case for all $\bar{y} \in E$, but why? By assumption, $\bar{y}^n < x$ for all $\bar{y} \in E$, so in fact $x - \bar{y}^n > 0$. This is the line from which your whole proof stems, so it is a crucial mistake. Of course, this doesn't completely answer your question, but that is at least one reason that one needs to be a bit more clever about a proof.

0
On

I will only prove the contradiction for the case $x < y^n$. Also, notice that my attempts were full of ridiculous algebra mistakes, like using the identity for $b^n - a^n$ incorrectly to bound $(y - h)^n$, which is wrong since $(y - h)^n \neq y^n - h^n$ (the binomial theorem would have been the correct expansion, if any).
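To see that mistake on concrete numbers, here is a small sketch (the helper name `compare` and the sample values are chosen only for illustration). It shows that multiplying $(y-h)$ by the sum $K = y^{n-1} + y^{n-2}h + \dots + h^{n-1}$ gives $y^n - h^n$, not $(y-h)^n$.

```python
from fractions import Fraction

def compare(y, h, n):
    """Return ((y - h)^n, y^n - h^n, (y - h) * K) with K = sum of y^(n-1-j) h^j."""
    y, h = Fraction(y), Fraction(h)
    K = sum(y**(n - 1 - j) * h**j for j in range(n))
    return (y - h)**n, y**n - h**n, (y - h) * K

power, difference, product = compare(3, 1, 2)
assert product == difference      # (y - h) K equals y^n - h^n ...
assert power != difference        # ... which is not (y - h)^n
```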

Anyway, define $y = \sup E$ and suppose $x < y^n$. We know "intuitively" that $E = \{ \bar y \in R_{>0} \mid \bar y^n < x\} = \{ \bar y \in R_{>0} \mid \bar y < x^{\frac{1}{n}} \}$, except that we have not yet defined "nth" roots. So it's clear, at least intuitively, that the supremum (least upper bound) of $E$ is $x^{\frac{1}{n}}$. Thus a $y$ with $x < y^n$ has to be too large, i.e. this (wrong) assumption should let us derive some element $y-h$ s.t. $x < (y-h)^n < y^n \iff x^{\frac{1}{n}} < y-h < y $, except that the RHS of the iff has not been defined rigorously yet.

At least for me, it seems intuitively reasonable to consider the difference $(y-h)^n - y^n$ and see what we can do with it. The first thing to remember is that dealing with terms like $h^i, i>1$ is not helpful because we don't know how to take roots. So the idea is to try to remove them and hope to be able to derive some $h$ by itself with the properties we want. A good start is a small non-trivial example, $n=2$: we have $a^2 -b^2 = (a-b)(a+b)$, and with $a = y-h, b=y$ we get:

$$a^2 -b^2 = (a-b)(a+b) = (y-h)^2 -y^2 = (y-h -y )(y-h+y)=-h( y-h+y)$$

This is crucial because it separates one factor of $h$ from the other $h$ terms. Now we just have to bound the other factor and hope we can satisfy what we need. If we generalize this idea we have:

$$ (y-h)^n -y^n = (y-h -y )((y-h)^{n-1} + (y-h)^{n-2} y + \cdots + y^{n-1} ) = -h ((y-h)^{n-1} + (y-h)^{n-2} y + \cdots + y^{n-1} )$$

This is really crucial because we manage to get $-h$ on its own, so if we bound the other terms in the sum and get rid of the higher-order $h$'s, we won't need to take nth roots (which we have not defined yet). For convenience let $k = ((y-h)^{n-1} + (y-h)^{n-2} y + \cdots + y^{n-1} ) $. At this point we need $0 < y - h < y$, so let's go ahead and require it. Then each of the $n$ terms of $k$ is at most $y^{n-1}$, and all but the last are strictly smaller, so for $n > 1$:

$$ ((y-h)^{n-1} + (y-h)^{n-2} y + \cdots + y^{n-1} ) = k < n y^{n-1}$$

If we require (and hope) $h>0$, then multiplying both sides by $-h$ reverses the inequality and we get:

$$ (y-h)^n -y^n = -h k > -h n y^{n-1}$$

Now at this point we realize we need this whole thing to satisfy $x < (y-h)^n < y^n $, whose left inequality is equivalent to $x - y^n < (y-h)^n - y^n$, so we require:

$$ x - y^n < -h n y^{n-1} < -h k = (y-h)^n -y^n $$

$$ x - y^n < -h n y^{n-1} \implies \frac{x - y^n}{n y^{n-1}} < -h \iff \frac{y^n - x}{n y^{n-1}} > h$$

Now note the denominator is positive since $n>0, y > 0$, and by assumption $x < y^n \iff y^n - x >0 $, so the numerator is also positive. This means there is room to choose $h>0$ below this bound, which essentially completes the proof. So $x<y^n$ is false, and since $y^n < x$ is also false, by trichotomy $y^n = x$. So the LUB of $E$ is the nth root of $x$.

Note that you can check it if you want, just to be sure. We have $0< y-h < y$: $h>0$ gives $y-h<y$, and since $x>0$, $h < \frac{y^n - x}{n y^{n-1}} < \frac{y^n}{n y^{n-1}} = \frac{y}{n} \leq y$ gives $y-h>0$. Furthermore, $h < \frac{y^n - x}{n y^{n-1}}$ gives $x - y^n < -h n y^{n-1}$, and we showed $-h n y^{n-1} < -h k = (y-h)^n -y^n $, so $x - y^n < (y-h)^n -y^n$, i.e. $x < (y-h)^n $. This means that $y-h$ is an upper bound of $E$: any $t \in E$ has $t^n < x < (y-h)^n$, hence $t < y-h$. But $y-h$ is smaller than $y = \sup E$, and $y$ is by definition the least upper bound (LUB). Thus we get a contradiction.
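A quick numerical sanity check of this case, in exact arithmetic. This is only a sketch: `step_down` and its `fraction` parameter are hypothetical names, and taking a fixed fraction of the bound is just one way to pick an $h$ strictly below it.

```python
from fractions import Fraction

def step_down(x, n, y, fraction=Fraction(1, 2)):
    """Pick h = fraction * (y^n - x) / (n y^(n-1)) for the case x < y^n."""
    x, y = Fraction(x), Fraction(y)
    assert x > 0 and y > 0 and x < y**n and 0 < fraction < 1
    return fraction * (y**n - x) / (n * y**(n - 1))

y = Fraction(3, 2)            # y^2 = 9/4 > x = 2
h = step_down(2, 2, y)
assert 0 < h < y
assert 2 < (y - h)**2 < y**2  # y - h is a smaller upper bound of E
```

Here the bound is $1/12$, the sketch takes $h = 1/24$, and $(35/24)^2 = 1225/576$ lies strictly between $2$ and $9/4$.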

0
On

[This is along the lines of Rudin's proof, hopefully a bit more explicit and clear. From Shilov's "Real and Complex Analysis"]

Th: Let $a \gt 0$ and $n \in \mathbb{Z} _{\gt 0}.$ There exists a unique real $x \gt 0$ with $x ^n = a.$
Pf: [Uniqueness] For any reals $x, y \gt 0$ we have ${ x \lt y \iff x ^n \lt y ^n}.$

Because ${ (y ^n - x ^n) = (y - x) \underbrace{(y ^{n-1} + y ^{n-2} x + \ldots + y x ^{n-2} + x ^{n-1})} _{\gt 0} }$

So two reals ${0 \lt x _1 \lt x _2}$ with ${x _1 ^n = x _2 ^n = a}$ cannot exist.

[Existence] The set ${ S := \lbrace x \in \mathbb{R} _{\geq 0} : x ^n \lt a \rbrace }$ contains $0,$ and is bounded above.

If ${ a \in [1, \infty) }$: For any $x \in S,$ ${ x ^n \lt a \leq a ^n }$ giving ${ x \lt a }.$ So $a$ is an upper bound.
If ${ a \in (0,1) }$: For any $x \in S,$ ${ x ^n \lt a \lt 1 = 1 ^n }$ giving $x \lt 1.$ So $1$ is an upper bound.

So take ${ s := \sup(S) }.$

Is $s ^n \lt a$ ? Suppose it were. Now
${ \begin{aligned} (s+t) ^n &= s ^n + \binom{n}{1} s ^{n-1} t + \ldots + \binom{n}{n-1} s t ^{n-1} + t ^n \\ &{\color{green}{\leq}} \text{ } s ^n + t \left( \binom{n}{1} s ^{n-1} + \ldots + \binom{n}{n-1} s + 1 \right) \\ &= s ^n + t ( (1+s) ^n - s ^n ) \\ &{\color{purple}{\lt}} \text{ } a, \end{aligned} }$
for ${ \color{green}{t \in (0,1)} }$ and ${ {\color{purple}{t \lt \frac{a - s ^n}{(1+s) ^n - s ^n} }}. }$
So $(s+t) ^n \lt a$ for all ${ t \in (0, \min(1, \frac{a - s ^n}{(1+s) ^n - s ^n}) ) }.$ In particular there are points $\gt s$ in $S,$ which is absurd.

Is $s ^n \gt a$ ? Suppose it were. Now
${ \begin{aligned} (s - t) ^n &= s ^n + \binom{n}{1} s ^{n-1} (-t) + \ldots + \binom{n}{n-1} s (-t) ^{n-1} + (-t) ^n \\ &{ \color{green}{\geq}} \text{ } s ^n - t \left( \binom{n}{1} s ^{n-1} + \ldots + \binom{n}{n-1} s + 1 \right) \\ &= s ^n - t((1+s) ^n - s ^n ) \\ &{\color{purple}{\gt}} \text{ } a, \end{aligned} }$
for ${ \color{green}{t \in (0,1)} }$ and ${ {\color{purple}{t \lt \frac{s ^n - a}{(1+s) ^n - s ^n} }} .}$
So $(s-t) ^n \gt a$ for ${ t \in (0, \min(1, \frac{s ^n - a}{(1+s) ^n - s ^n}) ) }.$ In particular there is a $\delta \gt 0$ such that there is no point of $S$ in $(s - \delta, s],$ which is absurd.

So $s ^n = a.$
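The two bounds on $t$ in this argument can be tried out on sample values. The sketch below uses exact fractions; the name `shilov_bound` is invented for this illustration, and it tests a single sample $t$ from each interval rather than proving the claim for all $t$.

```python
from fractions import Fraction

def shilov_bound(a, n, s):
    """Return min(1, |a - s^n| / ((1 + s)^n - s^n)) for s >= 0, s^n != a."""
    a, s = Fraction(a), Fraction(s)
    return min(Fraction(1), abs(a - s**n) / ((1 + s)**n - s**n))

# Case s^n < a (a = 2, n = 3, s = 1): a sample t in (0, bound) keeps (s + t)^n below a.
t = shilov_bound(2, 3, 1) / 2
assert (1 + t)**3 < 2
# Case s^n > a (a = 2, n = 3, s = 2): a sample t in (0, bound) keeps (s - t)^n above a.
t = shilov_bound(2, 3, 2) / 2
assert (2 - t)**3 > 2
```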