Proof Evaluation for Question 16 a on Chapter 1 of Baby Rudin


I'm self-learning Real Analysis through Rudin's classic Principles of Mathematical Analysis. I've been stuck on this one question for a few weeks (not constantly, of course). I had a few failed proof attempts, and I had to learn some basic Linear Algebra since I don't have any background in the subject. It's crazy how much you can learn from one question of this legendary book, especially when you have no time limitations.

So, I'm not sure my proof is valid. I would like to hear the opinion of someone more experienced. Also, could I have done some things more elegantly? Is it normal that this question was so hard for me? And do you have any remarks on my style?

The Question:

Suppose $k \geq 3$, $\vec{\textbf{x}}, \vec{\textbf{y}} \in \mathbb{R}^k$, $\left\lVert \vec{\textbf{x}} - \vec{\textbf{y}} \right\rVert = d > 0$, and $r > 0$. Prove: If $2r > d$, there are infinitely many $\vec{\textbf{z}} \in \mathbb{R}^k$ such that $\left\lVert \vec{\textbf{z}} - \vec{\textbf{x}} \right\rVert = \left\lVert \vec{\textbf{z}} - \vec{\textbf{y}} \right\rVert = r$.

My Proof:

We show there are infinitely many $\vec{\textbf{z}^*} \in \mathbb{R}^k$ such that $\left\lVert \vec{\textbf{z}^*} \right\rVert^2 = r^2 - \frac{d^2}{4}$ and $\vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) = 0$. For every $a \in \mathbb{R}$, $0 < a < \frac{1}{2}$, we can construct such a $\vec{\textbf{z}^*}$ in the following way:

Let $i, j, t, p \in \mathbb{N}$, $1 \leq i, j, t, p \leq k$. $\vec{\textbf{z}^*} = (z_1, z_2, z_3, ..., z_k)$, where $z_j = b\sqrt{l}$, $z_t = a\sqrt{l}$ and $z_p = c\sqrt{l}$, such that $j$ is the index of the component of $\vec{\textbf{x}} - \vec{\textbf{y}}$ satisfying $|x_j-y_j| \geq |x_i - y_i|$ for every possible $i$, and $t \neq j, p$, $p \neq j$. For every $z_i$ with $i \neq j, t, p$, we put $z_i = 0$. Here $c = \pm\sqrt{1-a^2-b^2}$ (the sign of $c$ is fixed at the end of the argument) and $l = r^2 -\frac{d^2}{4}$. Let $\alpha = (x_j - y_j)^2 + (x_p - y_p)^2$, $\beta = 2a(x_j - y_j)(x_t - y_t)$ and $\gamma = a^2[(x_t - y_t)^2+(x_p - y_p)^2] - (x_p - y_p)^2$. We then define $$b = \frac{\sqrt{\beta^2 - 4\alpha\gamma} -\beta}{2\alpha}$$ We show $b$ always exists. First, $2\alpha = 2[(x_j - y_j)^2+(x_p - y_p)^2] > 0$, since $(x_j - y_j)^2$ must be positive ($d > 0$). Therefore the denominator is not zero. Second, \begin{equation} \begin{aligned} \beta^2 - 4\alpha\gamma = 4a^2(x_j - y_j)^2(x_t - y_t)^2 - 4\left[(x_j - y_j)^2+(x_p - y_p)^2\right]\left[a^2\left[(x_t - y_t)^2+(x_p - y_p)^2\right]-(x_p - y_p)^2\right]\\ = 4\left[a^2(x_j - y_j)^2(x_t - y_t)^2-\left[a^2(x_j - y_j)^2(x_t - y_t)^2+a^2(x_j - y_j)^2(x_p - y_p)^2+a^2(x_t - y_t)^2(x_p - y_p)^2+a^2(x_p - y_p)^4-(x_j - y_j)^2(x_p - y_p)^2-(x_p - y_p)^4\right]\right]\\ =4\left[(x_p - y_p)^4-a^2(x_p - y_p)^4+(x_p - y_p)^2(x_j - y_j)^2-a^2(x_p - y_p)^2(x_j - y_j)^2-a^2(x_p - y_p)^2(x_t - y_t)^2\right]\\ =4(x_p - y_p)^2\left[(x_p - y_p)^2(1-a^2)+(x_j - y_j)^2(1-a^2)-a^2(x_t - y_t)^2\right]\\ =4(x_p - y_p)^2\left[(1-a^2)\left[(x_p - y_p)^2+(x_j - y_j)^2\right]-a^2(x_t - y_t)^2\right] \geq 0 \end{aligned} \end{equation} since $(x_j - y_j)^2 \geq (x_t - y_t)^2$ and $1-a^2 > a^2$ (because $0 < a < \frac{1}{2}$). Thus $b$ is indeed defined. Then, we get $\left\| \vec{\textbf{z}^*} \right\|^2 = a^2l + b^2l + c^2l = l(a^2+b^2+1-a^2-b^2) = l = r^2 - \frac{d^2}{4}$. By the quadratic formula, we have $\alpha b^2 + \beta b + \gamma = 0$.
Hence: \begin{equation} \begin{aligned} b^2(x_j-y_j)^2 + b^2(x_p-y_p)^2 + 2ab(x_j-y_j)(x_t-y_t) + a^2(x_t-y_t)^2 +a^2(x_p-y_p)^2-(x_p-y_p)^2 = 0\\ a^2(x_t-y_t)^2+2ab(x_t-y_t)(x_j-y_j)+b^2(x_j-y_j)^2-(x_p-y_p)^2(1-a^2-b^2) = 0 \end{aligned} \end{equation} Therefore \begin{equation} \begin{aligned} a^2(x_t-y_t)^2+2ab(x_t-y_t)(x_j-y_j)+b^2(x_j-y_j)^2 = (x_p-y_p)^2(1-a^2-b^2)\\ \left[a(x_t-y_t)+b(x_j-y_j)\right]^2 = (x_p-y_p)^2(1-a^2-b^2)\\ \end{aligned} \end{equation} then taking square roots of both sides we get $\left|a(x_t-y_t)+b(x_j-y_j)\right| = |x_p-y_p|\sqrt{1-a^2-b^2}$. Choosing the sign of $c$ so that $c(x_p-y_p) = -\left[a(x_t-y_t)+b(x_j-y_j)\right]$, we get $a(x_t-y_t)+b(x_j-y_j)+c(x_p-y_p) = 0$, hence $\sqrt{l}\left[a(x_t-y_t)+b(x_j-y_j)+c(x_p-y_p)\right] = 0$, thus $\vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) = 0$.

Since there are infinitely many $a$ with $0<a<\frac{1}{2}$, and distinct values of $a$ give distinct vectors (their $t$'th components $a\sqrt{l}$ differ, as $l > 0$), there are infinitely many $\vec{\textbf{z}^*}$.

Now, for every $\vec{\textbf{z}^*}$, define $\vec{\textbf{z}} = \vec{\textbf{x}} - \frac{1}{2}(\vec{\textbf{x}}-\vec{\textbf{y}}) + \vec{\textbf{z}^*}$. Then we get $$\left\| \vec{\textbf{z}} - \vec{\textbf{x}} \right\| = \sqrt{\left(-\frac{1}{2}\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right) + \vec{\textbf{z}^*}\right)^2} = \sqrt{\frac{\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right)^2}{4} -\vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) + \left\lVert \vec{\textbf{z}^*} \right\rVert^2} = \sqrt{\frac{d^2}{4} - 0 +r^2 - \frac{d^2}{4}} = r$$ It's clear that $\vec{\textbf{z}} = \vec{\textbf{y}} - \frac{1}{2}(\vec{\textbf{y}}-\vec{\textbf{x}}) + \vec{\textbf{z}^*}$, hence we get $$\left\| \vec{\textbf{z}} - \vec{\textbf{y}} \right\| = \sqrt{\left(\frac{1}{2}\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right) + \vec{\textbf{z}^*}\right)^2} = \sqrt{\frac{\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right)^2}{4} + \vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) + \left\lVert \vec{\textbf{z}^*} \right\rVert^2} = \sqrt{\frac{d^2}{4} + 0 +r^2 - \frac{d^2}{4}} = r$$

Thus, there are infinitely many $\vec{\textbf{z}} \in \mathbb{R}^k$ such that $\left\lVert \vec{\textbf{z}} - \vec{\textbf{x}} \right\rVert = \left\lVert \vec{\textbf{z}} - \vec{\textbf{y}} \right\rVert = r$.
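As a sanity check (not part of the proof), the construction above can be run numerically. The sketch below, in Python with NumPy, follows the recipe in the proof; the function name make_z is mine, and the sign of $c$ is picked so that $\vec{\textbf{z}^*}$ comes out orthogonal to $\vec{\textbf{x}} - \vec{\textbf{y}}$.

```python
import numpy as np

def make_z(x, y, r, a):
    """Construct one point z with |z - x| = |z - y| = r, following the
    construction above: j indexes the largest component of x - y, t and p
    are two other indices, and the sign of c is chosen so that the
    perturbation z* is orthogonal to x - y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = x - y
    d = np.linalg.norm(w)
    ell = r**2 - d**2 / 4                            # positive because 2r > d
    j = int(np.argmax(np.abs(w)))                    # |x_j - y_j| is maximal
    t, p = [i for i in range(len(w)) if i != j][:2]  # two other indices (k >= 3)
    alpha = w[j]**2 + w[p]**2
    beta = 2 * a * w[j] * w[t]
    gamma = a**2 * (w[t]**2 + w[p]**2) - w[p]**2
    b = (np.sqrt(beta**2 - 4 * alpha * gamma) - beta) / (2 * alpha)
    c = np.sqrt(max(1 - a**2 - b**2, 0.0))           # guard float round-off
    if abs(a * w[t] + b * w[j] + c * w[p]) > abs(a * w[t] + b * w[j] - c * w[p]):
        c = -c                                       # sign making z* . w = 0
    zstar = np.zeros_like(w)
    zstar[j] = b * np.sqrt(ell)
    zstar[t] = a * np.sqrt(ell)
    zstar[p] = c * np.sqrt(ell)
    return (x + y) / 2 + zstar
```

Varying $a$ over $(0, \frac{1}{2})$ then yields distinct points, all at distance $r$ from both $\vec{\textbf{x}}$ and $\vec{\textbf{y}}$.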


There are 2 answers below.

BEST ANSWER

I would like to hear the opinion of someone more experienced.

I'm also self-studying Rudin at the moment, so I'll leave it up to you to decide whether I'm experienced enough to answer you.

Also, could I've done some things more elegantly? ... And, do you have some remarks on my style?

I think I understand the idea of your proof, but I still find it a bit hard to follow. For example, to me it is somewhat confusing that your construction of $\mathbf{z}^*$ uses $b$ and $l$, which are defined only later in the proof. My reading stops when I realize that I do not know the definitions of these symbols: first I scan backwards (in case I missed some definitions) and only after that do I try to find them elsewhere. So defining all the symbols required in the construction of $\mathbf{z}^*$ before actually constructing $\mathbf{z}^*$ would make the proof more readable. (Working that way may also reduce the risk of circular reasoning, or at least make it easier to detect, since the definitions and the logic of the proof then flow in one direction only.)

Also, you write

Let $i,j,t,p\in\mathbb{N}$, $1≤i,j,t,p≤k$.

At least as I read this, you fix $i$, $j$, $t$ and $p$ to be some integers from the given interval. But then, in the following sentences, you start placing further restrictions on those integers, and $i$ does not actually have any fixed value: it is rather "any other" integer than $j$, $t$ or $p$. So there is some confusion. (Of course, some of the confusion might be due to the fact that English is not my native language.)

Expressions of the form $(x_k-y_k)$ take up a considerable amount of space in some parts of your proof. Using some more compact notation like $w_k=x_k-y_k$ would make your proof look simpler and shorter. That way possible errors are easier and faster to spot, and you will also have fewer multi-line expressions (which helps readability too).

You can also consider elegance from a broader perspective. For example, consider parts (b) and (c) of the same problem 1.16: how easy is it to reuse your results to do the rest of the problem, or are you forced to go through some lengthy derivations once again? For example, when I did this problem, I first proved (without using the $2r>d$ given in part (a)) that $\mathbf{z}$ is a solution of the given equation if and only if $\mathbf{z}$ can be written in the form $\mathbf{z}=\frac{1}{2}(\mathbf{x}+\mathbf{y})+\mathbf{w}$, where $\mathbf{w}\in\mathbb{R}^k$ is such that $\mathbf{w}\cdot(\mathbf{y}-\mathbf{x})=0$ and $|\mathbf{w}|^2=r^2-\frac{1}{4}d^2$. That result can be reused in parts (b) and (c), where the exact number of solutions is needed.
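For readers who want that equivalence spelled out, here is a short derivation of the claim (my own sketch, not necessarily the exact write-up used). Write $\mathbf{w}=\mathbf{z}-\frac{1}{2}(\mathbf{x}+\mathbf{y})$, so that $\mathbf{z}-\mathbf{x}=\mathbf{w}+\frac{1}{2}(\mathbf{y}-\mathbf{x})$. Then \begin{equation} \begin{aligned} |\mathbf{z}-\mathbf{x}| = |\mathbf{z}-\mathbf{y}| &\iff |\mathbf{z}|^2-2\,\mathbf{z}\cdot\mathbf{x}+|\mathbf{x}|^2 = |\mathbf{z}|^2-2\,\mathbf{z}\cdot\mathbf{y}+|\mathbf{y}|^2 \\ &\iff 2\,\mathbf{z}\cdot(\mathbf{y}-\mathbf{x}) = (\mathbf{y}-\mathbf{x})\cdot(\mathbf{y}+\mathbf{x}) \iff \mathbf{w}\cdot(\mathbf{y}-\mathbf{x}) = 0, \end{aligned} \end{equation} and in that case $|\mathbf{z}-\mathbf{x}|^2 = \left|\mathbf{w}+\frac{1}{2}(\mathbf{y}-\mathbf{x})\right|^2 = |\mathbf{w}|^2 + \frac{1}{4}d^2$, so $|\mathbf{z}-\mathbf{x}| = r$ if and only if $|\mathbf{w}|^2 = r^2-\frac{1}{4}d^2$.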

So, I'm not sure my proof is valid.

In your construction, you take $c$ with $|c| = \sqrt{1-a^2-b^2}$. How do you know that $c$ is a real number? That seems to be the case, and an (at least partial) answer can be found much later in your proof. But (at least to me) it is not directly evident from the way you define $b$.

But in general, I think the idea and thinking behind your proof seem to be OK (if we put aside the readability issues I described above). Although I have to admit that I read through some parts of your derivations a bit too fast, so there might be some errors I did not spot. But at least those that I checked seemed to be OK.

Is it normal that this question was so hard for me?

That depends on the definition of normal. At least for me, there seem to be some easier and then some much harder problems in Rudin's book. Some problems have taken days to solve. And in chapter 2 there is a problem that I have not yet solved even though I have invested a reasonable amount of time in it. (I am now working my way through chapter 3, but I think I will return to that problem later.)

Have you checked George Bergman's supplementary notes for Rudin's book? In addition to some good extra problems, there are difficulty codes for the problems too. While assessing the difficulty level of a problem is a somewhat subjective thing (it depends strongly on the background of the person doing the assessment), I think the codes in Bergman's notes have correlated pretty well (with some exceptions) with my personal experience of the relative difficulty of the problems in Rudin's problem sets.


Edit: Added answer to a question presented in comments.

It seems like we pretty much had the same idea, so I'm very curious how you proved that there are infinitely many $\mathbf{w}$'s. This part of the proof was the hardest for me.

Ok, I'll try to explain my solution and thinking in more detail.

The basic idea in my proof was based on geometric intuition: The set of all solutions seems to be a (hyper)circle on a (hyper)plane. I first thought of constructing some part of that circle directly (as you did) but pretty soon that turned out to be a bit messy. So, I looked for another way to solve this. The next idea was to take a line from the plane and then project it to the circle. That would give me an infinite subset of the set of all solutions.

That approach had three subproblems: (1) find two points /vectors from the plane and use them to construct a line (that does not cross the origin), (2) do the projection and show that the projected result is a subset of the set of all solutions, and (3) show that the projected result has an infinite amount of points (projection must not be mapped to one point, for example).

The subproblem (1) would have been straightforward if some linear algebra results had been available. But as you already know, the single page and approximately 10 lines of space (pages 16-17 in the 3rd edition) that Rudin uses to introduce Euclidean spaces do not cover much linear algebra. So, for my approach I needed a short lemma to cover my linear algebra needs:

Lemma 1: Suppose $k\geq3$ and $\mathbf{a}\in\mathbb{R}^k:\mathbf{a}\neq0$. Then there exist $\mathbf{b},\mathbf{c}\in\mathbb{R}^k$ such that $\mathbf{b}\neq0$, $\mathbf{c}\neq0$, $\mathbf{b}\cdot\mathbf{a}=0$, $\mathbf{c}\cdot\mathbf{a}=0$ and $\forall \alpha,\beta \in \mathbb{R}: \alpha \neq 0 \lor \beta \neq 0 \Rightarrow \alpha \mathbf{b} + \beta \mathbf{c} \neq 0$.

Proof: Let $i, j, n \in \{1,\ldots,k\}$ be distinct indices such that $a_i \neq 0$ (these exist because $\mathbf{a}\neq0$ and $k\geq3$).

Construct $\mathbf{b}$ so that $b_j=a_i$, $b_i=-a_j$ and $\forall m \in \{1,\ldots,k\}\setminus\{i,j\}:b_m=0$. Now $\mathbf{a}\cdot\mathbf{b}=\sum_{m=1}^k a_m b_m = -a_i a_j + a_j a_i = 0$.

Construct $\mathbf{c}$ so that $c_n=a_i$, $c_i=-a_n$ and $\forall m \in \{1,\ldots,k\}\setminus\{i,n\}:c_m=0$. Now $\mathbf{a}\cdot\mathbf{c}=\sum_{m=1}^k a_m c_m = -a_i a_n + a_n a_i = 0$.

Let $\alpha,\beta \in \mathbb{R}$ such that $\alpha \neq 0$ or $\beta \neq 0$, and let $\mathbf{d} = \alpha \mathbf{b} + \beta \mathbf{c}$. Now $d_j=\alpha a_i$ and $d_n=\beta a_i$. Thus $d_j \neq 0$ or $d_n \neq 0$, which implies $\mathbf{d} \neq 0$.

This lemma just makes it possible to pick two linearly independent points from the plane. And the lemma is actually the only place where we need to touch individual elements of points/vectors, so in that way it makes rest of the proof much cleaner.
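If it helps, the lemma's construction is easy to check numerically; here is a minimal Python/NumPy sketch (the helper name perp_pair is mine):

```python
import numpy as np

def perp_pair(a):
    """Lemma 1 sketch: given nonzero a in R^k (k >= 3), build b and c
    that are both orthogonal to a and linearly independent of each other."""
    a = np.asarray(a, float)
    i = int(np.argmax(np.abs(a)))                   # guarantees a_i != 0
    j, n = [m for m in range(len(a)) if m != i][:2] # two other indices
    b = np.zeros_like(a)
    b[j], b[i] = a[i], -a[j]                        # b . a = -a_i a_j + a_j a_i = 0
    c = np.zeros_like(a)
    c[n], c[i] = a[i], -a[n]                        # c . a = -a_i a_n + a_n a_i = 0
    return b, c
```

Independence can be confirmed by checking that the matrix with rows $\mathbf{b}$ and $\mathbf{c}$ has rank 2.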

And finally the actual proof you were interested in:

Problem: Let $2r \gt d$. Show that $\mathbf{W} = \{\mathbf{w}\in\mathbb{R}^k | \mathbf{w}\cdot(\mathbf{y}-\mathbf{x})=0 \land |\mathbf{w}|^2=r^2-\frac{1}{4}d^2\}$ is infinite.

Proof: By lemma 1, there are $\mathbf{u},\mathbf{v}\in\mathbb{R}^k$ such that $\mathbf{u}\neq0$, $\mathbf{v}\neq0$, $\mathbf{u}\cdot(\mathbf{y}-\mathbf{x})=0$, $\mathbf{v}\cdot(\mathbf{y}-\mathbf{x})=0$ and $\forall \alpha,\beta \in \mathbb{R}: \alpha \neq 0 \lor \beta \neq 0 \Rightarrow \alpha \mathbf{u} + \beta \mathbf{v} \neq 0$. Let $r_w = \sqrt{r^2-\frac{1}{4}d^2}$ and $W_{\mathbf{u},\mathbf{v}}=\{\frac{r_w}{|\mathbf{u}-\lambda\mathbf{v} |}(\mathbf{u}-\lambda\mathbf{v}) | \lambda\in\mathbb{R}\}$.

Let $\mathbf{w} \in \mathbf{W}_{\mathbf{u},\mathbf{v}}$. Now $\exists \lambda \in \mathbb{R}$ such that $\mathbf{w} = \frac{r_w}{|\mathbf{u}-\lambda\mathbf{v} |}(\mathbf{u}-\lambda\mathbf{v})$. Thus $|\mathbf{w}| = r_w$ and $\mathbf{w}\cdot(\mathbf{y}-\mathbf{x}) = \frac{r_w}{|\mathbf{u}-\lambda\mathbf{v} |}(\mathbf{u}\cdot(\mathbf{y}-\mathbf{x})-\lambda\mathbf{v}\cdot(\mathbf{y}-\mathbf{x})) = 0$. Therefore $\mathbf{w} \in \mathbf{W}$ and $\mathbf{W}_{\mathbf{u},\mathbf{v}} \subset \mathbf{W}$.

Let $\alpha,\beta\in\mathbb{R}$ such that $\alpha \neq \beta$. Now either $|\mathbf{u}-\beta\mathbf{v}| - |\mathbf{u}-\alpha\mathbf{v}| \neq 0$ or $|\mathbf{u}-\beta\mathbf{v}|\alpha - |\mathbf{u}-\alpha\mathbf{v}|\beta \neq 0$ (if both were zero, the first would give $|\mathbf{u}-\alpha\mathbf{v}| = |\mathbf{u}-\beta\mathbf{v}|$, and the second would then force $\alpha = \beta$), and therefore $$ \begin{align} & \frac{r_w}{|\mathbf{u}-\alpha\mathbf{v} |}(\mathbf{u}-\alpha\mathbf{v}) - \frac{r_w}{|\mathbf{u}-\beta\mathbf{v}|}(\mathbf{u}-\beta\mathbf{v}) \\ = \ & \frac{r_w}{|\mathbf{u}-\alpha\mathbf{v}| \ |\mathbf{u}-\beta\mathbf{v} |}\bigl[ (|\mathbf{u}-\beta\mathbf{v}| - |\mathbf{u}-\alpha\mathbf{v}|)\mathbf{u} + (|\mathbf{u}-\beta\mathbf{v}|\alpha - |\mathbf{u}-\alpha\mathbf{v}|\beta)\mathbf{v} \bigr] \\ \neq \ & 0. \end{align} $$ But this means there is a 1-1 mapping between $\mathbb{R}$ and $\mathbf{W}_{\mathbf{u},\mathbf{v}}$, and thus $\mathbf{W}_{\mathbf{u},\mathbf{v}}$ is infinite. Since $\mathbf{W}_{\mathbf{u},\mathbf{v}} \subset \mathbf{W}$, also $\mathbf{W}$ is infinite.
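Again purely as a numerical sanity check, here is a Python/NumPy sketch of this family (the names are mine, and the Lemma 1 construction is inlined so the snippet is self-contained):

```python
import numpy as np

def solution_family(x, y, r, lambdas):
    """For each lambda, return z = (x+y)/2 + r_w * unit(u - lambda*v),
    which satisfies |z - x| = |z - y| = r whenever 2r > |x - y|."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = y - x
    # inline Lemma 1: u, v nonzero, orthogonal to a, linearly independent
    i = int(np.argmax(np.abs(a)))
    j, n = [m for m in range(len(a)) if m != i][:2]
    u = np.zeros_like(a)
    u[j], u[i] = a[i], -a[j]
    v = np.zeros_like(a)
    v[n], v[i] = a[i], -a[n]
    rw = np.sqrt(r**2 - np.dot(a, a) / 4)   # real because 2r > d
    mid = (x + y) / 2
    return [mid + rw * (u - lam * v) / np.linalg.norm(u - lam * v)
            for lam in lambdas]
```

Distinct values of lambda give distinct points, matching the 1-1 mapping argument above.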

ANOTHER ANSWER

I don't really follow the details of your proof. The gist of the matter is that the hyperplane $P$ (perpendicular to $\mathbf{x-y}$ and bisecting the segment between $\mathbf x$ and $\mathbf y$) and the sphere $S$ centred at $(\mathbf x+\mathbf y)/2$ of radius $\sqrt{r^2-d^2/4}$ have an intersection that contains infinitely many points. It seems to me that you are trying to show that $P\cap S$ contains a circle. If this is the case, your idea is correct, but it is possible to give a more succinct proof using only high-school vector algebra.

Since $\|\mathbf{x-y}\|=d>0$, the vector $\mathbf{x-y}$ has at least one nonzero coordinate. Without loss of generality, we may assume that $x_1\ne y_1$. Hence $\mathbf{\widetilde{d}}=(x_1-y_1,\,x_2-y_2,\,x_3-y_3)\in\mathbb R^3$ is nonzero and we may pick a vector $\mathbf{\widetilde{r}}\in\mathbb R^3$ that is not parallel to it, such as $\mathbf{\widetilde{r}}=(x_1-y_1,\,x_2-y_2+1,\,0)$. Let $\mathbf{\widetilde{p}}=\mathbf{\widetilde{r}}\times\mathbf{\widetilde{d}}$ and $\mathbf{\widetilde{q}}=\mathbf{\widetilde{d}}\times\mathbf{\widetilde{p}}$. Then $\mathbf{\widetilde{d}},\mathbf{\widetilde{p}}$ and $\mathbf{\widetilde{q}}$ are three nonzero and mutually perpendicular vectors. Normalise $\mathbf{\widetilde{p}}$ and $\mathbf{\widetilde{q}}$ (i.e. divide them by their own norms) so that they become unit vectors. Define $\mathbf{p}=(\widetilde{p}_1,\widetilde{p}_2,\widetilde{p}_3,0,\ldots,0)\in\mathbb R^k$ (i.e. $\mathbf p$ is obtained by padding $\mathbf{\widetilde{p}}$ with $k-3$ zeroes) and define $\mathbf q$ analogously. Then $\mathbf{x-y},\mathbf p,\mathbf q$ are mutually perpendicular. Let $$ f(t)=\frac{\mathbf{x+y}}2+\sqrt{r^2-\frac{d^2}4}\,\big(\cos(t)\,\mathbf p+\sin(t)\,\mathbf q\big),\ t\in[0,2\pi). $$ Since $\mathbf p$ and $\mathbf q$ are not parallel to each other, $f$ is injective. In turn, the circle $C=f([0,2\pi))$ has infinitely many points. Clearly, every point $\mathbf z$ on $C$ satisfies the equations $\|\mathbf{z-x}\|=\|\mathbf{z-y}\|=r$. Now we are done.
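For what it's worth, this construction is short enough to check numerically as well. The Python/NumPy sketch below (the function name circle_point is mine) handles the "without loss of generality" step by swapping a nonzero coordinate of $\mathbf{x-y}$ into position 1:

```python
import numpy as np

def circle_point(x, y, r, t):
    """One point f(t) of the circle constructed above (k >= 3).
    The coordinate swap plays the role of the WLOG x_1 != y_1 step."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = len(x)
    w = x - y
    i = int(np.argmax(np.abs(w)))        # a coordinate where x and y differ
    perm = np.arange(k)
    perm[[0, i]] = perm[[i, 0]]          # swap it into position 0
    wp = w[perm]
    d3 = wp[:3]                          # nonzero, since wp[0] != 0
    rt = np.array([wp[0], wp[1] + 1.0, 0.0])  # not parallel to d3
    p3 = np.cross(rt, d3)
    p3 /= np.linalg.norm(p3)
    q3 = np.cross(d3, p3)
    q3 /= np.linalg.norm(q3)
    p = np.zeros(k)
    p[:3] = p3                           # pad with k - 3 zeroes
    q = np.zeros(k)
    q[:3] = q3
    p, q = p[perm], q[perm]              # undo the swap (it is its own inverse)
    rad = np.sqrt(r**2 - np.dot(w, w) / 4)
    return (x + y) / 2 + rad * (np.cos(t) * p + np.sin(t) * q)
```

Every $t\in[0,2\pi)$ then yields a point at distance $r$ from both $\mathbf x$ and $\mathbf y$.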