I'm self-learning Real Analysis through Rudin's classic Principles of Mathematical Analysis. I've been stuck on this one question for a few weeks (not constantly, of course). I had a few failed proof attempts, and I had to learn some basic linear algebra since I don't have any background in the subject. It's crazy how much you can learn from a single question in this legendary book, especially when you don't have time limitations.
So, I'm not sure my proof is valid, and I would like to hear the opinion of someone more experienced. Also, could I have done some things more elegantly? Is it normal that this question was so hard for me? And do you have any remarks on my style?
The Question:
Suppose $k \geq 3$, $\vec{\textbf{x}}, \vec{\textbf{y}} \in \mathbb{R}^k$, $\left\lVert \vec{\textbf{x}} - \vec{\textbf{y}} \right\rVert = d > 0$, and $r > 0$. Prove: If $2r > d$, there are infinitely many $\vec{\textbf{z}} \in \mathbb{R}^k$ such that $\left\lVert \vec{\textbf{z}} - \vec{\textbf{x}} \right\rVert = \left\lVert \vec{\textbf{z}} - \vec{\textbf{y}} \right\rVert = r$.
My Proof:
We show there are infinitely many $\vec{\textbf{z}^*} \in \mathbb{R}^k$ such that $\left\lVert \vec{\textbf{z}^*} \right\rVert^2 = r^2 - \frac{d^2}{4}$ and $\vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) = 0$. For every $a \in \mathbb{R}$ with $0 < a < \frac{1}{2}$, we can construct such a $\vec{\textbf{z}^*}$ in the following way:
Let $i, j, t, p \in \mathbb{N}$, $1 \leq i, j, t, p \leq k$. $\vec{\textbf{z}^*} = (z_1, z_2, z_3, ..., z_k)$, where $z_j = b\sqrt{l}$, $z_t = a\sqrt{l}$ and $z_p = c\sqrt{l}$, such that $j$ is the index of the component of $\vec{\textbf{x}} - \vec{\textbf{y}}$ for which $|x_j-y_j| \geq |x_i - y_i|$ for every possible $i$, and $t \neq j, p$, $p \neq j$. For every $z_i$ with $i \neq j, t, p$, we put $z_i = 0$. Here $c = -\sqrt{1-a^2-b^2}$ and $l = r^2 -\frac{d^2}{4}$. Let $\alpha = (x_j - y_j)^2 + (x_p - y_p)^2$, $\beta = 2a(x_j - y_j)(x_t - y_t)$ and $\gamma = a^2[(x_t - y_t)^2+(x_p - y_p)^2] - (x_p - y_p)^2$. We then define $$b = \frac{\sqrt{\beta^2 - 4\alpha\gamma} -\beta}{2\alpha}$$ We show $b$ always exists. $2\alpha = 2[(x_j - y_j)^2+(x_p - y_p)^2] > 0$, since $(x_j - y_j)^2$ must be positive ($d > 0$). Therefore the denominator is not zero. \begin{equation} \begin{aligned} \beta^2 - 4\alpha\gamma &= 4a^2(x_j - y_j)^2(x_t - y_t)^2 - 4\left[(x_j - y_j)^2+(x_p - y_p)^2\right]\left[a^2\left[(x_t - y_t)^2+(x_p - y_p)^2\right]-(x_p - y_p)^2\right]\\ &= 4\left[a^2(x_j - y_j)^2(x_t - y_t)^2-\left[a^2(x_j - y_j)^2(x_t - y_t)^2+a^2(x_j - y_j)^2(x_p - y_p)^2+a^2(x_t - y_t)^2(x_p - y_p)^2+a^2(x_p - y_p)^4-(x_j - y_j)^2(x_p - y_p)^2-(x_p - y_p)^4\right]\right]\\ &=4\left[(x_p - y_p)^4-a^2(x_p - y_p)^4+(x_p - y_p)^2(x_j - y_j)^2-a^2(x_p - y_p)^2(x_j - y_j)^2-a^2(x_p - y_p)^2(x_t - y_t)^2\right]\\ &=4(x_p - y_p)^2\left[(x_p - y_p)^2(1-a^2)+(x_j - y_j)^2(1-a^2)-a^2(x_t - y_t)^2\right]\\ &=4(x_p - y_p)^2\left[(1-a^2)\left[(x_p - y_p)^2+(x_j - y_j)^2\right]-a^2(x_t - y_t)^2\right] \geq 0, \end{aligned} \end{equation} since $(x_j - y_j)^2 \geq (x_t - y_t)^2$ and $1-a^2 > a^2$. Thus $b$ is indeed defined. Then we get $\left\| \vec{\textbf{z}^*} \right\|^2 = a^2l + b^2l + c^2l = l(a^2+b^2+1-a^2-b^2) = l = r^2 - \frac{d^2}{4}$. By the quadratic formula, we have $\alpha b^2 + \beta b + \gamma = 0$.
Hence: \begin{equation} \begin{aligned} b^2(x_j-y_j)^2 + b^2(x_p-y_p)^2 + 2ab(x_j-y_j)(x_t-y_t) + a^2(x_t-y_t)^2 +a^2(x_p-y_p)^2-(x_p-y_p)^2 = 0\\ a^2(x_t-y_t)^2+2ab(x_t-y_t)(x_j-y_j)+b^2(x_j-y_j)^2-(x_p-y_p)^2(1-a^2-b^2) = 0 \end{aligned} \end{equation} Therefore \begin{equation} \begin{aligned} a^2(x_t-y_t)^2+2ab(x_t-y_t)(x_j-y_j)+b^2(x_j-y_j)^2 = (x_p-y_p)^2(1-a^2-b^2)\\ \left[a(x_t-y_t)+b(x_j-y_j)\right]^2 = (x_p-y_p)^2(1-a^2-b^2)\\ \end{aligned} \end{equation} Then, taking square roots on both sides (with the appropriate choice of sign), we get $a(x_t-y_t)+b(x_j-y_j) = (x_p-y_p)\sqrt{1-a^2-b^2} = -c(x_p-y_p)$, hence $\sqrt{l}\left[a(x_t-y_t)+b(x_j-y_j)+c(x_p-y_p)\right] = 0$, thus $\vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) = 0$.
Since there are infinitely many $0<a<\frac{1}{2}$, there are infinitely many $\vec{\textbf{z}^*}$.
Now, for every $\vec{\textbf{z}^*}$, define $\vec{\textbf{z}} = \vec{\textbf{x}} - \frac{1}{2}(\vec{\textbf{x}}-\vec{\textbf{y}}) + \vec{\textbf{z}^*}$. Then we get $$\left\| \vec{\textbf{z}} - \vec{\textbf{x}} \right\| = \sqrt{\left(-\frac{1}{2}\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right) + \vec{\textbf{z}^*}\right)^2} = \sqrt{\frac{\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right)^2}{4} -\vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) + \left\lVert \vec{\textbf{z}^*} \right\rVert^2} = \sqrt{\frac{d^2}{4} - 0 +r^2 - \frac{d^2}{4}} = r$$ It's clear that $\vec{\textbf{z}} = \vec{\textbf{y}} - \frac{1}{2}(\vec{\textbf{y}}-\vec{\textbf{x}}) + \vec{\textbf{z}^*}$, hence we get $$\left\| \vec{\textbf{z}} - \vec{\textbf{y}} \right\| = \sqrt{\left(\frac{1}{2}\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right) + \vec{\textbf{z}^*}\right)^2} = \sqrt{\frac{\left(\vec{\textbf{x}}-\vec{\textbf{y}}\right)^2}{4} + \vec{\textbf{z}^*} \cdot (\vec{\textbf{x}} - \vec{\textbf{y}}) + \left\lVert \vec{\textbf{z}^*} \right\rVert^2} = \sqrt{\frac{d^2}{4} + 0 +r^2 - \frac{d^2}{4}} = r$$
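As a side check (not part of the proof), this last reduction can be verified numerically: any $\vec{\textbf{z}^*}$ orthogonal to $\vec{\textbf{x}}-\vec{\textbf{y}}$ with $\lVert\vec{\textbf{z}^*}\rVert^2 = r^2 - \frac{d^2}{4}$ yields a valid $\vec{\textbf{z}}$. The concrete vectors below are my own arbitrary choices.

```python
import numpy as np

# Sanity check of the reduction: pick x, y in R^3 and r with 2r > d,
# build z = x - (x - y)/2 + z* with z* orthogonal to x - y and
# |z*|^2 = r^2 - d^2/4, and confirm |z - x| = |z - y| = r.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 1.0])
d = np.linalg.norm(x - y)
r = d  # then 2r > d holds

# Any vector orthogonal to x - y, rescaled to length sqrt(r^2 - d^2/4)
v = np.array([0.0, 0.0, 1.0])
w = v - (v @ (x - y)) / d**2 * (x - y)        # Gram-Schmidt step
z_star = w / np.linalg.norm(w) * np.sqrt(r**2 - d**2 / 4)

z = x - 0.5 * (x - y) + z_star
print(np.isclose(np.linalg.norm(z - x), r))   # True
print(np.isclose(np.linalg.norm(z - y), r))   # True
```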
Thus, there are infinitely many $\vec{\textbf{z}} \in \mathbb{R}^k$ such that $\left\lVert \vec{\textbf{z}} - \vec{\textbf{x}} \right\rVert = \left\lVert \vec{\textbf{z}} - \vec{\textbf{y}} \right\rVert = r$.
I'm also self-studying Rudin at the moment, so I'll leave it up to you to decide if I'm experienced enough to answer you.
I think I understand the idea of your proof, but I still find it a bit hard to follow. For example, to me it is somewhat confusing that your construction of $\mathbf{z}^*$ uses $b$ and $l$, which are defined later in the proof. My reading stops when I realize that I do not know the definition of these symbols. Then I first scan backwards (in case I missed some definitions) and only after that do I try to find them elsewhere. So defining all the symbols required in the construction of $\mathbf{z}^*$ before actually constructing it would make the proof more readable. (Working that way may also reduce the risk of circular reasoning, or at least make it easier to detect, since the definitions and the logic of the proof then flow in one direction only.)
Also, you write
At least as I read this, you kind of fix $i$, $j$, $t$ and $p$ to be some integers from the given interval. Then in a following sentence you start placing further restrictions on those integers, and $i$ does not actually have any fixed value but is rather "any other" integer than $j$, $t$ or $p$. So there is some confusion. (Of course, some of it might be due to the fact that English is not my native language.)
Expressions of the form $(x_k-y_k)$ take up a considerable amount of space in some parts of your proof. Using a more compact notation like $w_k=x_k-y_k$ would make the proof look simpler and shorter. That way possible errors are easier and faster to spot, and you will also have fewer multi-line expressions (which helps readability too).
You can also consider elegance from a broader perspective. For example, looking at parts (b) and (c) of the same problem 1.16: how easy is it to reuse your results for the rest of the problem, or are you forced to go through some lengthy derivations once again? When I did this problem, I first proved (without using the hypothesis $2r>d$ of part (a)) that $\mathbf{z}$ is a solution of the given equation if and only if $\mathbf{z}$ can be written in the form $\mathbf{z}=\frac{1}{2}(\mathbf{x}+\mathbf{y})+\mathbf{w}$, where $\mathbf{w}\in\mathbb{R}^k$ is such that $\mathbf{w}\cdot(\mathbf{y}-\mathbf{x})=0$ and $|\mathbf{w}|^2=r^2-\frac{1}{4}d^2$. That result can then be reused in parts (b) and (c), where the exact number of solutions is needed.
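For reference, that equivalence follows from a two-line computation: writing $\mathbf{z}=\frac{1}{2}(\mathbf{x}+\mathbf{y})+\mathbf{w}$ and $d=|\mathbf{x}-\mathbf{y}|$, we have $$\begin{aligned} |\mathbf{z}-\mathbf{x}|^2 &= \left|\mathbf{w}-\tfrac{1}{2}(\mathbf{x}-\mathbf{y})\right|^2 = |\mathbf{w}|^2 - \mathbf{w}\cdot(\mathbf{x}-\mathbf{y}) + \tfrac{1}{4}d^2,\\ |\mathbf{z}-\mathbf{y}|^2 &= \left|\mathbf{w}+\tfrac{1}{2}(\mathbf{x}-\mathbf{y})\right|^2 = |\mathbf{w}|^2 + \mathbf{w}\cdot(\mathbf{x}-\mathbf{y}) + \tfrac{1}{4}d^2. \end{aligned}$$ Subtracting the two equations shows that $|\mathbf{z}-\mathbf{x}|=|\mathbf{z}-\mathbf{y}|$ forces $\mathbf{w}\cdot(\mathbf{x}-\mathbf{y})=0$, and then either equation equals $r^2$ exactly when $|\mathbf{w}|^2=r^2-\frac{1}{4}d^2$.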
In your construction, you set $c=-\sqrt{1-a^2-b^2}$. How do you know that $c$ is a real number? That does seem to be the case, but a (at least partial) justification appears much later in your proof, and (at least to me) it is not directly evident from the way you define $b$.
But in general, I think the idea and thinking behind your proof seem to be fine (the readability issues described above aside). Although I have to admit I read some parts of your derivations a bit too fast, so there might be errors I did not spot; at least those that I checked seemed to be correct.
That depends on your definition of "normal". At least for me, there are some easier and some much harder problems in Rudin's book. Some problems have taken days to solve, and in chapter 2 there is a problem that I have not solved yet, even though I have invested a reasonable amount of time in it. (I am now working my way through chapter 3, but I plan to return to that problem later.)
Have you checked George Bergman's supplementary notes for Rudin's book? In addition to some good extra problems, they include difficulty codes for the problems. While assessing the difficulty level of a problem is somewhat subjective (it depends strongly on the background of the person doing the assessing), those codes have correlated pretty well (with some exceptions) with my personal experience of the relative difficulty of the problems in Rudin's exercise sets.
Edit: Added an answer to a question asked in the comments.
Ok, I'll try to explain my solution and thinking in more detail.
The basic idea of my proof was geometric intuition: the set of all solutions appears to be a (hyper)circle on a (hyper)plane. I first thought of constructing some part of that circle directly (as you did), but that turned out to be a bit messy pretty soon. So I looked for another way to solve this. The next idea was to take a line in the plane and then project it onto the circle. That would give me an infinite subset of the set of all solutions.
That approach had three subproblems: (1) find two points/vectors in the plane and use them to construct a line (one that does not pass through the origin), (2) carry out the projection and show that its image is a subset of the set of all solutions, and (3) show that the image contains infinitely many points (the projection must not collapse the line to a single point, for example).
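The three steps above can be sketched numerically (this is just an illustration with vectors of my own choosing, not the proof itself): take two independent vectors orthogonal to $\mathbf{x}-\mathbf{y}$, form a line from them, and radially project each point of the line onto the circle of radius $\sqrt{r^2-\frac{1}{4}d^2}$ around the midpoint.

```python
import numpy as np

# Illustrative sketch of the line-then-project idea.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 1.0])
d = np.linalg.norm(x - y)
r = d                              # so 2r > d
rho = np.sqrt(r**2 - d**2 / 4)     # radius of the solution circle

def orth(v, n):
    """Component of v orthogonal to n (one Gram-Schmidt step)."""
    return v - (v @ n) / (n @ n) * n

u = orth(np.array([1.0, 0.0, 0.0]), x - y)   # two independent vectors
v = orth(np.array([0.0, 1.0, 0.0]), x - y)   # in the plane through 0

mid = 0.5 * (x + y)
for t in [0.0, 0.5, 1.0, 2.0]:
    p = u + t * v                            # point on a line in the plane
    z = mid + rho * p / np.linalg.norm(p)    # project onto the circle
    assert np.isclose(np.linalg.norm(z - x), r)
    assert np.isclose(np.linalg.norm(z - y), r)
```

Since $u$ and $v$ are linearly independent, $u+tv$ is never zero and distinct $t$ give distinct directions, so the projection does not collapse the line.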
Subproblem (1) would have been straightforward if some linear algebra results had been available. But as you already know, the single page and roughly ten lines (pages 16-17 in the 3rd edition) that Rudin devotes to introducing Euclidean spaces do not cover much linear algebra. So for my approach I needed a short lemma to cover my linear-algebra needs:
This lemma just makes it possible to pick two linearly independent points from the plane. And the lemma is actually the only place where we need to touch individual components of the points/vectors, so in that way it makes the rest of the proof much cleaner.
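One concrete way to make such a pick (a sketch with a helper name of my own, not taken from the lemma's statement): if $n_m$ is a nonzero coordinate of $\mathbf{n}$, then for each index $i \neq m$ the vector $\mathbf{e}_i - (n_i/n_m)\mathbf{e}_m$ is orthogonal to $\mathbf{n}$, and any two of them are linearly independent; $k \geq 3$ guarantees two such indices exist.

```python
import numpy as np

def two_orthogonal(n):
    """Return two linearly independent vectors orthogonal to n != 0
    (needs len(n) >= 3).  Construction: e_i - (n_i/n_m) e_m for two
    indices i != m, where n_m is a coordinate of n of maximal size."""
    k = len(n)
    m = int(np.argmax(np.abs(n)))            # index with n_m != 0
    i1, i2 = [i for i in range(k) if i != m][:2]
    def basis_minus(i):
        w = np.zeros(k)
        w[i] = 1.0
        w[m] = -n[i] / n[m]
        return w
    return basis_minus(i1), basis_minus(i2)

n = np.array([-3.0, 2.0, 2.0, 0.0])          # example in R^4
u, v = two_orthogonal(n)
print(np.isclose(u @ n, 0.0), np.isclose(v @ n, 0.0))   # True True
```

Linear independence is immediate because $u$ and $v$ have their "1" entries in different coordinates, and those coordinates are zero in the other vector.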
And finally the actual proof you were interested in: