Intuition behind how the Cauchy-Schwarz inequality's proof was obtained


I'm studying multivariable calculus. Usually, when I study, I go through a book until I find a theorem, and then try to prove it. I was unable to, so I read the proof, which is the following:

Let $x, y \in \mathbb{R}^m, \alpha \in \mathbb{R}$. Then $(x+\alpha y)\cdot(x+\alpha y) = \vert \vert x+\alpha y\vert\vert^2 \geq0$. Using the properties of the inner product we get:

$(x+\alpha y)\cdot(x+\alpha y) = x\cdot x+\alpha x\cdot y + \alpha y\cdot x + \alpha^2y\cdot y = \vert\vert x\vert\vert^2+2(x\cdot y)\alpha + \alpha^2\vert\vert y\vert\vert^2 \geq 0$.

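That last inequality holds for every real $\alpha$ iff the discriminant of the quadratic in $\alpha$ is less than or equal to 0 (if $y = 0$, the Cauchy-Schwarz inequality holds trivially). Therefore $4(x\cdot y)^2 - 4\vert \vert x\vert\vert^2\vert\vert y\vert\vert^2 \leq 0$, that is, $\vert x\cdot y\vert \leq \vert\vert x\vert\vert\,\vert\vert y\vert\vert$, which is the Cauchy-Schwarz inequality. Q.E.D.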
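As a numerical sanity check on the discriminant argument, here is a minimal Python sketch. The helper names `dot`, `quadratic_coeffs` and `discriminant` are made up for illustration, and the vectors are arbitrary examples.

```python
# Illustrative helpers (hypothetical names), plain Python with no third-party libraries.
def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def quadratic_coeffs(x, y):
    """Coefficients (a, b, c) with a*alpha^2 + b*alpha + c = ||x + alpha*y||^2."""
    return dot(y, y), 2 * dot(x, y), dot(x, x)

def discriminant(x, y):
    """b^2 - 4ac for the quadratic in alpha; it should never be positive."""
    a, b, c = quadratic_coeffs(x, y)
    return b * b - 4 * a * c

x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]
disc = discriminant(x, y)  # 4*(x.y)^2 - 4*||x||^2*||y||^2

# Parallel vectors give a double root, i.e. equality in Cauchy-Schwarz.
disc_parallel = discriminant([1.0, 2.0], [2.0, 4.0])

print(disc, disc_parallel)
```

A non-positive discriminant here is the same statement as $(x\cdot y)^2 \leq \vert\vert x\vert\vert^2\vert\vert y\vert\vert^2$.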

I can follow every step of the proof. I also get the intuition of why the inequality should be true. However, the proof seems "empty" to me. I don't understand what someone who wanted to prove this would do to find it. What's the intuition behind using $x+\alpha y$?

I ask because the approach used in the proof was so far from everything I tried that I am almost sure I would never have found it on my own. How should I deal with this kind of situation?


There are 2 best solutions below

On BEST ANSWER

I don't know about anybody else, but I share your dissatisfaction with the standard slick proof, and I personally find it helpful to think instead of expressing $x$ as a sum of a multiple of $y$ and a vector orthogonal to $y$. This kind of resolution of a vector into two mutually orthogonal components is a common and natural operation.

If $\lambda$ is real, then $x - \lambda y$ is orthogonal to $y$ if and only if (in your notation) $(x - \lambda y) \cdot y = 0$, i.e., $$ \lambda \|y\|^2 = x \cdot y. $$

For any value of $\lambda$ satisfying that condition ($\lambda$ may be chosen arbitrarily if $y = 0$, and there is a unique solution for $\lambda$ if $y \ne 0$), write $u = x - \lambda y$ and $v = \lambda y$, so that $x = u + v$ and $u \cdot v = 0$. Then: \begin{align*} \|x\|^2 & = (u + v) \cdot (u + v) \\ & = u \cdot u + 2u \cdot v + v \cdot v \\ & = \|u\|^2 + \|v\|^2 \\ & \geqslant \|v\|^2. \end{align*} Therefore, using the definitions of $v$ and $\lambda$: $$ \|x\|^2\|y\|^2 \geqslant \|v\|^2\|y\|^2 = \lambda^2\|y\|^4 = (x \cdot y)^2 = |x \cdot y|^2, $$ and the result follows. So the selection of the value $-\lambda$ for $\alpha$ does make some intuitive sense (to me, at least).
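The resolution into orthogonal components can be sketched numerically as follows. This is only an illustration under stated assumptions: `dot` and `decompose` are hypothetical helper names, and $\lambda = 0$ is picked arbitrarily when $y = 0$.

```python
# Illustrative helpers (hypothetical names), plain Python.
def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def decompose(x, y):
    """Split x as u + v, with v = lam*y and u orthogonal to y."""
    yy = dot(y, y)
    # lam solves lam * ||y||^2 = x.y; any value works when y = 0, so use 0.
    lam = dot(x, y) / yy if yy != 0 else 0.0
    v = [lam * yi for yi in y]
    u = [xi - vi for xi, vi in zip(x, v)]
    return lam, u, v

x = [3.0, 4.0]
y = [1.0, 0.0]
lam, u, v = decompose(x, y)  # lam = 3.0, u = [0.0, 4.0], v = [3.0, 0.0]

# Pythagoras: ||x||^2 = ||u||^2 + ||v||^2 >= ||v||^2
print(dot(u, v), dot(x, x), dot(u, u) + dot(v, v))
```

Dropping the $\|u\|^2$ term is the only place where an inequality enters, which is why equality holds exactly when $u = 0$, i.e. when $x$ is a multiple of $y$ (or $y = 0$).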

You could arrive at this value of $\alpha$ less intuitively by "completing the square" in the expression you derived for $\|x + \alpha y\|^2$, after first multiplying by $\|y\|^2$ to avoid a possible division by zero: \begin{align*} \|x + \alpha y\|^2\|y\|^2 & = \|x\|^2\|y\|^2 + 2(x \cdot y)\alpha\|y\|^2 + \alpha^2\|y\|^4 \\ & = (\alpha\|y\|^2 + x \cdot y)^2 + \|x\|^2\|y\|^2 - (x \cdot y)^2 \\ & = \|x\|^2\|y\|^2 - (x \cdot y)^2, \end{align*} if $$\alpha\|y\|^2 + x \cdot y = 0. $$ Since the left-hand side is non-negative, the inequality follows. So the proof you quoted can be seen as the proof by resolution into orthogonal components in heavy disguise.
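The completed-square identity can be checked with exact rational arithmetic. The sketch below assumes nothing beyond the standard library; `scaled_norm_sq` and `completed_square` are hypothetical names for the two sides of the identity, and the vectors are arbitrary.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def scaled_norm_sq(x, y, alpha):
    """Left-hand side: ||x + alpha*y||^2 * ||y||^2."""
    w = [xi + alpha * yi for xi, yi in zip(x, y)]
    return dot(w, w) * dot(y, y)

def completed_square(x, y, alpha):
    """Right-hand side: (alpha*||y||^2 + x.y)^2 + ||x||^2*||y||^2 - (x.y)^2."""
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    return (alpha * yy + xy) ** 2 + xx * yy - xy ** 2

x = [1, 2, 3]
y = [4, -5, 6]

# The identity holds for every alpha, not only the special one.
for alpha in (Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)):
    assert scaled_norm_sq(x, y, alpha) == completed_square(x, y, alpha)

# Choosing alpha so that alpha*||y||^2 + x.y = 0 removes the squared term.
alpha_star = Fraction(-dot(x, y), dot(y, y))
gap = scaled_norm_sq(x, y, alpha_star)  # equals ||x||^2*||y||^2 - (x.y)^2
print(alpha_star, gap)
```

Because `gap` is a squared norm times $\|y\|^2$, it cannot be negative, which is the Cauchy-Schwarz inequality for this pair of vectors.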

On

Proving theorems about general vector spaces, or general inner product spaces, can begin by considering a familiar $2$- or $3$-dimensional space. But then you need to abstract the intuition so it's pure algebra, no diagrams required. So your question comes down to what sort of preamble may have helped here.

If you think about vectors in a space you can visualize, all the theorem says is that the angle $\theta$ between two vectors satisfies $-1\le\cos\theta\le 1$, which by the cosine rule is equivalent to the triangle inequality. Since the cosine rule can be stated in terms of dot products, it makes sense to see what you learn from one more equivalent result, $\Vert x-y\Vert^2\ge 0$.

But $\Vert x-\alpha y\Vert^2\ge 0$ is a natural generalisation, and connects the issue to extremising quadratics, with the extremum giving us the sharpest inequality we can get. And we don't need to think about a specific vector space to use $\Vert v\Vert^2\ge 0$, so it's a general starting point.
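To make the "extremising quadratics" remark concrete, here is a small sketch, assuming $y \ne 0$, that locates the minimum of $f(\alpha) = \Vert x-\alpha y\Vert^2$ and compares it with a few other values of $\alpha$. The function names and the sample vectors are illustrative choices, not part of the original argument.

```python
from fractions import Fraction

def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def f(x, y, alpha):
    """f(alpha) = ||x - alpha*y||^2, a quadratic in alpha."""
    w = [xi - alpha * yi for xi, yi in zip(x, y)]
    return dot(w, w)

def minimiser(x, y):
    """Vertex of the quadratic: alpha = x.y / ||y||^2 (requires y != 0)."""
    return Fraction(dot(x, y), dot(y, y))

x = [2, 1]
y = [1, 1]
alpha_min = minimiser(x, y)   # 3/2
f_min = f(x, y, alpha_min)    # ||x||^2 - (x.y)^2 / ||y||^2 = 1/2

# Every other alpha gives a weaker (larger) lower bound than the vertex.
others = [Fraction(k, 4) for k in range(-8, 17)]
print(alpha_min, f_min, all(f(x, y, a) >= f_min for a in others))
```

The statement $f(\alpha_{\min}) \ge 0$ rearranges to $(x\cdot y)^2 \le \Vert x\Vert^2\Vert y\Vert^2$, so the vertex is exactly where the inequality $\Vert x-\alpha y\Vert^2\ge 0$ carries the most information.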