I have some trouble understanding the proof of the Cauchy-Schwarz inequality from my textbook:
Given $\textbf{x}, \textbf{y} \in \mathbb{R}^n \Rightarrow \vert \textbf{x}^T \textbf{y} \vert \le \Vert \textbf{x} \Vert_2 \cdot \Vert \textbf{y} \Vert_2$
the proof:
given $\lambda \in \mathbb{R}$ we can observe that: $$\begin{align} 0 \le \Vert \textbf{x} + \lambda \textbf{y} \Vert_2^2 & = \sum_{i=1}^n (x_i + \lambda y_i)^2 \\ & = \sum_{i=1}^n (x_i^2 + 2 \lambda x_i y_i + \lambda^2 y_i^2) \\ & = \sum_{i=1}^n x_i^2 + 2\lambda \sum_{i=1}^n x_iy_i + \lambda^2 \sum_{i=1}^n y_i^2 \\ & = \Vert \textbf{x} \Vert^2_2 + 2 \lambda \textbf{x}^T \textbf{y} + \lambda^2 \Vert \textbf{y} \Vert^2_2 \end{align}$$ If $\textbf{x}^T\textbf{y} = 0$, then the thesis is surely true.
Instead, if $\textbf{x}^T\textbf{y} \ne 0$, then we can consider: $$\lambda = - \frac{\Vert \textbf{x} \Vert^2_2}{\textbf{x}^T\textbf{y}}$$ therefore we have: $$\begin{align} 0 \le \Vert \textbf{x} \Vert^2_2 - 2\Vert \textbf{x} \Vert^2_2 + \frac{\Vert \textbf{x} \Vert^4_2}{\vert \textbf{x}^T \textbf{y} \vert^2} \Vert \textbf{y} \Vert^2_2 & = -\Vert \textbf{x} \Vert^2_2 + \frac{\Vert \textbf{x} \Vert^4_2}{\vert \textbf{x}^T \textbf{y} \vert^2} \Vert \textbf{y} \Vert^2_2 \\ & = \Vert \textbf{x} \Vert^2_2 \left ( -1 + \frac{\Vert \textbf{x}\Vert^2_2 \Vert \textbf{y}\Vert^2_2 }{ \vert \textbf{x}^T \textbf{y} \vert^2 }\right ) \end{align}$$ Since $\textbf{x}^T\textbf{y} \ne 0$ forces $\Vert \textbf{x} \Vert_2 > 0$, we can divide by $\Vert \textbf{x} \Vert^2_2$ and deduce that: $$\Vert \textbf{x} \Vert^2_2 \Vert \textbf{y} \Vert^2_2 - \vert \textbf{x}^T \textbf{y} \vert^2 \ge 0 $$ and then the thesis follows easily.
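As a numerical sanity check (not part of the textbook proof), the expansion of $\Vert \textbf{x} + \lambda \textbf{y} \Vert_2^2$ and the final inequality can be verified for random vectors, e.g. in plain Python:

```python
import random

# Sanity check of the proof's key identity and of Cauchy-Schwarz itself,
# using plain Python lists as vectors (assumption: random test data).
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm_sq(u):
    return dot(u, u)

random.seed(0)
n = 5
x = [random.uniform(-1, 1) for _ in range(n)]
y = [random.uniform(-1, 1) for _ in range(n)]

xy = dot(x, y)
assert xy != 0  # we are in the second branch of the proof

# The textbook's choice of lambda
lam = -norm_sq(x) / xy

# Left side: ||x + lambda*y||^2 computed directly (a sum of squares, so >= 0)
lhs = norm_sq([a + lam * b for a, b in zip(x, y)])

# Right side: the expanded form ||x||^2 + 2*lambda*x^T y + lambda^2*||y||^2
rhs = norm_sq(x) + 2 * lam * xy + lam ** 2 * norm_sq(y)
assert abs(lhs - rhs) < 1e-9  # the expansion holds

# Rearranging 0 <= ||x + lambda*y||^2 with this lambda gives Cauchy-Schwarz:
assert abs(xy) <= (norm_sq(x) * norm_sq(y)) ** 0.5 + 1e-9
```

Running it for other seeds and dimensions gives the same result, which matches the fact that the derivation holds for arbitrary real vectors.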
Here are the questions I want to ask:
1) Why is it immediately observed that $0 \le \Vert \textbf{x} + \lambda \textbf{y} \Vert_2^2$? From where can we deduce that observation?
2) When it says << if $\textbf{x}^T\textbf{y} = 0$ then the thesis is surely true >>, does "thesis" mean the statement of the proposition above?
$$\vert \textbf{x}^T \textbf{y} \vert \le \Vert \textbf{x} \Vert_2 \Vert \textbf{y} \Vert_2$$
But if I substitute $\textbf{x}^T\textbf{y} = 0$ into the last step:
$$\begin{align}& = \Vert \textbf{x} \Vert^2_2 + 2 \lambda \textbf{x}^T \textbf{y} + \lambda^2 \Vert \textbf{y} \Vert^2_2 \\ & = \Vert \textbf{x} \Vert^2_2 + 2 \lambda \cdot 0 + \lambda^2 \Vert \textbf{y} \Vert^2_2 \\ & = \Vert \textbf{x} \Vert^2_2 + \lambda^2 \Vert \textbf{y} \Vert^2_2 \end{align}$$
I do not obtain the $\Vert \textbf{x} \Vert_2 \Vert \textbf{y} \Vert_2$ of the thesis of the proposition.
3) Why is $\lambda$ chosen in that way?
Instead, if $\textbf{x}^T\textbf{y} \ne 0$, then we can consider $$\lambda = - \frac{\Vert \textbf{x} \Vert^2_2}{\textbf{x}^T\textbf{y}}$$
Can you please help me understand this better? Many thanks!
(1) This follows from the definition of a norm: norms are non-negative (here you can also see it directly, since $\Vert \textbf{x} + \lambda \textbf{y} \Vert_2^2$ is a sum of squares). (2) Yes, the "thesis" is the inequality of the proposition. The RHS $\Vert \textbf{x} \Vert_2 \Vert \textbf{y} \Vert_2$ is non-negative (again because norms are non-negative), so if the LHS $\vert \textbf{x}^T \textbf{y} \vert$ is $0$, the inequality is automatically satisfied; you are not supposed to recover the thesis by substituting into the expansion in that case. (3) This choice of $\lambda$ makes the arithmetic come out the way you want. The author proves a general inequality valid for every $\lambda$, and then picks a specific $\lambda$ so that the general inequality reduces to the desired one.
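To illustrate point (3): the quadratic $q(\lambda) = \Vert \textbf{x} \Vert^2_2 + 2\lambda\, \textbf{x}^T\textbf{y} + \lambda^2 \Vert \textbf{y} \Vert^2_2$ is non-negative for every real $\lambda$, and any clever choice of $\lambda$ that survives the rearrangement yields Cauchy-Schwarz. A quick sketch in Python (assumption: random test vectors; the choice $\lambda^* = -\textbf{x}^T\textbf{y} / \Vert \textbf{y} \Vert^2_2$, the minimizer of $q$, is a common alternative to the textbook's choice):

```python
import random

# The general inequality 0 <= q(lam) holds for EVERY real lam, because
# q(lam) = ||x + lam*y||^2. A specific lam then reduces it to Cauchy-Schwarz.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(4)]
y = [random.uniform(-1, 1) for _ in range(4)]
xx, yy, xy = dot(x, x), dot(y, y), dot(x, y)

def q(lam):
    return xx + 2 * lam * xy + lam ** 2 * yy

# The quadratic is non-negative over a whole range of lambda values...
for k in range(-50, 51):
    assert q(k / 5.0) >= 0

# ...and at the minimizer lam* = -x^T y / ||y||^2 it equals
# ||x||^2 - (x^T y)^2 / ||y||^2, whose non-negativity IS Cauchy-Schwarz:
lam_star = -xy / yy
assert abs(q(lam_star) - (xx - xy ** 2 / yy)) < 1e-9
assert xy ** 2 <= xx * yy + 1e-9
```

Both the textbook's $\lambda$ and $\lambda^*$ work; they just distribute the factors of $\Vert \textbf{x} \Vert_2$ and $\Vert \textbf{y} \Vert_2$ differently before the final rearrangement.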