Source: Poole, D. Linear Algebra: A Modern Introduction (2014, 4th edn). Section 7.1. Exercise 44.
- Let $\mathbf{u}$ and $\mathbf{v}$ be vectors in an inner product space $V$. Prove the Cauchy-Schwarz Inequality for $\mathbf{u}\neq\mathbf{0}$ as follows: (a) Let $t$ be a real scalar. Then $\langle t\mathbf{u} + \mathbf{v}, t\mathbf{u} + \mathbf{v}\rangle \ge 0$ for all values of $t$. Expand this inequality to obtain a quadratic inequality of the form $at^2 + bt + c \ge 0$ [...]
How does one divine the trick of inventing this quadratic polynomial? It feels like a flash of genius, but I'm not clairvoyant. I already know how to execute the algebra; that is not what I am asking.
Timothy Gowers likewise observes that
most textbooks and all analysis courses I have attended favour the approach where you write down
$ \lVert x-cy \rVert^2$, which is real and non-negative, and then choose a `clever' value of $c$ from which to deduce the Cauchy-Schwarz inequality. Of course, $c$ can be justified as the value that minimizes the quadratic expression that results from expanding $\lVert x-cy \rVert^2$, but even so the idea of writing down $ \lVert x-cy \rVert^2$ in the first place is not an obvious one [emphasis mine].
No explanation is usually given of where the quadratic form comes from. This page is intended for those who happen not to have observed, or been shown, that more or less the same argument can be made to seem much more natural. Indeed, this is another example of a proof that a well-programmed computer could reasonably be expected to discover.
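For completeness, the minimization that Gowers alludes to can be sketched as follows (for a real inner product space with $y \neq 0$; the complex case replaces $\langle x,y\rangle^2$ with $|\langle x,y\rangle|^2$):

```latex
\[
0 \le \lVert x - cy \rVert^2
  = \langle x,x\rangle - 2c\,\langle x,y\rangle + c^2\,\langle y,y\rangle .
\]
% The right-hand side is a quadratic in c; it is minimized at
% c^* = \langle x,y\rangle / \langle y,y\rangle, and substituting c^* gives
\[
0 \le \langle x,x\rangle - \frac{\langle x,y\rangle^2}{\langle y,y\rangle}
\quad\Longleftrightarrow\quad
\langle x,y\rangle^2 \le \lVert x\rVert^2\,\lVert y\rVert^2 .
\]
```

So the "clever" $c$ is not guessed; it is the vertex of the parabola, which is exactly why the discriminant condition of the textbook's $at^2+bt+c\ge 0$ produces the same inequality.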
For ${\Bbb C}^n$ one may write $$ \|x\|^2\|y\|^2-|\langle x,y\rangle|^2\ge 0\quad\Leftrightarrow\quad\begin{bmatrix}x^*x & x^*y\\y^*x & y^*y\end{bmatrix}=\begin{bmatrix}x \\y\end{bmatrix}^*\begin{bmatrix}x \\y\end{bmatrix} \text{ pos.semidef.} $$ and the latter is obvious for all $x$, $y$. How would one test positive semidefiniteness of the Gramian matrix $$ \begin{bmatrix}\langle x,x\rangle & \langle x,y\rangle\\\langle y,x\rangle & \langle y,y\rangle\end{bmatrix} $$ in a general Hilbert space? Pre- and postmultiplying by the vector $(1,t)$ sounds like a natural way to do it, and that is precisely the textbook's quadratic in $t$.
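As a numeric sanity check of this viewpoint, here is a small NumPy sketch (the vectors and tolerances are mine; the inner product is the standard $\langle a,b\rangle = a^*b$). It verifies that the Gramian is positive semidefinite, that its determinant condition is exactly Cauchy-Schwarz, and that sandwiching it by $(1,t)$ recovers $\lVert x+ty\rVert^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# Gramian of the pair (x, y); np.vdot conjugates its first argument,
# so np.vdot(a, b) = a^* b, matching the display above.
G = np.array([[np.vdot(x, x), np.vdot(x, y)],
              [np.vdot(y, x), np.vdot(y, y)]])

# Positive semidefiniteness: all eigenvalues of the Hermitian G are >= 0.
eigs = np.linalg.eigvalsh(G)
assert eigs.min() >= -1e-9

# det(G) >= 0 is exactly Cauchy-Schwarz: ||x||^2 ||y||^2 - |<x,y>|^2 >= 0.
assert np.linalg.det(G).real >= -1e-9

# Sandwiching by (1, t) for real t yields the textbook quadratic,
# which equals ||x + t y||^2 and is therefore nonnegative.
for t in [-2.0, 0.5, 3.0]:
    v = np.array([1.0, t])
    quad = (v.conj() @ G @ v).real
    assert abs(quad - np.linalg.norm(x + t * y) ** 2) < 1e-9
    assert quad >= 0
```

The point of the check is conceptual rather than computational: every nonnegativity assertion above is an instance of "the Gramian is a $B^*B$ matrix", which is why the $(1,t)$ test is the natural one in a space where eigenvalues are unavailable.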