Constructive proof of the Cauchy Schwarz inequality

950 Views Asked by At

The famous CS inequality states

$$ \left| \left< x , y \right>\right| \le \left\| x \right\| \cdot \left\| y \right\| $$

for $x,y$ in an inner product space $X$ over $\mathbb{K}$. Every proof I found involves some kind of case distinction; namely one may use w.l.o.g $\Vert x \Vert, \Vert y \Vert > 0$ since the inequality is trivial otherwise. However, I was looking for a constructive proof (i.e. without using the law of excluded middle) for the inequality.

I will add some remarks to the question.

1) Law of excluded middle: This axiom states that $A \vee \neg A$ is true. This is not considered an axiom in constructive mathematics. One might interpret this in the following way: Indirect proofs are not allowed. However, I find this not completely accurate. Precisely the implication $A \rightarrow \neg \neg A$ is true in constructive mathematics; the implication $ \neg \neg A \rightarrow A$ not in gerneral.

2) The definition of an inner product is the same as in "classical" mathematics.

3) The norm $\Vert \cdot \Vert$ is given by $\Vert x \Vert = \sqrt{\langle x, x\rangle }$.

5

There are 5 best solutions below

11
On BEST ANSWER

Disclaimer:

The initial answer was written in a short time after the posted question. The initial question did not - for my glasses - put enough accent on the fact that we are working in Constructive analysis. (The tag constructive-mathematics did not even capture my attention on its last place where it stayed.) So the question was explicitly:

The famous CS inequality states $$ \vert \langle x , y \rangle \vert \le \Vert x \Vert \Vert y \Vert$$ for some $x,y$ in a Hilbert space. Every proof I found involves some kind of case distinction; namely one may use w.l.o.g $\Vert x \Vert, \Vert y \Vert > 0$ since the inequality is trivial otherwise. However, I was looking for a constructive proof (i.e. without using the law of excluded middle) for the inequality.

For me, a "constructive proof" was - in the second i answered the question - "constructive" (as a proof) e.g. in the sense of not avoiding a case by case examination. (One can still construct $\Bbb R$ and $\Bbb C$ in the "common sense" with the full system of axioms, and now ask for a "simple", constructive proof of "CS". People working exclusively in constructive mathematics may exclude this interpretation by default, the OP was not really precise at this point. Downvoters can please think about the situation, especially because they live in a world where such distinctions matter. Then reread the question again and again.)

After a first version of this answer was given, we've got in the OP a slightly changed question, there is a pre-Hilbert space instead of a Hilbert space, a general ground field $\Bbb K$ whithout closer specification, and the "constructive proof" can / may now be also/rather understood as a valid proof inside constructive analysis. (I would not even have touched the question in that form, if this direction would have been made clear from the beginning, a lot of work. But OK.) Comments have moreover placed the question strictly in the domain of constructive analysis, so the downvotes are more than deserved from this point of view. Now i need to reanswer the question. I will explain the idea from the initial post, that still survives, and put the details in the framework of the constructive analysis. The steps will be to reduce the situation from a general pre-Hilbert space to a pre-Hilbert space with at most two generators, then to the study of quadratic forms over $\Bbb K$. An alternative "constructive" CS version is given.


$(0)$ Introduction.

For the reader who never saw this kind of intuitionistic logic, roughly "results are either true, or false, or not known". And we want to study only things that are "constructible" from a sparse set of axioms of the set theory, and propositions that are either true, or false from the same perspective. As an example of something "not known", consider / construct the sequence $a_n$ defined for $n\ge 6$ even by the rule $a_n=0$ if $n$ is the sum of two prime numbers, else $1$. This is well defined, because we can examine "finitely many cases", so we can answer questions like "Is $n$ even?", "Is this special given number $p$ prime?" or "Which are the primes $<n$?" or "Is $n$ the sum of two primes?". However, in order to know if the real number $r=\sum_{n\ge 6\text{ even}}a_n/2^n$ is zero we need a proof for the specific number theoretical conjecture. From this point of view, we have even problems to check if a (constructed) real number is zero or not. Similar examples can be given by taking an other conjecture in number theory to model the "unknown", e.g. $a_n$ is $0$ if there are at least $n$ non-trivial zeros of the Riemann $\zeta$ function, and the $n$.th zero is on the critical line, else $1$. Or one can use Gödel-like settings coming from (un)decidability theory.

But for two rational numbers (and for two numbers in finite algebraic extensions of $\Bbb Q$) we can decide if they are equal or not. Since in $\Bbb Q$ we have again a fraction representation, so a denominator and a bounded interval it lives in, so only finitely many cases to check... (If this kind of mathematics does not feel attractive so far, please do not read further. This taste of mathematics depends on somebodys own relation to chosen god. If you want to get closer to this entity, reading on will not be completely frustrating, it may be even a good test if somebody likes the fundamental, axiomatic, intuitionistic, and/or logic parts of mathematics. If still here, as a last word of warning, or as a thematic joke, note that after the constructivistic introduction of the set of real numbers $\Bbb R$, the proof that the equality is an equivalence relation on $\Bbb R$ is not a (one-line) joke. So please do not complain about the length of this answer.)


$(1)$

Let $X$ be the pre-Hilbert space given in the OP. The ground field is a field $K$ (instead of $\Bbb K$) containing $\Bbb Q$, and a positive cone $K_+$ so that the relations $>$ and $\ge$ can be defined. Fix $x,y\in X$. By considering the vectorial subspace $X'=Kx+Ky$ generated by $x,y$ we may an do assume that $X'$ has dimension $\le 2$ over $K$, and write $X$ instead. (The subspace also comes with a specified set of generators.)

Just as a comment, constructive mathematics makes an issue for the existence of a basis. Links that may serve as a starting point are mse question 944895, and (for $K=\Bbb Q$) mo question 326211. So in the given generality of the OP ($X$ pre-Hilbert over some $K$), just starting with $X=\Bbb Q^n$ or $X=\Bbb R^n$ or $X=\Bbb C^n$, as vector spaces with a given basis (the canonical basis), is a questionable step that has to be considered in detail. I will also not address here the question, if we can decide, if $X'$ has dimension $0$, $1$, or $2$.


$(2)$

In the sequel $a,b$ be scalars in $K$. We have the pre-Hilbert space $X$ over $K$ with generators $x,y\in X$. Then $\langle\ \cdot\ ,\ \cdot\ \rangle$ is a scalar product on $X$ iff it is "in a classical world". The OP does not make this precise, i will try to do it below. Giving $X$ means in particular to give $\|x\|,\|y\|\in K_{\Bbb R}:=K\cap \Bbb R$, with $\|x\|,\|y\|\ge 0$, and $\langle x,y\rangle\in K$. An element of $X$ is of the shape $ax+by$ with $$ \begin{aligned} \|ax+by\|^2 &=\langle ax+by, ax+by\rangle \\ & = \langle ax, ax\rangle +\langle ax, by\rangle +\langle by, ay\rangle +\langle by, by\rangle \\ & =|a|^2\;\|x\|^2 + 2\text{Real}(ab\langle x, y\rangle) + |b|^2\;\|y\|^2 \ , \end{aligned} $$ Letting $a,b$ run in $K$, we see that it is necessary and sufficient to insure that the worst case scenario inequality is satisfied: $$ 0\le |a|^2\;\|x\|^2 - 2|ab|\; |\langle x, y\rangle| + |b|^2\;\|y\|^2 \ ,\qquad a,b\in K\ . $$ Equivalently, $$ 0\le q(a,b):= a^2\;\|x\|^2 - 2ab\; |\langle x, y\rangle| + b^2\;\|y\|^2 \ ,\qquad a,b\in K_{\Bbb R, \ \ge 0}\ . $$ So giving a pre-Hilbert space $X$ over $K$, with generators $x,y\in X$, is equivalent to giving the data: $$ \|x\|\ge 0\ ,\ \|y\|\ge 0\ ,\ \langle x, y\rangle\in K\ ,\\ q:K_{\Bbb R}\times K_{\Bbb R}\to K_{\Bbb R}\ ,\\ q(a,b)= a^2\;\|x\|^2 - 2ab\; |\langle x, y\rangle| + b^2\;\|y\|^2\ , $$ so that $q\ge 0$. (Terminology: $q$ is non-negative semidefinite form.) We define the discriminant of the above quadratic form $q$ by $\Delta =4( |\langle x, y\rangle|^2 - \|x\|^2\; \|y\|^2)$. Note that if $\Delta>0$, after possibly passing to a quadratic extension we can construct two distinct "roots" of the associated equation (working to avoid norming problems in the one dimensional projective plane over $K_{\Bbb R}$), and this gives a point where the form $q$ is negative.

But we need the "other direction".


$(3)$ Note:

The study of the above quadratic form in $(a,b)$ cannot be "simply replaced" by the study of the form where either $a$ or $b$ is normed to $1$, since the reduction to the cases $\|x\|\ne 0$ and/or $\|y\|\ne 0$ may be problematic in the given framework. But we may get the hint to proceed in this way. (In a "normal" world, if the quadratic form $Ea^2 -2Fab +Gb^2$ with $E,G\ge 0$ is given, we would de-homogenize to get either $Ea^2-2Fa+G$ or $E-2Fb+Gb^2$, then study for a minimal point by inserting either $a=F/E$, and/or $b=F/G$. Without de-homogenization, we have to work "projectively", thus inserting $(a,b)=(F,E)$, and/or $(a,b)=(G,F)$.


$(4)$

The given problem is thus equivalent to the following one. We know for the above quadratic form $q$ that $q\ge 0$. (And want to show that the discriminant $\Delta \le 0$ or a related realtion.) To extract the essence, we start with a non-negative quadratic form of the shape $$ E\;a^2 -2F\; ab + G\; b^2\ . $$ In the case of the OP $E,F,G$ are respectively $\|x\|^2$, $|\langle x, y\rangle|$, $\|y\|^2$. Since the values of the quadratic form are always $\ge0$, this is also true for the specializations $(a,b)=(F,E)$, and respectively $(a,b)=(G,F)$, and we obtain: $$ \begin{aligned} 0&\le E\; F^2 - 2F\; EF + G\; E^2\ ,&\text{i.e. }\qquad 0&\le E(EG-F^2)\ ,\\ 0&\le E\; G^2 - 2F\; FG + G\; F^2\ ,&\text{i.e. }\qquad 0&\le G(EG-F^2)\ .\\ \end{aligned} $$ Adding the two inequalities we obtain: $$ \begin{aligned} 0&\le (E+G)(EG-F^2)\ ,\qquad\text{ i.e.}\\ \color{blue}{(*)\qquad 0} & \color{blue}{\le (\|x\|+\|y\|)(\|x\|\;\|y\|-|\langle x,y\rangle|^2)} \ . \end{aligned} $$


$(5)$

The above inequality $(*)$ is enough CS for me, and i will no longer bother about getting rid of the $(E+G)$-factor above. (This will be the first point where constructivists will search for an error, so my reaction is to kindly invite them to do this step. One may get rid of this factor in case $K$ is a finite extension of $\Bbb Q$, as comments have teached me. At any rate, this "contorsion" of the CS formula by the factor $(E+G)$ is rather innocent compared to similar contorsions needed to adapt other results from "normal" mathematics to their constructive analysis analogues.)

6
On

Breaking into cases doesn't necessarily rely on the law of excluded middle. It is true in a Hilbert space that either $x = 0$ or $x \neq 0$, we don't need to appeal to excluded middle for this fact. This constructively/intuitionistically reduces the problem to finding a proof for both the case $x=0$ and the case $x \neq 0$.

In other words, I agree that a case analysis uses excluded middle in some sense, but if you can constructively prove excluded middle for the cases you are considering, then you don't need to assume it in general.

4
On

Here is a proof from Pugh, Real Mathematical Analysis. I don't know if this proof can be modified for a complex vector space, but in case you're working over $\Bbb{R}$ it may be what you're looking for.

Let $x$ and $y$ be arbitrary and let $w = x + ty$ for a scalar $t$. Define $Q(t) = \langle w, w \rangle=\langle x + ty, x + ty \rangle$.

Then $Q(t)$ is nonnegative, being the value of an inner product. By the bilinearity of $\langle \cdot, \cdot \rangle$, we have: $$ Q(t) = \langle x,x\rangle + 2t\langle x,y \rangle + t^2\langle y,y\rangle = c + bt + at^2$$

Interpret $Q$ as a parabola; since it is nonnegative its discriminant $b^2 - 4ac$ must be nonpositive (else there would be two real roots, thus a segment of the parabola below the real axis). This gives: $$ 4\langle x,y\rangle^2-4\langle x, x\rangle \langle y,y\rangle \leq 0,$$ or rewriting: $$ \langle x,y \rangle^2 \leq \langle x,x\rangle \langle y,y \rangle. $$ Taking the square root of both sides gives the result.

2
On

I don't think that you can actually prove Cauchy-Schwartz (CS) for an arbitrary inner product space (constructively). In Constructive Analysis by Bishop & Bridges they define inner product spaces so that CS holds: enter image description here

However, for the special case of $\mathbb{R}^n$ it is possible to give a constructive proof, which can be found on page 83 of Constructive Analysis: enter image description here

0
On

This is a completion of dan_fulea's proof. Where he left off, there is a quadratic form $$ 0 \le q(x, y) = Ea^2 - 2 F ab + G b^2 , $$ which he evaluated to show that $$ 0 \le q(F, E) + q(G, F) = (E + G) (E G - F^2) . $$ Here, $E = \|x\|^2$, $F = \vert\langle x, y \rangle\vert$, and $G = \|y\|^2$, so this is equivalent to $$ 0 \le (\|x\|^2 + \|y\|^2) (\|x\|^2 \|y\|^2 - \vert\langle x, y \rangle\vert^2) . $$ This appears to be very close to the desired Cauchy-Schwartz inequality, as the only way for $\|x\|^2 + \|y\|^2$ to be zero is if $x = y = 0$, in which case the inequality is trivial. But it still remains to show this in a constructive way, without assuming $\|x\|^2 + \|y\|^2 = 0 \lor \|x\|^2 + \|y\|^2 > 0$.

Instead, evaluate $q(F + G, F + E)$ to find \begin{align*} 0 &\le q(F + G, F + E) \\ &= E (F + G)^2 - 2 F (F + G) (F + E) + G (F + E)^2 \\ &= E F^2 + 2 E F G + E G^2 - 2 F^3 - 2 F^2 (E + G) - 2 E F G + F^2 G + 2 E F G + E^2 G \\ &= E G^2 - 2 F^3 - F^2 (E + G) + 2 E F G + E^2 G \\ &= E G (E + 2 F + G) - F^2 (E + 2 F + G) \\ &= (E + 2 F + G) (E G - F^2) . \end{align*} Now we'd like to be able to divide both sides by $E + 2 F + G$ to get the Cauchy-Schwartz inequality, but this might be dividing by zero.

To work around this issue, use that the reals form a pseudo-order. In particular, the reals satisfy the comparison property, $$ a < b \implies a < c \lor c < b . $$ Now for any $\epsilon > 0$ we have $\epsilon < 2 \epsilon$, so $\epsilon < E + 2 F + G \lor E + 2 F + G < 2 \epsilon$. In the former case, we have bounded $E + 2 F + G$ away from zero, so after dividing $q(F + G, F + E)$ by it we get $E G - F^2 \ge 0 \ge -\epsilon^2$. In the latter case, we have \begin{align*} 2 F &\le 2 \epsilon \\ -F^2 &\ge -\epsilon^2 \\ E G - F^2 &\ge -\epsilon^2 . \end{align*} This shows that $E G - F^2 \ge -\epsilon^2$ for all $\epsilon > 0$. Since $\le$ is a closed subset of $\mathbb{R}^2$, we can apply $\lim_{\epsilon \to 0}$ to both sides of the inequality to get $E G - F^2 \ge 0$, i.e. $\|x\|^2 \|y\|^2 \ge \vert\langle x, y \rangle\vert^2$. Taking the square root of both sides then gives the Cauchy-Schwartz inequality.