Intuition for the Cauchy-Schwarz inequality


I'm not looking for a mathematical proof; I'm looking for a visual one. I'm having trouble understanding (in my mind's eye) why the dot product of two vectors V and W produces a scalar that is less than the length of V multiplied by the length of W.

In using the dot product, we are producing a parallel vector, correct? Could we not further say that we are simply applying vector W to vector V in order to produce a vector that is the original length of V multiplied by the length of W -- thus a vector parallel to V? For example, if we let vector W be a unit vector (with length of one), then the dot product of V and W would give us a scalar that, when applied to V, produces V again. Would this not be the same as the length of V multiplied by the length of W (given that the length of W is equal to one)?

For that reason, why wouldn't the dot product of V and W always be equal to the length of V multiplied by the length of W? Why would it be less (unless V = cW for some scalar c)?

6


13
On BEST ANSWER

In the Cauchy–Schwarz (CS) inequality $|u\cdot v|\le \|u\|\|v\|$, let's assume $v$ is a normalised vector, i.e., $\|v\|=1$. Then the CS inequality becomes $|u\cdot v|\le \|u\|$. These two forms of the CS inequality are in fact equivalent: if $|u\cdot v|\le \|u\|$ for all normalised vectors $v$, then for any nonzero $v$ we can apply this to $v/\|v\|$ and multiply through by $\|v\|$ to recover the usual CS inequality (the case $v=0$ is immediate). So, let us restate the CS inequality as stating that $|u\cdot v|\le \|u\|$ for all normalised vectors $v$. Now, the physical/geometric interpretation of $u\cdot v$ in this case is that it is the component of the vector $u$ in the direction $v$ (since $v$ is assumed normalised, that's all it is, a direction), while $\|u\|$ is the magnitude of $u$. So the CS inequality is merely stating the intuitively obvious fact that the component of a vector $u$ in a single direction is bounded by the magnitude of $u$.

Incidentally, this line of thought carries on to produce a very short and elegant proof of the full CS inequality. But, as you are not looking for a proof, I'll leave that out as an exercise.
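A quick numerical sanity check of the "component in a direction" reading, as a minimal sketch rather than a proof. The helper names `dot`, `norm` and `component_along` are made up for this illustration, and the vector `u` is an arbitrary choice.

```python
import math
import random

def dot(u, v):
    """Standard dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of u."""
    return math.sqrt(dot(u, u))

def component_along(u, v):
    """Signed component of u along the direction of a nonzero vector v."""
    length = norm(v)
    v_hat = [x / length for x in v]  # normalise v so it is only a direction
    return dot(u, v_hat)

random.seed(0)
u = [3.0, -1.0, 2.0]  # arbitrary example vector
for _ in range(5):
    v = [random.uniform(-1.0, 1.0) for _ in u]  # random direction (not yet normalised)
    comp = component_along(u, v)
    # The component in any single direction never exceeds the magnitude of u.
    assert abs(comp) <= norm(u) + 1e-12
    # Multiplying back by ||v|| gives the usual form of the inequality.
    assert abs(dot(u, v)) <= norm(u) * norm(v) + 1e-12
    print(f"component = {comp:+.4f}, |u| = {norm(u):.4f}")
```

Every printed component lies between $-\|u\|$ and $\|u\|$, whichever direction is drawn.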

0
On

One can show that in Euclidean space, the angle $\theta$ between two vectors $v,w$ (in the sense of Euclidean geometry) satisfies

$$\cos(\theta)=\frac{v \cdot w}{\| v \| \| w \|}.$$

This is basically the law of cosines applied to an appropriate triangle. Because $|\cos(\theta)|\le 1$, the equation can only make sense for every pair of nonzero vectors $v,w$ if the Cauchy-Schwarz inequality holds.
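To spell out the law-of-cosines step (a sketch, assuming $v$ and $w$ are nonzero): take the triangle with vertices $0$, $v$ and $w$, so that the side opposite the angle $\theta$ is $v-w$. Expanding the dot product gives

$$\|v-w\|^2=(v-w)\cdot(v-w)=\|v\|^2+\|w\|^2-2\,v\cdot w,$$

while the law of cosines for the same triangle gives

$$\|v-w\|^2=\|v\|^2+\|w\|^2-2\,\|v\|\,\|w\|\cos(\theta).$$

Comparing the two right-hand sides yields $v\cdot w=\|v\|\,\|w\|\cos(\theta)$, which is the displayed formula after dividing by $\|v\|\,\|w\|$.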

0
On

By definition, the "dot" product of two vectors, say $\vec A$ and $\vec B$, is

$$\vec A\cdot \vec B=|\vec A||\vec B|\cos \theta$$

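where $\theta$ is the angle between $\vec A$ and $\vec B$. That is to say, the dot product is the (signed) length of the projection of one vector onto the other, multiplied by the length of the other. Visually, the projection is like a "shadow" that one vector casts along the direction of the other. A shadow is never longer than the vector casting it, which is the statement $|\cos\theta|\le 1$, and so $|\vec A\cdot \vec B|\le|\vec A||\vec B|$.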

2
On

Recall that $$a\cdot b=|a||b|\cos\theta$$ where $\theta$ is the angle between $a$ and $b$.

Using this fact it is easy to check that $\dfrac{a\cdot b}{|b|}$ is the component of $a$ in the direction of $b$. Of course the component of $a$ in the direction of $b$ must have absolute value less than or equal to the magnitude of $a$. This gives $\dfrac{|a\cdot b|}{|b|}\leq|a|$ and hence $|a\cdot b|\leq |a||b|$.

So really $a\cdot b=|a||b|\cos\theta$ gives not only a formal proof of the Cauchy-Schwarz inequality, but also a geometric way of thinking of the dot product that makes the Cauchy-Schwarz inequality clear.

0
On

Dot product as projection

(Adapted from wikimedia commons: File:Dot Product.svg using Inkscape 0.91 to convert to PNG.)

The image illustrates the scalar projection of $\mathbf{A}$ onto $\mathbf{B}$, sometimes denoted $A_B$. You already know that, if $\|\mathbf{B}\|=1$, $\mathbf{A} \cdot \mathbf{B} = A_B$, and so for a general nonzero $\mathbf{B}$, $$ \mathbf{A} \cdot \mathbf{B} = \mathbf{A} \cdot \hat{\mathbf{B}}\,\|\mathbf{B}\| = A_B \|\mathbf{B}\| = \|\mathbf{A}\| \, \|\mathbf{B}\| \cos \theta$$ where $\hat{\mathbf{B}}$ denotes the unit vector along $\mathbf{B}$.

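But what does this tell us? That $\mathbf{A} \cdot \mathbf{B}$ is maximized when $\theta$ is 0 degrees, i.e. when $\mathbf{A}$ points along $\mathbf{B}$ and the projection $A_B$ is all of $\|\mathbf{A}\|$. When $\theta$ is not zero, the projection is shorter, so the dot product is smaller; it decreases to zero when $\theta$ is a right angle and reaches $-\|\mathbf{A}\|\,\|\mathbf{B}\|$ when the vectors are anti-parallel. The parallelogram $\mathbf{0}, \mathbf{A} , \mathbf{A} + \mathbf{B}, \mathbf{A} + \mathbf{B} - \mathbf{A} ({}=\mathbf{B})$ shows the complementary behaviour: by the area formula for parallelograms (base times height), its area $\|\mathbf{A}\|\,\|\mathbf{B}\|\sin\theta$ is maximized when $\theta$ is 90 degrees, where the parallelogram is a rectangle and $\mathbf{A}$ is all height, and it decreases to zero as $\mathbf{A}$ and $\mathbf{B}$ become (anti-)parallel.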
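To see how the dot product and the parallelogram area each depend on $\theta$, here is a small sketch that tabulates both for a few angles. The function name `dot_and_area` and the lengths 2 and 3 are illustrative assumptions; $\mathbf{B}$ is placed along the x-axis purely for convenience.

```python
import math

def dot_and_area(len_a, len_b, theta_deg):
    """Return (A . B, parallelogram area) for 2D vectors with the given
    lengths and an angle of theta_deg degrees between them."""
    theta = math.radians(theta_deg)
    a = (len_a * math.cos(theta), len_a * math.sin(theta))
    b = (len_b, 0.0)  # B lies along the x-axis
    dot = a[0] * b[0] + a[1] * b[1]
    area = abs(a[0] * b[1] - a[1] * b[0])  # |2D cross product| = base times height
    return dot, area

for theta_deg in (0, 30, 60, 90, 120, 180):
    d, area = dot_and_area(2.0, 3.0, theta_deg)
    print(f"theta = {theta_deg:3d} deg   dot = {d:+.3f}   area = {area:.3f}")
```

The dot product is largest in absolute value at 0 and 180 degrees, exactly where the area vanishes, and neither quantity ever exceeds the product of the two lengths.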

1
On

@Mehrdad

I had the same question as you have expressed:

I feel like the hard part is understanding why dot product has anything to do with projection in the first place. Why does the sum of a componentwise product tell you something about the vectors' projection?

After some thinking, I came up with the following reasoning; I'm not sure whether it makes sense to you.


  1. In Section 6.5-1 of Lathi’s Linear Systems and Signals, 2nd ed., the projection of $\mathbf{x}$ along $\mathbf{y}$ is interpreted as a way to minimize the "error" $\mathbf{e}$ when $\mathbf{x}$ is expressed as $\mathbf{x}=c\mathbf{y}+\mathbf{e}$, where $c\mathbf{y}$ is the component of $\mathbf{x}$ in the direction of $\mathbf{y}$, and $\mathbf{e}$ is the "error" vector, which has minimum length when it is perpendicular to $\mathbf{y}$.

  2. Btw: the "error" vector indicates how much $\mathbf{x}$ differs from $c\mathbf{y}$. In this sense, projection is a way to measure the similarity of two vectors, and this may explain why correlation is calculated in exactly the same way as the dot product.

  3. Now the question can be rephrased as: why does a product have anything to do with correlation/similarity? Let's take plain numbers (not vectors) for now. To find the difference between two numbers $a$ and $b$, we subtract, $a-b$, which can be either positive or negative. Most of the time people only care about the absolute value of the difference $|a-b|$, but $|a-b|$ is not convenient for mathematical manipulation. So people square the difference, $(a-b)^2$, and this is where the product comes in. The formula $(a-b)^2=a^2+b^2-2ab$ suggests that the product $ab$ has its place as a measure of the difference or similarity between $a$ and $b$.
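The error-minimization picture in item 1 can be turned into a short computation (a sketch, assuming $\mathbf{y}\neq\mathbf{0}$ and the usual real dot product). The squared length of the error is

$$\|\mathbf{e}\|^2=\|\mathbf{x}-c\mathbf{y}\|^2=\|\mathbf{x}\|^2-2c\,\mathbf{x}\cdot\mathbf{y}+c^2\|\mathbf{y}\|^2,$$

a quadratic in $c$ that is smallest at $c=\dfrac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}$. For this $c$ one checks that $\mathbf{e}\cdot\mathbf{y}=0$, so the error is indeed perpendicular to $\mathbf{y}$, and substituting it back gives

$$0\le\|\mathbf{e}\|^2=\|\mathbf{x}\|^2-\frac{(\mathbf{x}\cdot\mathbf{y})^2}{\|\mathbf{y}\|^2}.$$

Rearranging yields $|\mathbf{x}\cdot\mathbf{y}|\le\|\mathbf{x}\|\,\|\mathbf{y}\|$, with equality exactly when the error vanishes, i.e. when $\mathbf{x}$ is a multiple of $\mathbf{y}$. Just as the cross term $2ab$ appears in $(a-b)^2$, the dot product appears here as the cross term of a squared difference.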