How to know what is the correct value to choose


I have been self-studying linear algebra from the textbook Linear Algebra Done Right, in preparation for upper-division math classes, and I recently came across a problem whose general method of solution I struggle to see, so I was hoping someone could help me understand it.

Here is the problem. This is from the exercises in chapter 6.A of LADR.

enter image description here

The forward direction I could prove. However, I struggled immensely with the converse and eventually gave up and looked at a solution manual. Here is the solution given.

enter image description here

The logic of this solution makes perfect sense to me. I can understand why it proves the conclusion and I’ve checked to see that the given value of a does indeed get the right result. But what I don’t understand is this: how could I have discovered for myself, in a methodical way, that this particular value of a is the appropriate value to plug in to get the desired result? From my perspective it seems like the given value for a just fell out of the sky in a moment of serendipity. What could I have done to discover it for myself instead of having it handed to me?


There are 1 best solutions below


This solution as presented is definitely hard to "discover" all in one go. To solve this problem, you could first think about the two-dimensional case (in fact you can reduce to that by working in the span of $u$ and $v$). Most of my answer will be presented as a proof by contraposition, i.e. we assume $\langle u, v\rangle \ne 0$ and try to prove there's some $a$ such that $\lVert u + av\rVert < \lVert u \rVert$.

To get a bit of intuition, let's just think about the case of $\Bbb R^2$ with the dot product (in fact any real two-dimensional inner product space is isomorphic to this space, but we won't use this). Then what you want to prove is "if $u$ and $v$ are not perpendicular, then some point on the line $L = \{u + av: a \in \Bbb R\}$ is closer to $0$ than $u$". In other words, the line goes into the interior of the circle with centre $0$ and radius $\lVert u\rVert$. Hopefully you believe this geometrically, because the tangent to a circle is perpendicular to the radius! I think that's basically what this question is about.

By this observation, we realise that if $u$ and $v$ are not perpendicular, then perturbing a little bit in the direction of either $v$ or $-v$ should get us closer to the origin. Let's see how far we get with the algebra! A general "trick" that will help us with the algebra to prove this is to work with squared norms instead of norms.

So let's calculate. $\lVert u + av \rVert^2 = \lVert u \rVert^2 + a^2\lVert v \rVert^2 + 2a\langle u, v \rangle$. We want to argue that there is some $a$ such that $a^2\lVert v \rVert^2 + 2a\langle u, v \rangle$ is negative. The term $a^2\lVert v \rVert^2$ is positive for every $a \ne 0$ (note that $v \ne 0$, since $\langle u, v \rangle \ne 0$). So the negativity has to come from $2a\langle u, v \rangle$. Fortunately, this term will beat the other term for some appropriate small value of $a$! This is basically because $x^2$ goes to $0$ faster than $x$. We could literally leave it there - just appeal to the real analysis. However, we can find an explicit value: factoring the sum as $a(a\lVert v \rVert^2 + 2\langle u, v \rangle)$ shows that it is negative exactly when $a$ has opposite sign to $\langle u, v \rangle$ and $a$ is smaller in magnitude than $2|\langle u, v \rangle|/\lVert v\rVert^2$. So, for example, $a = -\langle u, v \rangle/\lVert v\rVert^2$ does the job. (But this isn't the only value that works! The value $a = -\tfrac 1{12345}\langle u, v \rangle/\lVert v\rVert^2$ works just as well.) Geometrically, we're asserting that the midpoint of the two intersections of the line $L$ with the circle lies in the interior of the circle.
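If you want to convince yourself numerically before trusting the algebra, here is a minimal sketch in plain Python. The vectors `u` and `v` are arbitrary values chosen for illustration, and the helper names (`dot`, `norm_sq`, `shifted_norm_sq`) are mine, not anything from LADR.

```python
def dot(x, y):
    """Real dot product of two equal-length lists."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm_sq(x):
    """Squared Euclidean norm."""
    return dot(x, x)


def shifted_norm_sq(u, v, a):
    """Squared norm of u + a*v."""
    return norm_sq([ui + a * vi for ui, vi in zip(u, v)])


# Illustrative vectors with <u, v> = 5, so they are not perpendicular.
u = [3.0, 1.0]
v = [1.0, 2.0]

a_mid = -dot(u, v) / norm_sq(v)  # the value picked in the text
a_edge = 2 * a_mid               # second intersection of the line with the circle

# Right sign and magnitude below |a_edge|: strictly closer to the origin.
for a in (a_mid, a_mid / 12345, 0.99 * a_edge):
    assert shifted_norm_sq(u, v, a) < norm_sq(u)

# Wrong sign, or magnitude above |a_edge|: farther from the origin.
assert shifted_norm_sq(u, v, -a_mid) > norm_sq(u)
assert shifted_norm_sq(u, v, 1.01 * a_edge) > norm_sq(u)
```

Trying a few other pairs `u`, `v` with nonzero dot product is a quick way to see that the sign and size conditions are what matter, not the particular value of $a$.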

Now the complex case is of course slightly more complicated - particularly if you're trying to visualise the geometry! When we expand this time, we get $\lVert u + av \rVert^2 = \lVert u \rVert^2 + |a|^2\lVert v \rVert^2 + a\langle v, u \rangle + \overline{a \langle v, u \rangle} = \lVert u \rVert^2 + |a|^2\lVert v \rVert^2 + 2 \operatorname{Re}(a\langle v, u \rangle)$. Once again, for similar reasons, we can expect to make the term $|a|^2\lVert v \rVert^2 + 2 \operatorname{Re}(a\langle v, u \rangle)$ negative for suitable small $a$. There are now many different values of $a$ we could pick - there's a whole range of arguments that could work (in fact any angle causing $a \langle v, u \rangle$ to have negative real part). Perhaps the easiest and most canonical choice is to give $a$ the same argument as the negated conjugate of $\langle v, u \rangle$, which is $-\langle u, v \rangle$, as this makes $a\langle v, u \rangle$ a negative real number, and it looks quite similar to what we picked last time... Indeed this means $a$ should be a positive real multiple of $-\langle u, v \rangle$. We still need $a$ to be smaller in modulus than $2|\langle u, v \rangle|/\lVert v\rVert^2$. So $a = -\langle u, v \rangle/\lVert v\rVert^2$ does the job again. Once again, this is not the only possible $a$ to pick.
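The complex case can be sanity-checked the same way. This is a sketch with made-up vectors in $\Bbb C^2$, assuming the inner product is linear in the first slot and conjugate-linear in the second (LADR's convention); `inner` and `norm_sq` are hypothetical helper names.

```python
def inner(x, y):
    """<x, y> = sum of x_i * conj(y_i): linear in x, conjugate-linear in y."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))


def norm_sq(x):
    """Squared norm; <x, x> is real, so keep only the real part."""
    return inner(x, x).real


# Illustrative vectors with <u, v> = -1 + 2i, which is nonzero.
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

a = -inner(u, v) / norm_sq(v)  # the value picked in the text
shifted = [ui + a * vi for ui, vi in zip(u, v)]

# With this a, the product a * <v, u> is a negative real number.
cross = a * inner(v, u)
assert abs(cross.imag) < 1e-12 and cross.real < 0

# The squared norm drops by exactly |<u, v>|^2 / ||v||^2.
drop = abs(inner(u, v)) ** 2 / norm_sq(v)
assert abs(norm_sq(shifted) - (norm_sq(u) - drop)) < 1e-12
assert norm_sq(shifted) < norm_sq(u)
```

Multiplying `a` by a unit complex number such as `1j` changes the argument of `a * inner(v, u)`, and is an easy way to see for yourself which arguments still give a decrease.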

So we're done! Now, having produced this proof in your "rough" work, you could ask yourself if it's possible to turn it into a direct proof. This working tells us that $a = -\langle u, v \rangle/\lVert v\rVert^2$ encodes some special information about the perpendicularity of $u$ and $v$. So you can see what happens when you plug that in, and out falls the nice slick proof.
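For concreteness, here is that substitution written out. This is a sketch assuming $v \ne 0$ (if $v = 0$ then $\langle u, v \rangle = 0$ trivially) and an inner product that is linear in the first slot, as in LADR. With $a = -\langle u, v \rangle/\lVert v\rVert^2$ we have $a\langle v, u \rangle = -|\langle u, v \rangle|^2/\lVert v \rVert^2$, so

$$
\begin{aligned}
\lVert u + av \rVert^2
&= \lVert u \rVert^2 + |a|^2\lVert v \rVert^2 + 2\operatorname{Re}(a\langle v, u \rangle) \\
&= \lVert u \rVert^2 + \frac{|\langle u, v \rangle|^2}{\lVert v \rVert^2} - \frac{2|\langle u, v \rangle|^2}{\lVert v \rVert^2} \\
&= \lVert u \rVert^2 - \frac{|\langle u, v \rangle|^2}{\lVert v \rVert^2}.
\end{aligned}
$$

So if $\lVert u \rVert \le \lVert u + av \rVert$ holds for this particular $a$, then $|\langle u, v \rangle|^2 \le 0$, which forces $\langle u, v \rangle = 0$.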

I think that simply writing down the squared norm and hoping that will get us somewhere is a step that you have to have some blind faith in. Intuition about this does come with experience about proofs like this, and hopefully the motivation about the circle is some good informal evidence that this will be a productive strategy.

I'll mention that an alternative strategy, which is even more "automatic", is to see that we're being asked to translate between a property of $\lVert \cdot \rVert$ and a property of $\langle \cdot,\cdot\rangle$, and therefore decide to translate the inner product property into a norm property by using the polarisation identity. Indeed, $\operatorname{Re}(\langle u, v \rangle) = \tfrac 12(\lVert u \rVert^2 + \lVert v \rVert^2 - \lVert u - v \rVert^2)$. Applying this with $tv$ in place of $v$, for real $t$, and using the inequality in the form $\lVert u - tv \rVert \ge \lVert u \rVert$ (that is, $a = -t$), we get $t\operatorname{Re}(\langle u, v \rangle) = \tfrac 12 (\lVert u \rVert^2 + t^2\lVert v \rVert^2 - \lVert u - tv \rVert^2) \le \tfrac 12 t^2\lVert v \rVert^2$. Dividing by $t > 0$ and letting $t \to 0$ gives $\operatorname{Re}(\langle u, v \rangle) \le 0$; dividing by $t < 0$ instead flips the inequality and gives $\operatorname{Re}(\langle u, v \rangle) \ge 0$. By similar arguments with $iv$ in place of $v$, we have $0 \le \operatorname{Im}(\langle u, v \rangle) \le 0$. (This is quite tedious to check!)
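The polarisation identity itself is easy to check numerically. The sketch below uses arbitrary illustrative vectors and the same assumed convention as LADR (linear in the first slot); it also checks that replacing $v$ by $iv$ turns the real part into the imaginary part, which is one way to reach $\operatorname{Im}(\langle u, v \rangle)$ under that convention.

```python
def inner(x, y):
    """<x, y> = sum of x_i * conj(y_i): linear in x, conjugate-linear in y."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))


def norm_sq(x):
    """Squared norm; <x, x> is real, so keep only the real part."""
    return inner(x, x).real


# Illustrative vectors in C^2.
u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

# Re<u, v> = (||u||^2 + ||v||^2 - ||u - v||^2) / 2
diff = [ui - vi for ui, vi in zip(u, v)]
polarised = 0.5 * (norm_sq(u) + norm_sq(v) - norm_sq(diff))
assert abs(inner(u, v).real - polarised) < 1e-12

# Under this convention <u, iv> = -i<u, v>, so Re<u, iv> = Im<u, v>.
iv = [1j * vi for vi in v]
assert abs(inner(u, iv).real - inner(u, v).imag) < 1e-12
```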