Find the max and min of the function $ f(A)=\langle v_{0}, Av_{0}\rangle $


Let $X$ be the set of all real orthogonal $n \times n$ matrices, and let $v_{0}\in \mathbb{R}^{n}$. Find the maximum and minimum of the function $ f(A)=\langle v_{0}, Av_{0}\rangle $ over $A\in X$. How can I find the critical points? How should I approach the problem? Thanks in advance.


There are 2 best solutions below


Here's a short solution, followed by how to think about the problem in order to discover the solution.

The answer is that, for any $v$, the maximum value of $\langle v, Av\rangle$ is $||v||^2$ and the minimum value is $-||v||^2$.

Indeed, we know that for any vector $w$, $\langle v, w\rangle = ||v||\,||w||\,\cos{\theta_{vw}}$. In particular, when $w \equiv Av$, we know that

$$\langle v, Av\rangle = ||v||\,||Av||\, \cos{\theta_{v, Av}}$$

and because $A$ is orthogonal, it doesn't change vector lengths: $||Av|| = ||v||$. Hence

$$\langle v, Av\rangle = ||v||^2 \cos{\theta_{v, Av}}.$$

This tells us that for any orthogonal matrix, we cannot get a value higher than $||v||^2$ or lower than $-||v||^2$. The question is whether we can find orthogonal matrices which actually attain these maximum and minimum values, with corresponding angles of $\theta = 0$ and $\theta = \pi$.

But indeed, the identity matrix $I$ is an orthogonal matrix which makes $\theta = 0$, and the opposite of the identity $-I$ is an orthogonal matrix which makes $\theta = \pi$. These orthogonal matrices will always make $\langle v, Av\rangle$ achieve its maximum and minimum possible values, respectively.
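These bounds are easy to verify numerically. Here is a small sketch in plain Python (the helper names `f` and `rotation` are mine, chosen for illustration), using $2\times 2$ rotation matrices as a concrete family of orthogonal matrices: the value $\langle v, Av\rangle$ traces out $||v||^2\cos\theta$ exactly as the formula above predicts, peaking at $\theta = 0$ (where $A = I$) and bottoming out at $\theta = \pi$ (where $A = -I$).

```python
import math

def f(A, v):
    """Compute the inner product <v, A v> for a 2x2 matrix A given as nested lists."""
    Av = [A[0][0]*v[0] + A[0][1]*v[1],
          A[1][0]*v[0] + A[1][1]*v[1]]
    return v[0]*Av[0] + v[1]*Av[1]

def rotation(theta):
    """2x2 rotation by angle theta -- an orthogonal matrix for every theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

v = [3.0, 4.0]                     # ||v||^2 = 25
norm_sq = v[0]**2 + v[1]**2

# f(rotation(theta), v) equals ||v||^2 * cos(theta):
# maximum 25 at theta = 0 (A = I), minimum -25 at theta = pi (A = -I).
for theta in [0.0, math.pi/3, math.pi/2, math.pi]:
    assert abs(f(rotation(theta), v) - norm_sq*math.cos(theta)) < 1e-9
```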


I believe it helps to think about orthogonal matrices spatially/geometrically, and in terms of eigenvectors.

  1. For any vector $v$, the maximum dot product $\langle v, Av\rangle$ occurs when $v$ and $Av$ are parallel, or as close to parallel as possible.

  2. We can always find a matrix $A$ for which $v$ and $Av$ are parallel. (For example, let $A$ be the identity matrix.)

  3. Note that this is the same as saying that we can always find a matrix $A$ for which $v$ is an eigenvector (with positive eigenvalue). (Two vectors are parallel if one is a multiple of another; in particular, $Av$ is parallel to $v$ if $Av = \alpha v$ for some $\alpha$; this makes $v$ an eigenvector of $A$.)
  4. Because we are restricted to searching through the space of orthogonal matrices $A$, we actually know something about the eigenvectors and eigenvalues. Orthogonal matrices have the special property that their only possible *real* eigenvalues are $\pm 1$ (any complex eigenvalues have modulus $1$). Hence not only do we know that $A$ must have $v$ as an eigenvector; we know that the (positive) eigenvalue must be $+1$. Hence the maximum dot product occurs when $Av=+1\cdot v$. In this case, $\langle v, Av\rangle = \langle v, v\rangle = ||v||^2.$ This is the maximum possible value of $\langle v, Av\rangle$.

And by analogy, we can reason about a minimizing orthogonal matrix $B$:

  1. The minimum dot product should occur when $v$ and $Bv$ are as close to anti-parallel as possible, i.e. where $Bv$ is as close to $-v$ as possible.
  2. We can always find a matrix $B$ for which $v$ and $Bv$ are antiparallel. (For example, let $B$ be the opposite of the identity matrix $B = -I$.)
  3. Note that this is the same as saying that we can always find a matrix $B$ for which $v$ is an eigenvector with a negative eigenvalue.
  4. Because we are restricted to searching through the space of orthogonal matrices, the only possible real eigenvalues are $\pm 1$. Hence not only do we know that $B$ must have $v$ as an eigenvector; we know that the (negative) eigenvalue must be $-1$. Hence the minimum dot product occurs when $\langle v, Bv\rangle = \langle v, -v\rangle = - ||v||^2$.

In short, the answer is that for fixed $v$, the maximum possible value of $\langle v, Av\rangle$ occurs when $v$ is an eigenvector of $A$ with eigenvalue +1, in which case $\langle v, Av\rangle = ||v||^2$. The minimum possible value of $\langle v, Bv\rangle$ occurs when $v$ is an eigenvector of $B$ with eigenvalue -1, in which case $\langle v, Bv\rangle = -||v||^2$.

It's always possible to find a matrix $A$ for which $v$ is an eigenvector with eigenvalue $+1$; just take $A$ to be the identity matrix $I$, for example. And it's always possible to find a matrix $B$ for which $v$ is an eigenvector with eigenvalue $-1$; just take $B$ to be the opposite of the identity matrix $B = -I$, for example.
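In fact, $\pm I$ are far from the only matrices that work: any orthogonal matrix having $v$ as an eigenvector with eigenvalue $-1$ attains the minimum. One concrete alternative is the Householder reflection $H = I - 2vv^\top/||v||^2$, which is orthogonal and sends $v$ to $-v$. The plain-Python sketch below checks this (the helper names are my own, chosen for illustration).

```python
def householder(u):
    """Householder reflection H = I - 2 u u^T / ||u||^2; orthogonal, and H u = -u."""
    n = len(u)
    norm_sq = sum(x*x for x in u)
    return [[(1.0 if i == j else 0.0) - 2.0*u[i]*u[j]/norm_sq
             for j in range(n)] for i in range(n)]

def matvec(A, x):
    return [sum(A[i][j]*x[j] for j in range(len(x))) for i in range(len(A))]

def dot(x, y):
    return sum(a*b for a, b in zip(x, y))

v = [1.0, 2.0, 2.0]                # ||v||^2 = 9
H = householder(v)
Hv = matvec(H, v)

# H v = -v, so <v, H v> = -||v||^2: the minimum, achieved by a matrix other than -I.
assert all(abs(Hv[i] + v[i]) < 1e-9 for i in range(3))
assert abs(dot(v, Hv) + 9.0) < 1e-9
```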


Edit: If you really wanted to solve this problem using calculus, here's one way to do it.

Fix a vector $v\in \mathbb{R}^{n}$ and consider the function $f:\mathbb{R}^{n\times n}\rightarrow \mathbb{R}$ given by

$$f(A) = \langle v, Av\rangle$$

and the function $g:\mathbb{R}^{n\times n}\rightarrow \mathbb{R}^{n\times n}$ given by

$$g(A) = A A^\top - I.$$

At the most basic level, $f$ is just a scalar function of $n^2$ real numbers, and $g$ is a matrix-valued function of those same real numbers. So, we can use the standard method of Lagrange multipliers to solve the problem:

Find the maximum and minimum of $f$, subject to the constraint $g(A) = 0$ (the zero matrix).

The Lagrangian is

$$\mathcal{L}(A, \Lambda) = f(A) - \sum_{i,j} \Lambda_{i,j} g_{i,j}(A)$$

where $\Lambda$ consists of $n\times n$ Lagrange multipliers (in a square matrix, if you like).

Now we search for a stationary point of $\mathcal{L}$ by computing its various derivatives.

For any entry $A_{i,j}$ of $A$, we find the derivative

$$\frac{\partial \mathcal{L}}{\partial A_{i,j}} = v_i v_j - \sum_{k=1}^n (\Lambda_{i,k} + \Lambda_{k,i})A_{k,j} $$

and for any Lagrange multiplier $\lambda_{i,j}$ in $\Lambda$, we find the derivative

$$\frac{\partial \mathcal{L}}{\partial \lambda_{i,j}} = [i=j] - \sum_{k=1}^n A_{i,k}A_{j,k}$$

Note that these $\lambda_{i,j}$-derivatives will be zero just when the rows of $A$ are orthonormal! (It says that the dot product of the $i$th and $j$th row of $A$ is 0 if $i\neq j$ and 1 if $i=j$.)

Now, matrix notation gives us a beautiful shorthand for these derivatives:

$$\begin{align*} \frac{\partial \mathcal{L}}{\partial A} &= vv^\top - (\Lambda + \Lambda^\top) A\\ \frac{\partial \mathcal{L}}{\partial \Lambda} &= I - AA^\top\\ \end{align*}$$

Now we look for ways to make all of these derivatives vanish. We can make the derivatives with respect to $\lambda_{i,j}$ vanish by having the matrix $A$ be orthogonal, of course. This does not constrain the values of $\Lambda$ in any way.

Suppose $\partial \mathcal{L}/\partial \Lambda$ vanishes; i.e. that $A$ is orthogonal. Then $A$ is invertible and $AA^\top = I$. Under this constraint, we find that $\partial \mathcal{L}/\partial A$ vanishes just if:

$$\begin{align*} vv^\top - (\Lambda + \Lambda^\top)A &= 0\\ vv^\top &= (\Lambda + \Lambda^\top)A\\ vv^\top A^\top &= (\Lambda + \Lambda^\top)AA^\top\\ v(Av)^\top &= (\Lambda + \Lambda^\top)\\ \end{align*}$$

As user Batominovski points out in the comments, $(\Lambda + \Lambda^\top)$ constitutes a symmetric matrix, and we can show that $v(Av)^\top$ is symmetric if and only if $Av$ is parallel to $v$.

Hence the stationary points occur where $A$ is an orthogonal matrix such that $v$ is an eigenvector (in which case the eigenvalue must be $\pm 1$). From this, it follows that $(\Lambda + \Lambda^\top) = \pm vv^\top$, and we can obtain the value of $\langle v, Av\rangle$ as simply $\mathrm{trace}{(\Lambda+\Lambda^\top)} = \pm ||v||^2$.

Rather than solving these equations in full generality, we can simply propose a solution (perhaps working backwards from our other methods of finding the answer) and show that it works.

For example, consider what happens when $A = I$ and $\Lambda = \frac{1}{2}vv^\top$. Then all of the derivatives vanish because $A=I$ is orthogonal and because

$$\begin{align*} \frac{\partial \mathcal L}{\partial A} &= vv^\top - (\Lambda + \Lambda^\top)A&\\ &= vv^\top - (\Lambda + \Lambda^\top)& A\text{ is the identity matrix}\\ &= vv^\top - \left(\tfrac{1}{2}vv^\top + \tfrac{1}{2}vv^\top\right) & \Lambda = \Lambda^\top = \tfrac{1}{2}vv^\top \\ &= 0\\ \end{align*}$$

and similarly for $A=-I$ with $\Lambda = -\frac{1}{2}vv^\top$.
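As a numerical sanity check on this stationary point, the following Python sketch (plain nested lists; all names are my own) evaluates $\partial\mathcal{L}/\partial A = vv^\top - (\Lambda+\Lambda^\top)A$ at $A = I$, $\Lambda = \frac{1}{2}vv^\top$ and confirms that it vanishes entrywise.

```python
n = 3
v = [1.0, -2.0, 0.5]

# Proposed stationary point: A = I, Lambda = (1/2) v v^T.
A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
Lam = [[0.5*v[i]*v[j] for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

S = [[Lam[i][j] + Lam[j][i] for j in range(n)] for i in range(n)]   # Lambda + Lambda^T
SA = matmul(S, A)
grad = [[v[i]*v[j] - SA[i][j] for j in range(n)] for i in range(n)]  # dL/dA at (A, Lam)

# The gradient with respect to A vanishes, and A = I is trivially orthogonal,
# so the constraint derivative vanishes as well.
assert all(abs(grad[i][j]) < 1e-12 for i in range(n) for j in range(n))
```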


By the Cauchy-Schwarz Inequality, $$\big|f(A)\big|=\big|\left\langle v_0,A\,v_0\right\rangle\big|\leq \left\|v_0\right\|_2\,\left\|A\,v_0\right\|_2\,,$$ where $\|\_\|_2$ is the $2$-norm induced by the inner product $\langle\_,\_\rangle$. As $A$ is an orthogonal matrix, $\left\|A\,v_0\right\|_2=\left\|v_0\right\|_2$. Thus, $\big|f(A)\big|\leq \left\|v_0\right\|_2^2$, or $$-\left\|v_0\right\|_2^2\leq f(A)\leq +\left\|v_0\right\|_2^2\,.$$ Clearly, both inequalities are sharp, as $f(\pm I) =\pm\left\|v_0\right\|_2^2$, where $I\in X$ is the $n$-by-$n$ identity matrix.

In fact, $f(A)=+\left\|v_0\right\|_2^2$ if and only if $v_0$ is an eigenvector of $A$ corresponding to the eigenvalue $+1$. Likewise, $f(A)=-\left\|v_0\right\|_2^2$ if and only if $v_0$ is an eigenvector of $A$ corresponding to the eigenvalue $-1$.
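This equivalence is easy to illustrate numerically. In the Python sketch below (my own construction, not from the answer), a block-orthogonal $3\times 3$ matrix fixes the first coordinate axis and rotates the other two: the fixed axis attains $f(A) = \left\|v_0\right\|_2^2$ exactly, while a rotated vector falls strictly short of the bound.

```python
import math

def f(A, v):
    """Inner product <v, A v> for a 3x3 matrix A."""
    Av = [sum(A[i][j]*v[j] for j in range(3)) for i in range(3)]
    return sum(v[i]*Av[i] for i in range(3))

def block_rot(theta):
    """Orthogonal matrix: identity on the first axis, rotation by theta in the last two."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0,   c,  -s],
            [0.0,   s,   c]]

A = block_rot(math.pi/4)

v_eig = [1.0, 0.0, 0.0]    # A v_eig = v_eig: eigenvector with eigenvalue +1
v_other = [0.0, 1.0, 0.0]  # rotated by A, hence not a +1-eigenvector

assert abs(f(A, v_eig) - 1.0) < 1e-12   # equality: f(A) = ||v||^2
assert f(A, v_other) < 1.0              # strictly below the bound otherwise
```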

P.S.: This is more-or-less user326210's solution, but more succinct.