Let $X$ be the set of all real orthogonal $n \times n$ matrices, and let $v_{0}\in \mathbb{R}^{n}$. Find the maximum and minimum of the inner product function $ f(A)=\langle v_{0}, Av_{0}\rangle $ where $A\in X$. How can I find the critical points? How should I approach the problem? Thanks in advance.
Find the max and min of the function $ f(A)=\langle v_{0}, Av_{0}\rangle $
162 Views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 best solutions below.
By the Cauchy-Schwarz Inequality, $$\big|f(A)\big|=\big|\left\langle v_0,A\,v_0\right\rangle\big|\leq \left\|v_0\right\|_2\,\left\\|A\,v_0\right\|_2\,,$$ where $\|\cdot\|_2$ is the $2$-norm induced by the inner product $\langle\cdot,\cdot\rangle$. As $A$ is an orthogonal matrix, $\left\|A\,v_0\right\|_2=\left\|v_0\right\|_2$. Thus, $\big|f(A)\big|\leq \left\|v_0\right\|_2^2$, or $$-\left\|v_0\right\|_2^2\leq f(A)\leq +\left\|v_0\right\|_2^2\,.$$ Clearly, both inequalities are sharp, as $f(\pm I) =\pm\left\|v_0\right\|_2^2$, where $I\in X$ is the $n$-by-$n$ identity matrix.
In fact, $f(A)=+\left\|v_0\right\|_2^2$ if and only if $v_0$ is an eigenvector of $A$ corresponding to the eigenvalue $+1$. Likewise, $f(A)=-\left\|v_0\right\|_2^2$ if and only if $v_0$ is an eigenvector of $A$ corresponding to the eigenvalue $-1$.
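These bounds are easy to check numerically. A NumPy sketch (the random orthogonal matrix is built from a QR factorization of a Gaussian matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
v0 = rng.standard_normal(n)

def f(A, v=v0):
    """Inner product <v, A v>."""
    return v @ (A @ v)

# A random orthogonal matrix via QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

bound = np.dot(v0, v0)  # ||v0||^2

assert abs(f(Q)) <= bound + 1e-10          # Cauchy-Schwarz bound holds
assert np.isclose(f(np.eye(n)), bound)     # maximum attained at A = I
assert np.isclose(f(-np.eye(n)), -bound)   # minimum attained at A = -I
```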
P.S.: This is more-or-less user326210's solution, but more succinct.
Here's a short solution, followed by how to think about the problem in order to discover the solution.
The answer is that, for any $v$, the maximum value of $\langle v, Av\rangle$ is $||v||^2$ and the minimum value is $-||v||^2$.
Indeed, we know that for any vector $w$, $\langle v, w\rangle = ||v||\,||w||\,\cos{\theta_{vw}}$. In particular, when $w \equiv Av$, we know that
$$\langle v, Av\rangle = ||v||\,||Av||\, \cos{\theta_{v, Av}}$$
and because $A$ is orthogonal, it doesn't change vector lengths: $||Av|| = ||v||$. Hence
$$\langle v, Av\rangle = ||v||^2 \cos{\theta_{v, Av}}.$$
This tells us that for any orthogonal matrix, we cannot get a value higher than $||v||^2$ or lower than $-||v||^2$. The question is whether we can find orthogonal matrices which actually attain these maximum and minimum values, with corresponding angles of $\theta = 0$ and $\theta = \pi$.
But indeed, the identity matrix $I$ is an orthogonal matrix which makes $\theta = 0$, and the opposite of the identity $-I$ is an orthogonal matrix which makes $\theta = \pi$. These orthogonal matrices will always make $\langle v, Av\rangle$ achieve its maximum and minimum possible values, respectively.
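The angle picture can be made concrete in two dimensions, where a rotation by `theta` turns every vector by exactly that angle. A small NumPy sketch:

```python
import numpy as np

v = np.array([2.0, 1.0])
theta = 0.7  # any rotation angle

# 2x2 rotation matrix: orthogonal, rotates every vector by theta.
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

ip = v @ (A @ v)
cos_angle = ip / (np.linalg.norm(v) * np.linalg.norm(A @ v))

assert np.isclose(cos_angle, np.cos(theta))          # angle between v and Av is theta
assert np.isclose(ip, np.dot(v, v) * np.cos(theta))  # <v, Av> = ||v||^2 cos(theta)
```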
I believe it helps to think about orthogonal matrices spatially/geometrically, and in terms of eigenvectors.
For any vector $v$, the maximum dot product $\langle v, Av\rangle$ occurs when $v$ and $Av$ are parallel, or as close to parallel as possible.
We can always find a matrix $A$ for which $v$ and $Av$ are parallel. (For example, let $A$ be the identity matrix.)
And by analogy, we can reason about a minimizing orthogonal matrix $B$:
In short, the answer is that for fixed $v$, the maximum possible value of $\langle v, Av\rangle$ occurs when $v$ is an eigenvector of $A$ with eigenvalue +1, in which case $\langle v, Av\rangle = ||v||^2$. The minimum possible value of $\langle v, Bv\rangle$ occurs when $v$ is an eigenvector of $B$ with eigenvalue -1, in which case $\langle v, Bv\rangle = -||v||^2$.
It's always possible to find a matrix $A$ for which $v$ is an eigenvector with eigenvalue $+1$; just take $A$ to be the identity matrix $I$, for example. And it's always possible to find a matrix $B$ for which $v$ is an eigenvector with eigenvalue $-1$; just take $B$ to be the opposite of the identity matrix $B = -I$, for example.
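As a concrete alternative to $-I$: a Householder reflection $I - 2vv^\top/\|v\|^2$ is orthogonal and has $v$ as an eigenvector with eigenvalue $-1$, so it also attains the minimum. A NumPy sketch (the helper name `householder` is mine):

```python
import numpy as np

def householder(v):
    """Reflection I - 2 v v^T / ||v||^2: orthogonal, sends v to -v."""
    v = np.asarray(v, dtype=float)
    return np.eye(len(v)) - 2.0 * np.outer(v, v) / np.dot(v, v)

v = np.array([3.0, 1.0, 2.0])
B = householder(v)

assert np.allclose(B @ B.T, np.eye(3))         # B is orthogonal
assert np.allclose(B @ v, -v)                  # v is an eigenvector, eigenvalue -1
assert np.isclose(v @ (B @ v), -np.dot(v, v))  # f(B) = -||v||^2
```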
Edit: If you really wanted to solve this problem using calculus, here's one way to do it.
Fix a vector $v\in \mathbb{R}^{n}$ and consider the function $f:\mathbb{R}^{n\times n}\rightarrow \mathbb{R}$ given by
$$f(A) = \langle v, Av\rangle$$
and the function $g:\mathbb{R}^{n\times n}\rightarrow \mathbb{R}^{n\times n}$ given by
$$g(A) = A A^\top - I.$$
At the most basic level, $f$ is just a scalar function of real numbers, and $g$ is a matrix-valued function of real numbers. So, we can use the standard method of Lagrange multipliers to solve the problem:
The Lagrangian is
$$\mathcal{L}(A, \Lambda) = f(A) - \sum_{i,j} \Lambda_{i,j} g_{i,j}(A)$$
where $\Lambda$ consists of $n\times n$ Lagrange multipliers (in a square matrix, if you like).
Now we search for a stationary point of $\mathcal{L}$ by computing its various derivatives.
For any entry $a_{i,j}$ in $A$, we find the derivative
$$\frac{\partial \mathcal{L}}{\partial a_{i,j}} = v_i v_j - \sum_{k=1}^n (\Lambda_{i,k} + \Lambda_{k,i})A_{k,j} $$
and for any Lagrange multiplier $\lambda_{i,j}$ in $\Lambda$, we find the derivative
$$\frac{\partial \mathcal{L}}{\partial \lambda_{i,j}} = [i=j] - \sum_{k=1}^n A_{i,k}A_{j,k}$$
Note that these $\lambda_{i,j}$-derivatives will be zero just when the rows of $A$ are orthonormal! (Setting them to zero says that the dot product of the $i$th and $j$th rows of $A$ is $0$ if $i\neq j$ and $1$ if $i=j$.)
Now, matrix notation gives us a beautiful shorthand for these derivatives:
$$\begin{align*} \frac{\partial \mathcal{L}}{\partial A} &= vv^\top - (\Lambda + \Lambda^\top)A\\ \frac{\partial \mathcal{L}}{\partial \Lambda} &= -(AA^\top - I)\\ \end{align*}$$
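The formula $\partial\mathcal{L}/\partial A = vv^\top - (\Lambda + \Lambda^\top)A$ can be checked against central finite differences of $\mathcal{L}$ at a generic (not necessarily orthogonal) $A$. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
v = rng.standard_normal(n)
A = rng.standard_normal((n, n))
Lam = rng.standard_normal((n, n))

def lagrangian(A):
    """L(A, Lam) = <v, A v> - sum_ij Lam_ij (A A^T - I)_ij."""
    return v @ (A @ v) - np.sum(Lam * (A @ A.T - np.eye(n)))

# Analytic gradient with respect to A, as derived above.
grad = np.outer(v, v) - (Lam + Lam.T) @ A

# Central finite differences, entry by entry.
eps = 1e-6
num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        num[i, j] = (lagrangian(A + E) - lagrangian(A - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-6)
```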
Now we look for ways to make all of these derivatives vanish. We can make the derivatives with respect to $\lambda_{i,j}$ vanish by having the matrix $A$ be orthogonal, of course. This does not constrain the values of $\Lambda$ in any way.
Suppose $\partial \mathcal{L}/\partial \Lambda$ vanishes; i.e. that $A$ is orthogonal. Then $A$ is invertible and $AA^\top = I$. Under this constraint, we find that $\partial \mathcal{L}/\partial A$ vanishes just if:
$$\begin{align*} vv^\top - (\Lambda + \Lambda^\top)A &= 0\\ vv^\top &= (\Lambda + \Lambda^\top)A\\ vv^\top A^\top &= (\Lambda + \Lambda^\top)AA^\top\\ v(Av)^\top &= (\Lambda + \Lambda^\top)\\ \end{align*}$$
As user Batominovski points out in the comments, $(\Lambda + \Lambda^\top)$ constitutes a symmetric matrix, and we can show that $v(Av)^\top$ is symmetric if and only if $Av$ is parallel to $v$.
Hence the stationary points occur where $A$ is an orthogonal matrix such that $v$ is an eigenvector (in which case the eigenvalue must be $\pm 1$). From this, it follows that $(\Lambda + \Lambda^\top) = \pm vv^\top$, and we can obtain the value of $\langle v, Av\rangle$ as simply $\mathrm{trace}{(\Lambda+\Lambda^\top)} = \pm ||v||^2$.
Alternatively, we can simply propose a solution (perhaps working backwards from our other methods of finding the answer) and show that it works.
For example, consider what happens when $A = I$ and $\Lambda = \frac{1}{2}vv^\top$. Then all of the derivatives vanish because $A=I$ is orthogonal and because
$$\begin{align*} \frac{\partial \mathcal L}{\partial A} &= vv^\top - (\Lambda + \Lambda^\top)A&\\ &= vv^\top - (\Lambda + \Lambda^\top)& A\text{ is the identity matrix}\\ &= vv^\top - \left(\tfrac{1}{2}vv^\top + \tfrac{1}{2}vv^\top \right) & \Lambda = \Lambda^\top = \tfrac{1}{2}vv^\top \\ &= 0\\ \end{align*}$$
and so on in this same way for $A=-I$.
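Both proposed stationary points can be verified numerically. A sketch (for $A=-I$ the matching multiplier is $\Lambda = -\frac{1}{2}vv^\top$; the helper name `grad_A` is mine):

```python
import numpy as np

v = np.array([1.0, -2.0, 0.5])
I = np.eye(3)

def grad_A(A, Lam):
    """dL/dA = v v^T - (Lam + Lam^T) A, as derived above."""
    return np.outer(v, v) - (Lam + Lam.T) @ A

# Maximizer: A = I with Lam = (1/2) v v^T.
assert np.allclose(grad_A(I, 0.5 * np.outer(v, v)), 0)
# Minimizer: A = -I with Lam = -(1/2) v v^T.
assert np.allclose(grad_A(-I, -0.5 * np.outer(v, v)), 0)
# Both choices of A are orthogonal, and f attains the extreme values there.
assert np.isclose(v @ (I @ v), np.dot(v, v))
assert np.isclose(v @ (-I @ v), -np.dot(v, v))
```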