Maximizing a coordinate of $x$ subject to $x^T A^T A x = r^2$


Given a vector $\mathbf{x} \in \mathbb{R}^n$, a scalar $r\gt 0$ and an invertible matrix $\mathbf{A} \in \mathbb{R}^{n\times n}$, I'd like to maximize one of the components $x_\alpha$ constrained by $\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x}=r^2$.

I tried to do this with tensor algebra but I'm pretty new to that and while I did get a result, I'm almost certain that it is wrong. I used some steps which I think are dodgy at best, so I have a rough idea where the problem(s) might lie, however I do not know how to fix them. Here is what I have thus far:

  1. Rewriting $\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x}=r^2$ in tensor notation:

$$x_i a_{ji} a_{jk} x_k = r^2$$

  2. Building a Lagrangian:

$$L\left(\mathbf{x},\lambda\right)=x_\alpha + \lambda \left(x_i a_{ji} a_{jk} x_k-r^2\right)$$

The $x_\alpha$ here is the component I want to eventually maximize. This is where the dodginess begins: the indices aren't balanced, since I am adding a component of a vector to a scalar. However, since I actually want a single component of a vector (effectively a scalar) in the end, and not a full vector, it seemed acceptable to me in this case.

  3. Taking the gradient of $L$:

$$ \frac{\partial L}{\partial x_l} = \delta_{l\alpha}+\lambda a_{ji} a_{jk} \left( \delta_{l k} x_i + \delta_{l i} x_k \right) \\ \frac{\partial L}{\partial x_l} = \delta_{l\alpha} +\lambda a_{j l} \left( a_{j i} x_i + a_{j k} x_k \right) = 0$$

  4. Solving for $x_m$ (while ramping up the dodginess):

$$ \delta_{l\alpha} +\lambda a_{j l} \left( a_{j i} \delta_{i m} x_m + a_{j k} \delta_{k m} x_m \right) = 0\\ \delta_{l\alpha} +\lambda a_{j l} \left( a_{j m} + a_{j m} \right) x_m = 0 \\ x_m = -\frac{\delta_{l\alpha}}{2\lambda a_{jl} a_{jm}}\\ x_m = -\frac{1}{2\lambda a_{j\alpha} a_{j m}}$$

Am I actually allowed to divide by those matrix components like that? What about zero-components? And then, am I allowed to simplify $\frac{\delta_{i j}}{a_{j k}}=\frac{1}{a_{i k}}$?

  5. Plugging that into my constraint and solving for $\lambda$:

$$ x_i a_{j i} a_{j k} x_k = r^2 \\ \left(-\frac{1}{2\lambda a_{l\alpha}a_{l i}}\right)\left(-\frac{1}{2\lambda a_{m \alpha} a_{m k}}\right)a_{ji}a_{jk}=r^2\\ \frac{a_{j i}a_{j k}}{4 \lambda^2 a_{l\alpha} a_{m\alpha} a_{l i} a_{m k}}=r^2\\ \lambda=\pm\frac{1}{2r}\sqrt{\frac{a_{ji}a_{jk}}{a_{l\alpha}a_{m\alpha}a_{li}a_{mk}}}$$

  6. Plugging that back into $x_\beta$:

$$x_\beta = -\frac{1}{\pm\frac{1}{r}\sqrt{\frac{a_{ji}a_{jk}}{a_{l\alpha}a_{m\alpha}a_{li}a_{mk}}} a_{n\alpha} a_{n\beta}}\\ x_\beta=\mp\frac{r}{a_{n\alpha} a_{n\beta}}\sqrt{\frac{a_{l\alpha}a_{m\alpha}a_{li}a_{mk}}{a_{ji}a_{jk}}}$$

  7. Sanity check, using $\alpha = \beta$ and $a_{ij} = \delta_{ij}$ (the identity matrix):

$$x_\alpha=\mp\frac{r}{\delta_{n\alpha}^2}\sqrt{\frac{\delta_{l\alpha}\delta_{m\alpha}\delta_{li}\delta_{mk}}{\delta_{ji}\delta_{jk}}}\\ x_\alpha=\mp\frac{r}{\delta_{nn}}\sqrt{\frac{\delta_{lm}\delta_{li}\delta_{mk}}{\delta_{ik}}}\\ x_\alpha=\mp\frac{r}{\delta_{nn}}\sqrt{\frac{\delta_{ik}}{\delta_{ik}}}\\ x_\alpha=\mp\frac{r}{\delta_{nn}}$$

This result is obviously incorrect: if $\mathbf{A}$ is the identity matrix, the result should simply be $x_\alpha=\pm r$ (and the other coordinates should all be $0$). It is close, though, off only by a factor that depends on the dimension of the vector space, so I assume what I did isn't complete nonsense. Where did I go off the rails, and what would be the correct way to do this?

Also, with all those indices, I have a hard time turning the result back into the usual matrix notation. If that's possible, what would it look like?
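As an aside, any closed form can be ground-truthed numerically before wrestling with indices. A minimal sketch, assuming NumPy and SciPy are available, checking the identity-matrix sanity case above with a generic constrained optimizer:

```python
import numpy as np
from scipy.optimize import minimize

n, r, alpha = 3, 2.0, 1
A = np.eye(n)  # the identity-matrix sanity check

# maximize x_alpha subject to x^T A^T A x = r^2
res = minimize(
    lambda x: -x[alpha],  # minimize the negative to maximize
    x0=np.full(n, 0.5),
    method="SLSQP",
    constraints={"type": "eq", "fun": lambda x: x @ A.T @ A @ x - r**2},
)

# for A = I the maximizer should be x_alpha = r, other coordinates 0
print(res.x)
```

Running the same script with a different invertible `A` then gives a reference value to compare any hand-derived formula against.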


There are 2 answers below.

BEST ANSWER

An alternative approach, if $A$ is invertible:

First consider the case when $A=I$. Then the problem is to maximize $u^Tx$ subject to $x^T x\le r^2$, where $u$ is a fixed vector ($e_\alpha$ in your case). By Cauchy-Schwarz, the solution is $x=ru/|u|$.

For general invertible $A$, the problem is to maximize $u^Tx$ subject to $x^TA^TAx\le r^2$. Let $y=Ax$. Then the problem is to maximize $u^TA^{-1}y = (A^{-T}u)^Ty$ subject to $y^Ty\le r^2$. By the previous case, we want $y=r(A^{-T}u)/|A^{-T}u|$, and so $x=r((A^TA)^{-1}u)/|A^{-T}u|$.

So, the tensor formalism is not really needed in this particular problem.
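This closed form is easy to verify numerically. A sketch, assuming NumPy and using a randomly generated well-conditioned $A$, checks that $x = r\,(A^TA)^{-1}u/|A^{-T}u|$ lies on the constraint set and beats random feasible points in the $\alpha$ coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, alpha = 4, 2.0, 1
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shift makes A invertible here
u = np.zeros(n)
u[alpha] = 1.0  # u = e_alpha

# x = r (A^T A)^{-1} u / |A^{-T} u|
x = r * np.linalg.solve(A.T @ A, u) / np.linalg.norm(np.linalg.solve(A.T, u))

# the constraint is active: x^T A^T A x = r^2
assert np.isclose(x @ A.T @ A @ x, r**2)

# no random point on the ellipsoid ||Ay|| = r does better
for _ in range(1000):
    z = rng.standard_normal(n)
    y = np.linalg.solve(A, r * z / np.linalg.norm(z))  # ||A y|| = r
    assert u @ y <= u @ x + 1e-9
```

Note that $A\,(A^TA)^{-1} = A^{-T}$, which is why $\lVert Ax\rVert = r$ holds exactly for this $x$.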

OTHER ANSWER

Note that $x^T A^T A x = (Ax)^T (Ax) = \lVert Ax \rVert^2$, so the constraint is expressible as a norm. We expect the derivative to be something like $2A^T Ax$, which is what you have in the line $$ \frac{\partial L}{\partial x_l} = \delta_{\alpha l} + \lambda a_{jl}(a_{ji} x_i + a_{jk}x_k); $$ relabeling the dummy indices, this is $$ \frac{\partial L}{\partial x_l} = \delta_{\alpha l} + 2\lambda a_{jl}a_{ji} x_i. \tag{1} $$

Now we want to set this to zero. Your key mistake is that this formula sums over $i$ and $j$, so you cannot simply divide by the matrix entries. Probably the easiest thing to do is jump back into vector notation, and then we can see what is going on. Equation (1) is the $l$th component of a vector equation: the delta is the unit vector in the $\alpha$th direction, $e_{\alpha}$, and the second term is $2\lambda\,(A^T A x)_l$ by the usual rules of matrix algebra. Hence what you have to do is solve the matrix equation $$ A^T A x = e_{\alpha}, $$ up to the scalar factor $-2\lambda$, which is then fixed by the constraint.

If $A^T A$ is invertible, you obviously get the single point you want. If it is not invertible, the situation is worse, because the set $\lVert Ax \rVert^2=r^2$ is unbounded, and possibly there is no maximum (take $a_{11}=1$ and $a_{12}=a_{21}=a_{22}=0$ for one pathological example: the constraint becomes $x_1^2 = r^2$, leaving $x_2$ free).
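The recipe above (solve $A^TAx = e_\alpha$, then rescale onto the constraint set) can be sketched in NumPy; the random matrix here is a hypothetical example, shifted to keep it invertible:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, alpha = 3, 1.5, 0
A = rng.standard_normal((n, n)) + n * np.eye(n)  # invertible for this sketch
e = np.zeros(n)
e[alpha] = 1.0

# stationarity says x is proportional to (A^T A)^{-1} e_alpha
v = np.linalg.solve(A.T @ A, e)
x = r * v / np.sqrt(v @ A.T @ A @ v)  # rescale so that x^T A^T A x = r^2

assert np.isclose(x @ A.T @ A @ x, r**2)

# check stationarity: e_alpha + 2*lam * A^T A x = 0 for the right multiplier
lam = -np.sqrt(v @ A.T @ A @ v) / (2 * r)
assert np.allclose(e + 2 * lam * (A.T @ A @ x), 0)
```

Since $A^TA$ is positive definite here, $v^T A^TA\, v = v_\alpha > 0$, so the square root is well defined and the rescaled $x$ has $x_\alpha > 0$, i.e. it is the maximizer rather than the minimizer.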