Given a vector $\mathbf{x} \in \mathbb{R}^n$, a scalar $r\gt 0$ and an invertible matrix $\mathbf{A} \in \mathbb{R}^{n\times n}$, I'd like to maximize one of the components $x_\alpha$ constrained by $\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x}=r^2$.
I tried to do this with tensor algebra, but I'm pretty new to it, and while I did get a result, I'm almost certain it is wrong. Some of the steps I used seem dodgy at best, so I have a rough idea where the problem(s) might lie, but I do not know how to fix them. Here is what I have so far:
- Rewriting $\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x}=r^2$ in tensor notation:
$$x_i a_{ji} a_{jk} x_k = r^2$$
- Building a Lagrangian:
$$L\left(\mathbf{x},\lambda\right)=x_\alpha + \lambda \left(x_i a_{ji} a_{jk} x_k-r^2\right)$$
The $x_\alpha$ here is the component I eventually want to maximize. This is where the dodginess begins: the indices aren't balanced, since I'm adding a single component of a vector to a scalar. However, because what I actually want in the end is one component of a vector (effectively a scalar) and not a full vector, it seemed acceptable to me in this case.
- Taking the gradient of $L$:
$$ \frac{\partial L}{\partial x_l} = \delta_{l\alpha}+\lambda a_{ji} a_{jk} \left( \delta_{l k} x_i + \delta_{l i} x_k \right) \\ \frac{\partial L}{\partial x_l} = \delta_{l\alpha} +\lambda a_{j l} \left( a_{j i} x_i + a_{j k} x_k \right) = 0$$
- Solving for $x_m$ (while ramping up the dodginess):
$$ \delta_{l\alpha} +\lambda a_{j l} \left( a_{j i} \delta_{i m} x_m + a_{j k} \delta_{k m} x_m \right) = 0\\ \delta_{l\alpha} +\lambda a_{j l} \left( a_{j m} + a_{j m} \right) x_m = 0 \\ x_m = -\frac{\delta_{l\alpha}}{2\lambda a_{jl} a_{jm}}\\ x_m = -\frac{1}{2\lambda a_{j\alpha} a_{j m}}$$
Am I actually allowed to divide by those matrix components like that? What about zero-components? And then, am I allowed to simplify $\frac{\delta_{i j}}{a_{j k}}=\frac{1}{a_{i k}}$?
- Plugging that into my constraint and solving for $\lambda$:
$$ x_i a_{j i} a_{j k} x_k = r^2 \\ \left(-\frac{1}{2\lambda a_{l\alpha}a_{l i}}\right)\left(-\frac{1}{2\lambda a_{m \alpha} a_{m k}}\right)a_{ji}a_{jk}=r^2\\ \frac{a_{j i}a_{j k}}{4 \lambda^2 a_{l\alpha} a_{m\alpha} a_{l i} a_{m k}}=r^2\\ \lambda=\pm\frac{1}{2r}\sqrt{\frac{a_{ji}a_{jk}}{a_{l\alpha}a_{m\alpha}a_{li}a_{mk}}}$$
- Plugging that back into $x_\beta$
$$x_\beta = -\frac{1}{\pm\frac{1}{r}\sqrt{\frac{a_{ji}a_{jk}}{a_{l\alpha}a_{m\alpha}a_{li}a_{mk}}} a_{n\alpha} a_{n\beta}}\\ x_\beta=\mp\frac{r}{a_{n\alpha} a_{n\beta}}\sqrt{\frac{a_{l\alpha}a_{m\alpha}a_{li}a_{mk}}{a_{ji}a_{jk}}}$$
- Sanity check, using $\alpha = \beta$ and $a_{ij}= \delta_{ij}$ (the identity matrix):
$$x_\alpha=\mp\frac{r}{\delta_{n\alpha}^2}\sqrt{\frac{\delta_{l\alpha}\delta_{m\alpha}\delta_{li}\delta_{mk}}{\delta_{ji}\delta_{jk}}}\\ x_\alpha=\mp\frac{r}{\delta_{nn}}\sqrt{\frac{\delta_{lm}\delta_{li}\delta_{mk}}{\delta_{ik}}}\\ x_\alpha=\mp\frac{r}{\delta_{nn}}\sqrt{\frac{\delta_{ik}}{\delta_{ik}}}\\ x_\alpha=\mp\frac{r}{\delta_{nn}}$$
This result is obviously incorrect: if $\mathbf{A}$ is the identity matrix, the result should simply be $x_\alpha=\pm r$ (and the other coordinates should all be $0$). Yet it's so close (off only by a factor depending on the dimension of the vector space involved) that I must assume what I did isn't complete nonsense. So, where did I go off the rails, and what would be the correct way to do this?
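Incidentally, the expected answer for the identity case can be confirmed numerically by sampling points on the sphere (a rough sketch in NumPy; the dimension, component index, and radius below are arbitrary choices):

```python
import numpy as np

# For A = I, the constraint is |x| = r, and the maximum of x_alpha
# over the sphere should be r itself, not r divided by the dimension.
rng = np.random.default_rng(0)
n, alpha, r = 3, 0, 2.0

x = rng.normal(size=(200_000, n))
x *= r / np.linalg.norm(x, axis=1, keepdims=True)  # uniform samples on |x| = r

print(x[:, alpha].max())  # approaches r, nowhere near r/n
```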
Also, with all those indices, I have a hard time turning that result back into the usual matrix notation. If that's possible, what would it look like?
An alternative approach, if $A$ is invertible:
First consider the case when $A=I$. Then the problem is to maximize $u^Tx$ subject to $x^T x\le r^2$, where $u$ is a fixed vector ($e_\alpha$ in your case). By Cauchy-Schwarz, the solution is $x=ru/|u|$.
For general invertible $A$, the problem is to maximize $u^Tx$ subject to $x^TA^TAx\le r^2$. Let $y=Ax$. Then the problem is to maximize $u^TA^{-1}y = (A^{-T}u)^Ty$ subject to $y^Ty\le r^2$. By the previous case, we want $y=r(A^{-T}u)/|A^{-T}u|$, and so $x=r((A^TA)^{-1}u)/|A^{-T}u|$.
So the tensor formalism is not really needed for this particular problem.
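This closed form is easy to check numerically (a sketch in NumPy; the dimension, index, and radius are arbitrary choices, and random feasible points stand in for a full optimality proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, r = 4, 2, 3.0
A = rng.normal(size=(n, n))   # a generic (almost surely invertible) matrix
u = np.zeros(n)
u[alpha] = 1.0                # u = e_alpha

w = np.linalg.solve(A.T, u)   # w = A^{-T} u
x = r * np.linalg.solve(A.T @ A, u) / np.linalg.norm(w)

# The candidate satisfies the constraint x^T A^T A x = r^2 ...
assert np.isclose(x @ (A.T @ A) @ x, r**2)

# ... and no sampled feasible point has a larger alpha-component:
# draw points y on |y| = r and map them back via x = A^{-1} y.
y = rng.normal(size=(100_000, n))
y *= r / np.linalg.norm(y, axis=1, keepdims=True)
samples = np.linalg.solve(A, y.T).T
assert samples[:, alpha].max() <= x[alpha] + 1e-9
```

The second assertion works because every feasible $x$ corresponds to some $y=Ax$ with $|y|=r$, so sampling $y$ uniformly on the sphere sweeps the whole constraint set.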