Why is this maximizer obvious?


Lately I have been studying multilinear algebra, and I'm trying to build geometric intuition for it. A problem I'm currently facing is $$\max_U \langle U, YX^T\rangle_F$$ subject to the orthonormality constraint $U^TU=I$.

The following was given as a "solution": $$ \begin{aligned} U_{\text{tmp}} &= YX^T\\ U &= (U_{\text{tmp}}U_{\text{tmp}}^T)^{-\frac{1}{2}}U_{\text{tmp}}. \end{aligned} $$

Note that this was part of a larger tensor decomposition problem, and the "solution" might just be a "quick and dirty" step that helps with the larger optimization. I'm trying to verify (as an exercise) whether the given solution is optimal. I could probably verify this by writing out Lagrangians, but that is not the point here.
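A numerical sanity check is one way to probe this before doing any algebra. The sketch below is only an illustration under my own assumptions: $YX^T$ is taken to be square and invertible, the sizes and random data are made up, and the helper name `orthonormalize` is hypothetical. It compares the given formula with the orthogonal factor $PQ^T$ obtained from an SVD $U_{\text{tmp}} = P\Sigma Q^T$, and with many random orthogonal matrices.

```python
import numpy as np

def orthonormalize(M):
    """Return (M M^T)^{-1/2} M, assuming M is square and invertible."""
    # M M^T is symmetric positive definite, so eigh yields its inverse square root.
    w, V = np.linalg.eigh(M @ M.T)
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return inv_sqrt @ M

def objective(U, M):
    """Frobenius inner product <U, M>_F = trace(U^T M)."""
    return np.trace(U.T @ M)

rng = np.random.default_rng(0)
n = 4                                  # illustrative size, not from the text
Y = rng.standard_normal((n, 50))
X = rng.standard_normal((n, 50))
M = Y @ X.T                            # plays the role of U_tmp

U = orthonormalize(M)                  # the "solution" under test

# Orthogonal factor built from the SVD M = P diag(s) Q^T.
P, s, Qt = np.linalg.svd(M)
U_polar = P @ Qt

# Best objective value found among random orthogonal matrices (via QR).
best_random = max(
    objective(np.linalg.qr(rng.standard_normal((n, n)))[0], M)
    for _ in range(2000)
)

print("U^T U close to I:    ", np.allclose(U.T @ U, np.eye(n), atol=1e-6))
print("matches P Q^T:       ", np.allclose(U, U_polar, atol=1e-6))
print("objective at U:      ", objective(U, M))
print("sum of singular vals:", s.sum())
print("best random value:   ", best_random)
```

A random search like this cannot prove optimality; it can only fail to find a counterexample. The bound that no orthogonal matrix exceeds the sum of the singular values is the von Neumann trace inequality.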

What I currently know is that without the orthonormality constraint, and with the norm of $U$ held fixed (otherwise the objective is unbounded), $U$ proportional to $YX^T$ is optimal. This is easily seen by vectorizing both arguments of the Frobenius inner product to obtain a maximization problem involving the Euclidean inner product, whose solution by Cauchy-Schwarz is $\text{vec}(U) \propto \text{vec}(YX^T)$, giving $U \propto YX^T$ when folded back again. The second step is then merely an orthonormalization step. I'm ignoring the fact that $U$ has to be square in the second step.

In the text that I was reading, this solution is treated as obvious, but I can't see why geometrically. I thought that this is similar to the vector case $$ \max_u \langle u, x\rangle$$ subject to $u^Tu=1$. In this case $u=\frac{x}{\|x\|}$ is geometrically obvious to me. But that might not be the correct analog either, as we have $(U_{\text{tmp}}U_{\text{tmp}}^T)^{-\frac{1}{2}}$ as opposed to $(U_{\text{tmp}}^TU_{\text{tmp}})^{-\frac{1}{2}}$ in the "solution".
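To make the comparison between the two orderings concrete, here is a short worked identity. It is a sketch under assumptions that are not in the text: $U_{\text{tmp}}$ is square and invertible, and $U_{\text{tmp}} = P\Sigma Q^T$ is its SVD (the symbols $P$, $\Sigma$, $Q$ are my own notation). $$ \begin{aligned} (U_{\text{tmp}}U_{\text{tmp}}^T)^{-\frac{1}{2}}U_{\text{tmp}} &= (P\Sigma^2P^T)^{-\frac{1}{2}}P\Sigma Q^T = P\Sigma^{-1}P^TP\Sigma Q^T = PQ^T,\\ U_{\text{tmp}}(U_{\text{tmp}}^TU_{\text{tmp}})^{-\frac{1}{2}} &= P\Sigma Q^T(Q\Sigma^2Q^T)^{-\frac{1}{2}} = P\Sigma Q^TQ\Sigma^{-1}Q^T = PQ^T. \end{aligned} $$

So under these assumptions the two orderings give the same matrix. In the vector case, treating $x$ as an $n \times 1$ matrix, the second form reads $x(x^Tx)^{-\frac{1}{2}} = \frac{x}{\|x\|}$, while $xx^T$ has rank one and is not invertible for $n > 1$, so the first form is not available there.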

A geometric explanation would be appreciated. But if the answer is that writing out Lagrangians is the only way to verify it, then I will also accept that.