Minimize Frobenius norm with constraints

2.2k Views Asked by At

As a follow-up on my previous question, I would like to solve the following optimization problem:

$$\begin{array}{ll} \text{minimize} & \| \mathrm M \mathrm A - \mathrm B \|_F^2 - \mathrm x^H \mathrm M \mathrm y\\ \text{subject to} & \mathrm M^H \mathrm M = \mathrm I\end{array}$$

where $A$ and $B$ are $N\times L$ complex matrices. $M$ is $N\times N$ complex matrix and $x,y$ are $N\times 1$ complex vectors.

To my understanding this is equivalent to the problem

$$\max\Re\left\{tr\left(MAB^H \right)\right\}+\frac{1}{2} x^HMy \quad \text{s.t.}\quad M^HM=I$$

I am not sure how to proceed from here. Thank you for your help.

2

There are 2 best solutions below

0
On

We assume that the considered matrices and vectors are real.

It is sufficient to study the minimum of the function $f(M)=-2trace(AB^TM)-x^TMy$ where $M\in O(N)$. Since $O(N)$ is a Lie group, the tangent space in $M$ to $O(N)$ is $TS=\{MH|H\text{ is skew-symmetric}\}$. Then $D_Mf:K\rightarrow -2trace(AB^TK)-x^TKy=-trace((2AB^T+yx^T)K)$ and we seek $M\in O(N)$ s.t., for every $K\in TS$, $D_Mf(K)=0$, that is, for every skew-symmetric $H$, $-trace((2AB^T+yx^T)MH)=0$. Let $U=2AB^T+yx^T$.

EDIT . Therefore $UM$ is orthogonal to the set of skew-symmetric matrices, that is $UM=\Sigma$ where $\Sigma$ is symmetric. Thus $U=\Sigma M^T$ and $UU^T=\Sigma ^2$. We may assume that $UU^T=\operatorname{diag}(\lambda_1,\cdots\lambda_N)$ where $\lambda_i\geq 0$.

For the sake of simplicity, assume that $\lambda_1>\cdots>\lambda_{p+1}=\cdots=\lambda_N=0$.

Then $\Sigma=\operatorname{diag}(\pm\sqrt{\lambda_1},\cdots,\pm\sqrt{\lambda_p},0_{N-p})=\operatorname{diag}(D,0)$ ($2^p$ possible values for $\Sigma$). Then, putting $U=\begin{pmatrix}E&F\\0&0\end{pmatrix}$ with $EE^T+FF^T=D$ and $M^T=\begin{pmatrix}P&Q\\R&S\end{pmatrix}$ with $PP^T+QQ^T=I,\cdots$, the relation $U=\Sigma M^T$ is equivalent to $E=DP,F=DQ$, relations that are consistent with the two above. Finally, $P,Q$ are fixed by the choice of $\Sigma$ and $R,S$ are "arbitrary" (s.t. $M^T$ is orthogonal). It remains to consider all the obtained candidates and compare the associated values of $f(M)$.

Post scriptum. Many thanks to the OP, who has not even had the courtesy to report if he had read the two answers to his question (more than two years ago).

1
On

To use a standard Lagrangian approach, and still assuming that everything is real, since in the contrary case, $x^HMy$ is complex and can not be maximized, and if it is replaced by $Re(x^HMy)$, then one may forget the complex structure and double the dimension from complex to real,...

... the Lagrange function is \begin{align} L(M,\Lambda)&=∥MA−B∥^2_F−x^TMy-tr(Λ(M^TM-I))\\ &=∥A∥^2_F+∥B∥^2_F-2tr(MAB^T)−tr(yx^TM)-tr(Λ(M^TM-I)) %tr(ΛM^TX+ΛX^TM) \end{align} with derivative $$ 0=\frac{∂L}{∂M}=-2AB^T-yx^T-(Λ+Λ^T)M^T $$ or $$ M(Λ+Λ^T)=-2BA^T-xy^T $$ There is no unique solution to this problem, one special solution can be found by noting that if $(Λ+Λ^T)$ is positive definite, then the left side is the polar decomposition of the right side. Which again can be computed using the SVD of the right side.