Maximization of $tr[U^HAU(U^HBU)^{-1}]$


I am looking for the solution of \begin{align*} \max_{{U}\in\mathbb{C}^{M\times P},\ {U}^H{U}={I}_P } &tr\left[ {U}^H{A}{U}\left({U}^H{B}{U}\right)^{-1}\right], \end{align*} where $P\leq M$, ${A} \in \mathbb{C}^{M\times M}$ and ${B} \in \mathbb{C}^{M\times M}$ are full-rank Hermitian matrices.

I wonder whether the solution is a collection of the $P$ generalized eigenvectors associated with the $P$ maximal generalized eigenvalues of the matrix pencil $({A},{B})$. Equivalently, ${U}$ would be a collection of the $P$ eigenvectors associated with the $P$ maximal eigenvalues of the matrix ${B}^{-1}{A}$.

I can only prove it in the case $P=1$, where the problem reduces to the well-known generalized Rayleigh quotient \begin{align*} \max_{{u}\in\mathbb{C}^{M\times 1},\ {u}^H{u}=1 } & \frac{u^HAu}{u^HBu}, \end{align*} which is maximized by the dominant generalized eigenvector of $({A},{B})$.
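
As a quick numerical check of the $P=1$ case, here is a minimal sketch in Python. It assumes (beyond the statement above) that $B$ is positive definite, so that the pencil is well posed for `scipy.linalg.eigh`; the random matrices are purely illustrative.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
M = 5

# Random Hermitian A and (assumed) positive-definite Hermitian B.
X = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
A = (X + X.conj().T) / 2
Y = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
B = Y @ Y.conj().T + M * np.eye(M)

# Generalized eigenpairs of the pencil (A, B), eigenvalues ascending.
vals, vecs = eigh(A, B)
u = vecs[:, -1]                      # dominant generalized eigenvector

# The quotient attains the largest generalized eigenvalue at u ...
quotient = (u.conj() @ A @ u) / (u.conj() @ B @ u)
assert np.isclose(quotient.real, vals[-1])

# ... and no random direction does better.
for _ in range(200):
    v = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    q = (v.conj() @ A @ v) / (v.conj() @ B @ v)
    assert q.real <= vals[-1] + 1e-9
```

Note that the quotient is invariant to the scaling of $u$, so the unit-norm constraint plays no role in the check.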

What about the general case $P>1$?


There are 2 best solutions below


We can solve this for real symmetric matrices, but I'm not sure whether the method carries over fully to Hermitian matrices. In any case, we can use the method of Lagrange multipliers. In the real symmetric case, the Lagrangian can be written as

$$L = tr\left[ {U}^T{A}{U}\left({U}^T{B}{U}\right)^{-1}\right] - tr \left[ \left(U^T U - I\right) \Lambda \right]$$

where $\Lambda \in \mathbb{R}^{P \times P}$ is symmetric and contains the Lagrange multipliers. To see where the second term comes from, consider the constraint $U^TU=I$, which consists of $P^2$ scalar equations (of which $P^2 - P(P+1)/2$ are redundant by symmetry). Multiplying each one by $\Lambda_{ij}$ and summing all of them yields

$$ \textbf{1}^T \left[ \left(U^T U - I\right) \circ \Lambda \right] \textbf{1} = tr \left[ \left(U^T U - I\right) \Lambda \right] $$ where $\circ$ is the Hadamard product and $\textbf{1}$ is the vector with all ones. We used the properties of Hadamard product to obtain the result.
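This identity is easy to verify numerically; a small sketch with a randomly generated $U$ and symmetric $\Lambda$ (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
M, P = 6, 4
U = rng.standard_normal((M, P))
L = rng.standard_normal((P, P))
L = (L + L.T) / 2                    # symmetric multiplier matrix Lambda

C = U.T @ U - np.eye(P)              # constraint residual U^T U - I
ones = np.ones(P)

lhs = ones @ (C * L) @ ones          # 1^T [(U^T U - I) o Lambda] 1
rhs = np.trace(C @ L)                # tr[(U^T U - I) Lambda]
assert np.isclose(lhs, rhs)
```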

We can use the differential form of $L$ w.r.t. $U$ to solve the problem as follows:

$$\begin{align} dL &= tr\left[ d \left({U}^T{A}{U}\right) \left({U}^T{B}{U}\right)^{-1}\right] + tr\left[ {U}^T{A}{U} ~ d \left(\left({U}^T{B}{U}\right)^{-1}\right)\right] - tr \left[ d\left(U^T U \Lambda\right) \right] \\ &= 2 tr\left[\left({U}^T{B}{U}\right)^{-1} {U}^T{A}(dU)\right] - 2 tr \left[ \left({U}^T{B}{U}\right)^{-1} {U}^T{A}{U} \left({U}^T{B}{U}\right)^{-1} {U}^T{B}(dU)\right] \\ &\phantom{=}- 2 tr \left[ \Lambda U^T (dU) \right] \\ \frac{\partial L}{\partial U} &= 2\left({U}^T{B}{U}\right)^{-1} {U}^T{A} - 2\left({U}^T{B}{U}\right)^{-1} {U}^T{A}{U} \left({U}^T{B}{U}\right)^{-1} {U}^T{B} - 2\Lambda U^T \\ &= 0 \end{align}$$

Multiplying with $U$ from the right, we get $\Lambda = 0$, so the constraint is not binding. Therefore, $U$ has to satisfy $$ U^T A = {U}^T{A}{U} \left({U}^T{B}{U}\right)^{-1} {U}^T{B} $$ Since $B$ is full rank, and hence invertible, we get $$ U^T A B^{-1} = {U}^T{A}{U} \left({U}^T{B}{U}\right)^{-1} {U}^T $$ or equivalently $$ B^{-1} A U = U \left({U}^T{B}{U}\right)^{-1} {U}^T{A}{U} $$

Note that the image of the right-hand side is contained in the column space of $U$. This means that the image of $U$ must be an invariant subspace of $B^{-1} A$.
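
This stationarity condition can be checked numerically: if $U$ is an orthonormal basis of a span of eigenvectors of $B^{-1} A$, the equation holds exactly. A sketch for real symmetric $A$ and, as an extra assumption, positive-definite $B$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
M, P = 5, 2
S = rng.standard_normal((M, M))
A = (S + S.T) / 2
T = rng.standard_normal((M, M))
B = T @ T.T + M * np.eye(M)          # symmetric positive definite (assumed)

# Columns of V are eigenvectors of B^{-1} A, since A V = B V diag(w).
w, V = eigh(A, B)

# Orthonormal basis of the invariant subspace spanned by two eigenvectors.
U, _ = np.linalg.qr(V[:, :P])

lhs = np.linalg.solve(B, A @ U)                       # B^{-1} A U
rhs = U @ np.linalg.inv(U.T @ B @ U) @ (U.T @ A @ U)  # U (U^T B U)^{-1} U^T A U
assert np.allclose(lhs, rhs)
```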

If there are finitely many invariant subspaces, an algorithm for finding $U$ can be given as follows:

  1. Find all $P$-dimensional invariant subspaces $V_i$ of $B^{-1} A$.
  2. Use Gram–Schmidt process to construct $U_i$ such that $Im(U_i) = V_i$ and $U_i^T U_i = I$.
  3. Calculate $tr\left[ {U}_i^T{A}{U}_i\left({U}_i^T{B}{U}_i\right)^{-1}\right]$ for all $i$.
  4. Take $U = U_i$ that maximizes the trace.
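
When $B^{-1} A$ has distinct eigenvalues, every $P$-dimensional invariant subspace is spanned by a subset of $P$ eigenvectors, so the steps above reduce to a finite enumeration. A brute-force sketch of that enumeration (assuming, beyond the question, that $A$ is real symmetric and $B$ positive definite):

```python
import numpy as np
from itertools import combinations
from scipy.linalg import eigh

def cost(U, A, B):
    """tr[U^T A U (U^T B U)^{-1}]."""
    return np.trace(U.T @ A @ U @ np.linalg.inv(U.T @ B @ U))

rng = np.random.default_rng(2)
M, P = 5, 2

S = rng.standard_normal((M, M))
A = (S + S.T) / 2
T = rng.standard_normal((M, M))
B = T @ T.T + M * np.eye(M)          # symmetric positive definite (assumed)

# Step 1: columns of V are eigenvectors of B^{-1} A (A V = B V diag(w)).
w, V = eigh(A, B)

best = -np.inf
for idx in combinations(range(M), P):
    Q, _ = np.linalg.qr(V[:, list(idx)])   # step 2: orthonormal basis of span
    best = max(best, cost(Q, A, B))        # step 3: evaluate the trace

# Step 4: the maximum equals the sum of the P largest eigenvalues.
assert np.isclose(best, w[-P:].sum())
```

On each invariant subspace the trace equals the sum of the corresponding eigenvalues, which is why the enumeration's maximum matches the sum of the $P$ largest.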

Of course, there might be infinitely many invariant subspaces. In that case you can try to parameterize $U$ and find the parameters that maximize the trace. For $P=1$, the eigenvector of the largest eigenvalue gives a solution, but there might be infinitely many $U$ that give the same result.

In general, if $B^{-1} A$ has distinct eigenvalues, then there are finitely many invariant subspaces.

I believe this method also generalizes to Hermitian matrices, but you need to be careful when calculating the derivative of the trace, since it behaves differently under the conjugate-transpose operation.


Building on obareey's answer, I realized that the constraint is not binding and can in fact be dropped. Indeed, $U$ can be multiplied from the right by any invertible matrix without changing the cost function. Hence, we can forget about the constraint, optimize over a general matrix $W$, and "orthogonalize" at the end to recover $U$ as $U=W(W^HW)^{-1/2}$. The relaxed problem is thus \begin{align*} \max_{{W}\in\mathbb{C}^{M\times P}} f(W)=tr\left[ {W}^H{A}{W}\left({W}^H{B}{W}\right)^{-1}\right]. \end{align*}

Now we can write equivalent forms of the problem. Since $W$ can be multiplied from the right arbitrarily, we can choose $W$ such that $W^HBW=I$: given any $\tilde{W}$, "normalize" it as $W=\tilde{W}(\tilde{W}^HB\tilde{W})^{-1/2}$, so that $W^HBW=I$ without affecting the value of $f(W)$. Hence the problem is equivalent to \begin{align*} \max_{{W}\in\mathbb{C}^{M\times P}} f(W)=tr\left[ {W}^H{A}{W}\left({W}^H{B}{W}\right)^{-1}\right]\quad s.t.\ W^HBW=I, \end{align*} which is in turn equivalent to \begin{align*} \max_{{W}\in\mathbb{C}^{M\times P}} tr\left[ {W}^H{A}{W}\right]\quad s.t.\ W^HBW=I. \end{align*}

This is (one of) the canonical form(s) of a generalized eigenvalue problem; see, e.g., Optimization Form 2 in [1]. Let us extend the methodology of [1] to the complex/Hermitian case. The Lagrangian can be written \begin{align*} L&= tr\left[ {W}^H{A}{W}\right] - tr\left[\Lambda(W^HBW-I)\right], \end{align*} where $\Lambda$ is real and symmetric, so that $tr\left[\Lambda(W^HBW-I)\right]$ is always real and no real part needs to be taken. Moreover, using the same trick as in [1, Appendix B], one can show that, without loss of generality, $\Lambda$ can be taken to be diagonal.
Taking the Wirtinger derivative with respect to $W^*$ gives \begin{align*} \frac{dL}{dW^*}=\frac{1}{2}\left(\frac{dL}{d\Re(W)}+j\frac{dL}{d\Im(W)}\right)=AW-BW\Lambda, \end{align*} and setting it to zero yields the generalized eigenvalue problem \begin{align*} AW=BW\Lambda. \end{align*} Using this, the cost function becomes $tr\left[ {W}^H{A}{W}\left({W}^H{B}{W}\right)^{-1}\right]=tr\left[\Lambda\right]$. It is thus optimal to select the $P$ maximal generalized eigenvalues.

The solution is thus obtained by choosing for $W$ the generalized eigenvectors of the pencil $(A,B)$ associated with its $P$ largest generalized eigenvalues.

The final $U$, with orthonormal columns, is then obtained by multiplying $W$ from the right: $U=W(W^HW)^{-1/2}$.

[1] B. Ghojogh, F. Karray, and M. Crowley, “Eigenvalue and Generalized Eigenvalue Problems: Tutorial.” arXiv, May 23, 2022. Accessed: Jul. 28, 2022. [Online]. Available: http://arxiv.org/abs/1903.11240