How to solve a trace optimization problem involving a Hadamard product?


I have the following minimization problem, where I want to find $W$: \begin{align} &\min \mathrm{tr} (((W^TK)\circ(W^TK))^T((W^TK)\circ(W^TK))L)\\ &\text{s.t.} ~ W^TKHKW = I \end{align} where $\circ$ denotes the Hadamard product, and $L$, $H$, and $K$ are all symmetric positive semidefinite. By taking the derivative of the objective function with respect to $W$ and using a Lagrange multiplier, I get the following condition \begin{align} &2K[[L((KW)\circ(KW))]\circ(KW)]=KHKW\phi \end{align} where $\phi$ is the Lagrange multiplier. From here I don't know how to proceed. Or is it impossible to find the $W$ I want? I'd really appreciate any help.


$ \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\unvc#1{\op{unvec}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\o{{\tt1}} \def\p{\partial} \def\l{\lambda} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\B{B^{-1}} $Define the following matrix variables in terms of $(H,K,L)$ $$\eqalign{ X &= KW &\qiq X^T = W^TK \\ Y &= X\odot X &\qiq Y^T = X^T\odot X^T \\ H &= C^TC &\qiq \big\{{\rm Cholesky}\big\} \\ Z &= CX &\qiq X = C^{-1}Z \\ }$$ Starting with an unconstrained matrix $U$, construct the semi-orthogonal matrix $Z$ $$\eqalign{ B &= B^T = \LR{U^TU}^{1/2} \\ Z &= U\B \qiq &Z^TZ &= \B U^TU B^{-1} = \B B^2 \B = I \\ && &\doteq X^TC^TCX = W^TKHKW \\ }$$

Now calculate differentials with respect to the unconstrained variable $$\eqalign{ B^2 &= U^TU \\ B\,dB+dB\,B &= U^TdU+dU^TU \\ (I\otimes B+B\otimes I)\,db &= \LR{I\otimes U^T+(U^T\otimes I)M}du \\ db &= P\,du \\ \\ Z &= UB^{-1} \\ dZ &= dU\,\B - UB^{-1}dB\,\B \\ &= dU\,\B - Z\,dB\,\B \\ dz &= \LR{\B\otimes I}\,du \;-\; \LR{\B\otimes Z}\,\c{db} \\ &= \LR{(\B\otimes I) - (\B\otimes Z)\c{P}}\,\c{du} \\ &= Q\,du \\ \\ X &= C^{-1}Z \\ dX &= C^{-1}dZ \\ dx &= \LR{I\otimes C^{-1}}\c{dz} \\ &= \LR{I\otimes C^{-1}}\c{Q\,du} \\ }$$ where $M$ is the Commutation Matrix associated with the vectorization of matrix equations.
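The commutation matrix is easy to build explicitly. A minimal NumPy sketch (using column-major `vec`, which is the convention the Kronecker identities above assume; the helper name `commutation` is illustrative):

```python
import numpy as np

def commutation(m, n):
    """M with vec(A.T) = M @ vec(A) for any m-by-n matrix A (column-major vec)."""
    M = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # vec(A)[j*m + i] = A[i, j] and vec(A.T)[i*n + j] = A[i, j]
            M[i * n + j, j * m + i] = 1.0
    return M

A = np.arange(12.0).reshape(3, 4)
M = commutation(3, 4)
assert np.allclose(M @ A.flatten(order='F'), A.T.flatten(order='F'))
```

Since $M$ is a permutation matrix, it is orthogonal: $M^TM = I$.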

Finally, calculate the gradient of the objective function $$\eqalign{ \phi &= \trace{YY^TL} \\ &= L:YY^T \\ d\phi &= L : \LR{dY\,Y^T+Y\,dY^T} \\ &= 2L : dY\,Y^T \\ &= 2LY : dY \\ &= 2LY : \LR{2X\odot dX} \\ &= 4\LR{X\odot LY} : dX \\ &= 4\c{\vc{X\odot LY}} : dx \\ &= 4\:\c{v} : dx \\ &= 4\:v : \LR{I\otimes C^{-1}}Q\,du \\ &= 4\:Q^T\LR{I\otimes C^{-1}}^Tv : du \\ \grad{\phi}{u} &= 4\:Q^T\LR{I\otimes C^{-1}}^Tv \\ }$$ Using this gradient, you can attempt a closed-form solution for the zero gradient condition, or you can use the gradient expression in a gradient descent algorithm. The former is so complicated that it is likely impossible, while the latter is very simple.
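As a sanity check on the gradient formula, here is a NumPy sketch that assembles $P$, $Q$, and $\partial\phi/\partial u$ for small random positive definite $H$ and $L$, and compares the result against central finite differences. All names (`objective_and_grad`, `commutation`, the sizes `n`, `r`) are illustrative, not from any library:

```python
import numpy as np

def commutation(m, n):
    # M with vec(A.T) = M @ vec(A) for an m-by-n A (column-major vec)
    M = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            M[i * n + j, j * m + i] = 1.0
    return M

def vec(A):
    return A.flatten(order='F')

def objective_and_grad(u, Cinv, L, n, r):
    U = u.reshape(n, r, order='F')
    w, V = np.linalg.eigh(U.T @ U)               # U^T U is SPD
    B = V @ np.diag(np.sqrt(w)) @ V.T            # B = (U^T U)^{1/2}
    Binv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Z = U @ Binv                                 # semi-orthogonal: Z^T Z = I
    X = Cinv @ Z
    Y = X * X                                    # Hadamard square
    phi = np.trace(Y @ Y.T @ L)
    In, Ir = np.eye(n), np.eye(r)
    M = commutation(n, r)
    # db = P du
    P = np.linalg.solve(np.kron(Ir, B) + np.kron(B, Ir),
                        np.kron(Ir, U.T) + np.kron(U.T, Ir) @ M)
    # dz = Q du
    Q = np.kron(Binv, In) - np.kron(Binv, Z) @ P
    grad = 4.0 * Q.T @ np.kron(Ir, Cinv).T @ vec(X * (L @ Y))
    return phi, grad

np.random.seed(0)
n, r = 4, 2
A = np.random.randn(n, n); H = A @ A.T + n * np.eye(n)   # H positive definite
A = np.random.randn(n, n); L = A @ A.T                   # L PSD
C = np.linalg.cholesky(H).T                              # H = C^T C
Cinv = np.linalg.inv(C)
u = np.random.randn(n * r)
phi, g = objective_and_grad(u, Cinv, L, n, r)

# central-difference check of the analytic gradient
eps = 1e-6
g_num = np.array([(objective_and_grad(u + eps * e, Cinv, L, n, r)[0]
                 - objective_and_grad(u - eps * e, Cinv, L, n, r)[0]) / (2 * eps)
                  for e in np.eye(n * r)])
assert np.allclose(g, g_num, atol=1e-4)
```

With the gradient verified, plugging `objective_and_grad` into any off-the-shelf unconstrained optimizer (e.g. plain gradient descent on `u`) follows immediately.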

Note that $Z$ depends only on the direction of $U$, not on its length. Thus every iteration should renormalize $U$ to prevent numerical overflow.

After calculating the optimal $u$ vector, the corresponding $W$ matrix can be reconstructed $$\eqalign{ U &= \unvc{u} \\ Z &= U\LR{U^TU}^{-1/2} \\ X &= C^{-1}Z \\ W &= K^{-1}X \\ \\ }$$
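A quick numerical sketch of this reconstruction (the sizes and the random SPD matrices are illustrative), verifying that the constraint $W^TKHKW=I$ holds by construction for any unconstrained $u$:

```python
import numpy as np

np.random.seed(1)
n, r = 5, 2
A = np.random.randn(n, n); H = A @ A.T + n * np.eye(n)   # H positive definite
A = np.random.randn(n, n); K = A @ A.T + n * np.eye(n)   # K positive definite
C = np.linalg.cholesky(H).T                              # H = C^T C

u = np.random.randn(n * r)                  # any unconstrained vector
U = u.reshape(n, r, order='F')              # U = unvec(u)
w, V = np.linalg.eigh(U.T @ U)
Z = U @ (V @ np.diag(w ** -0.5) @ V.T)      # Z = U (U^T U)^{-1/2}
X = np.linalg.solve(C, Z)                   # X = C^{-1} Z
W = np.linalg.solve(K, X)                   # W = K^{-1} X

assert np.allclose(W.T @ K @ H @ K @ W, np.eye(r), atol=1e-8)
```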


In the above, a colon denotes the Frobenius product, which is incredibly useful in Matrix Calculus $$\eqalign{ A:Z &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}Z_{ij} \;=\; \trace{A^TZ} \\ A:A &= \big\|A\big\|^2_{\c F} \qquad \big\{{\rm\c{Frobenius}\:norm}\big\} \\ }$$ This is also called the double-dot or double contraction product.
When applied to vectors $(n=\o)$ it reduces to the standard dot product.

The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g. $$\eqalign{ A:Z &= Z:A \\ A:Z &= A^T:Z^T \\ B:\LR{A^TZ} &= \LR{BZ^T}:A^T &= \LR{AB}:Z \\ }$$
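These identities are easy to verify numerically. A small NumPy check (the shapes are chosen arbitrarily for illustration):

```python
import numpy as np

np.random.seed(2)
A = np.random.randn(3, 4)
Z = np.random.randn(3, 4)
B = np.random.randn(4, 4)

frob = lambda P, Q: np.sum(P * Q)           # A:Z = sum_ij A_ij Z_ij

assert np.isclose(frob(A, Z), np.trace(A.T @ Z))            # A:Z = Tr(A^T Z)
assert np.isclose(frob(A, A), np.linalg.norm(A, 'fro')**2)  # A:A = ||A||_F^2
assert np.isclose(frob(A, Z), frob(Z, A))                   # A:Z = Z:A
assert np.isclose(frob(A, Z), frob(A.T, Z.T))               # A:Z = A^T:Z^T
assert np.isclose(frob(B, A.T @ Z), frob(A @ B, Z))         # B:(A^T Z) = (AB):Z
assert np.isclose(frob(B, A.T @ Z), frob(B @ Z.T, A.T))     # ... = (BZ^T):A^T
```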