Find the gradient (with respect to a matrix) of an expression containing a Frobenius norm and a Hadamard product.


I'm struggling to take the gradient with respect to (w.r.t.) the matrices $H_R$ and $H_I$ in the following expression

$$\left\| Z - I\odot(H_R^TA + H_I^TB) \right\|_F^2 + \left\|W-\begin{pmatrix} H_R & -H_I\\H_I & H_R\end{pmatrix}P_dS^T \right\|_F^2$$

where $\|\cdot\|_F$ denotes the Frobenius norm, $\odot$ the Hadamard product, and $(\cdot)^T$ the transpose operation; $I$ is the identity matrix, and all terms in the above expression have compatible dimensions.

It would be very helpful to see how to calculate the gradient w.r.t. one of the above matrices, say $H_R$, as the same analysis could then be extended to $H_I$. Any help would be highly appreciated.


Best Answer

$\def\R#1{{\mathbb R}^{#1}}\def\v{{\rm vec}}\def\M{{\rm Reshape}}\def\m#1{\left[\begin{array}{r}#1\end{array}\right]}\def\p#1#2{\frac{\partial #1}{\partial #2}}$For ease of typing, replace the subscripted variables with single-letter names
$$\eqalign{ R = H_R \qquad Q = H_I \qquad P = P_d \\ }$$
and define the matrices
$$\eqalign{ I_2 &= \m{1&0\\0&1}\qquad J_2 = \m{0&-1\\1&0} \\ M &= I\odot(R^TA+Q^TB)- Z \\ N &= (I_2\otimes R+J_2\otimes Q)PS^T - W \\ D &= I\odot M \\ C &= NSP^T \\ }$$
Finally, we'll need the SVD of this last matrix
$$\eqalign{ C &= \sum_{k=1}^{{\rm rank}(C)} \sigma_ku_kv_k^T \\ U_k &= \M(u_k,n,2) \quad&\iff\quad u_k = \v(U_k) \\ V_k &= \M(v_k,n,2) \quad&\iff\quad v_k = \v(V_k) \\ }$$
Let's also use a colon to denote the matrix inner product, i.e.
$$\eqalign{ A:B &= {\rm Tr}(A^TB) \\ A:A &= \big\|A\big\|_F^2 \\ }$$
Write the objective function using this new notation, then calculate its differential.
$$\eqalign{ \phi &= M:M + N:N \\ d\phi &= 2M:dM + 2N:dN \\ &= 2M:\big(I\odot (dR^TA+dQ^TB)\big) + 2N:(I_2\otimes dR+J_2\otimes dQ)PS^T \\ &= 2D:(A^TdR+B^TdQ) + 2C:(I_2\otimes dR+J_2\otimes dQ) \\ &= 2AD:dR + 2C:(I_2\otimes dR) + 2BD:dQ - 2(J_2\otimes I_n)C:(I_2\otimes dQ) \\ }$$
The terms containing Kronecker products are tricky; this is where the SVD becomes useful
$$\eqalign{ C:(I_2\otimes dR) &= \sum_k\sigma_ku_kv_k^T:(I_2\otimes dR) \\ &= \sum_k\sigma_ku_k:(I_2\otimes dR)v_k \\ &= \sum_k\sigma_ku_k:\v(dR\,V_kI_2) \\ &= \sum_k\sigma_kU_k:dR\,V_k \\ &= \left(\sum_k\sigma_kU_kV_k^T\right):dR \\ &= E:dR \\ }$$
Substitute the SVD result and set $dQ=0$ to obtain the gradient wrt $R$
$$\eqalign{ d\phi &= 2(AD+E):dR \\ \p{\phi}{R} &= 2(AD+E) \\ }$$
A similar calculation for the gradient wrt $Q$ is a bit trickier (the skew-symmetry $J_2^T=-J_2$ introduces two sign flips, which cancel), but the result is
$$\eqalign{ F &= \sum_k\sigma_kU_kJ_2V_k^T \\ \p{\phi}{Q} &= 2(BD+F) \\ }$$

There are other ways (besides the SVD) to handle the Kronecker term, but the formulas are longer/messier.
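The gradient formulas are easy to sanity-check numerically. Here's a short NumPy sketch (not part of the original answer; the dimensions, seeds, and variable names are arbitrary choices of mine) that builds the closed-form gradients from the SVD of $C$ and compares them against central finite differences of $\phi$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 4, 3, 5
# R^T A must be square for the Hadamard mask I (identity), so all of
# R, Q, A, B, Z are n x n; P is 2n x p, S is q x p, W is 2n x q.
R, Q, A, B, Z = (rng.standard_normal((n, n)) for _ in range(5))
P = rng.standard_normal((2 * n, p))            # plays the role of P_d
S = rng.standard_normal((q, p))
W = rng.standard_normal((2 * n, q))
I2 = np.eye(2)
J2 = np.array([[0.0, -1.0], [1.0, 0.0]])
In = np.eye(n)

def phi(R, Q):
    M = In * (R.T @ A + Q.T @ B) - Z           # I o (R^T A + Q^T B) - Z
    N = (np.kron(I2, R) + np.kron(J2, Q)) @ P @ S.T - W
    return np.sum(M * M) + np.sum(N * N)

# Closed-form gradients
M = In * (R.T @ A + Q.T @ B) - Z
D = In * M                                     # D = I o M (diagonal)
N = (np.kron(I2, R) + np.kron(J2, Q)) @ P @ S.T - W
C = N @ S @ P.T                                # 2n x 2n
u, s, vt = np.linalg.svd(C)
E = np.zeros((n, n))
F = np.zeros((n, n))
for k in range(s.size):
    Uk = u[:, k].reshape((n, 2), order="F")    # column-stacking vec convention
    Vk = vt[k].reshape((n, 2), order="F")
    E += s[k] * Uk @ Vk.T
    F += s[k] * Uk @ J2 @ Vk.T
grad_R = 2 * (A @ D + E)
grad_Q = 2 * (B @ D + F)   # plus sign: J2.T = -J2 flips the sign twice

def num_grad(f, X, h=1e-6):
    """Central finite-difference gradient, entry by entry."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Xp, Xm = X.copy(), X.copy()
            Xp[i, j] += h
            Xm[i, j] -= h
            G[i, j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

gR = num_grad(lambda T: phi(T, Q), R)
gQ = num_grad(lambda T: phi(R, T), Q)
```

As a side note, the SVD loop is not strictly necessary here: since $E=\sum_k\sigma_kU_kV_k^T$ collects the diagonal $n\times n$ blocks of $C$, one can also read off $E = C_{11}+C_{22}$ and $F = C_{21}-C_{12}$ directly, consistent with the SVD-free approach in the update below.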


Update

Here's one way to deal with the Kronecker terms which doesn't require the SVD.

Given matrices with the following dimensions, define the cost function
$$\eqalign{ &I_p\in\R{p\times p} \qquad X\in\R{m\times n} \qquad Y\in\R{pm\times pn} \\ &\psi = Y:\left(I_p\otimes X\right) \\ }$$
To take advantage of the block-diagonal structure of the Kronecker factor, define block-matrix analogs of the standard $\{e_k\}$ basis vectors (i.e. the columns of the $I_p$ identity matrix)
$$\eqalign{ E_k &= (e_k\otimes I_m) &\in\R{pm\times m} \\ F_k &= (e_k\otimes I_n) &\in\R{pn\times n} \\ }$$
and note that
$$\eqalign{ E_j^T(I_p\otimes X)F_k &= \left(e_j^T\otimes I_m\right) \left(I_p\otimes X\right) \left(e_k\otimes I_n\right) \\ &= \left(e_j^Te_k\right)X \\ &= \delta_{jk}X \\ }$$
Evaluate the cost function using block-wise summation (valid because $\sum_j E_jE_j^T = I_{pm}$ and $\sum_k F_kF_k^T = I_{pn}$) and calculate its gradient.
$$\eqalign{ \psi &= \sum_{j=1}^p\sum_{k=1}^p\;E_j^TYF_k:E_j^T(I_p\otimes X)F_k \\ &= \sum_{j=1}^p\sum_{k=1}^p\;E_j^TYF_k:\delta_{jk}X \\ &= \sum_{k=1}^p\;E_k^TYF_k:X \\ d\psi &= \sum_{k=1}^p\;E_k^TYF_k:dX \\ \p{\psi}{X} &= \sum_{k=1}^p\;E_k^T Y F_k \\ &= \sum_{k=1}^p\;\left(e_k^T\otimes I_m\right)Y\big(e_k\otimes I_n\big) \\ }$$
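This SVD-free formula is also easy to verify numerically. The NumPy sketch below (my addition, with arbitrary dimensions) builds $E_k$ and $F_k$ literally as Kronecker products, so $\sum_k E_k^TYF_k$ just picks out and sums the $p$ diagonal $m\times n$ blocks of $Y$, and compares the result against finite differences of $\psi$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 3, 4, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((p * m, p * n))

def psi(X):
    # psi = Y : (I_p kron X) = Tr(Y^T (I_p kron X))
    return np.trace(Y.T @ np.kron(np.eye(p), X))

# gradient = sum_k E_k^T Y F_k with E_k = e_k kron I_m, F_k = e_k kron I_n
grad = np.zeros((m, n))
for k in range(p):
    Ek = np.kron(np.eye(p)[:, [k]], np.eye(m))   # pm x m
    Fk = np.kron(np.eye(p)[:, [k]], np.eye(n))   # pn x n
    grad += Ek.T @ Y @ Fk                        # k-th diagonal block of Y

# central finite differences for comparison (psi is linear, so this is exact
# up to roundoff)
h = 1e-6
num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        Xp, Xm = X.copy(), X.copy()
        Xp[i, j] += h
        Xm[i, j] -= h
        num[i, j] = (psi(Xp) - psi(Xm)) / (2 * h)
```

Equivalently, `grad` is `sum(Y[k*m:(k+1)*m, k*n:(k+1)*n] for k in range(p))`, which avoids forming the Kronecker products at all.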