Solving for gradient of Frobenius norm term


Let's first define a couple of variables: $A,B,C \in \mathbb{R}^{m \times n}, D \in \mathbb{R}^{n \times n}$, and $\mu$ is a scalar.

Say I have an ADMM sub-problem that looks like this:

$\arg \min_{B} \frac{\mu}{2} \|A - BD + C\|^2_F$

In this instance, let's assume $A, C \text{ and } D$ are fixed, and we want to find $B$. As I understand it, one would need to compute:

$0 = \frac{\partial}{\partial B} \left[\frac{\mu}{2} \|A - BD + C\|^2_F\right]$

However, I am not sure how to compute this gradient given that $D$ is a matrix, not a vector. Can anyone help?


There are 2 best solutions below

BEST ANSWER

Taiben is on the right track. You will also benefit from two additional facts:

  • $\|X\|_F^2=\langle X, X\rangle = \mathop{\textrm{Tr}} X^TX$ and
  • $\mathop{\textrm{Tr}} AB=\mathop{\textrm{Tr}} BA$ whenever both products are well-defined.
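Both facts are easy to sanity-check numerically. Here is a minimal sketch using NumPy (the matrix sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))

# Fact 1: ||X||_F^2 = <X, X> = Tr(X^T X)
assert np.isclose(np.linalg.norm(X, 'fro')**2, np.trace(X.T @ X))

# Fact 2: Tr(AB) = Tr(BA) whenever both products are defined
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```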

One trick I use in many cases is a variational approach. I replace the variable, in this case $B$, with $B+\delta B$. Then I multiply through and isolate a term of the form $\langle P, \delta B \rangle$, where $P$ does not depend on $\delta B$ (though it may depend on $B$). If this is the only linear occurrence of $\delta B$, then the gradient is $P$. Any quadratic or higher-order terms involving $\delta B$ can be ignored. This won't work for general nonlinear matrix functions, but for this one it's a snap.

In this case, we have $$\begin{aligned}\tfrac{\mu}{2}\|A+C-(B+\delta B)D\|_F^2&=\tfrac{\mu}{2}\langle A+C-(B+\delta B)D,A+C-(B+\delta B)D\rangle\\ &=\tfrac{\mu}{2}\left(\langle A+C-BD,A+C-BD\rangle + \langle \delta B \cdot D,\delta B \cdot D\rangle\right. \\&\qquad - \left.\langle A+C-BD, \delta B \cdot D \rangle - \langle \delta B \cdot D, A+C-BD \rangle\right) \\ &=\tfrac{\mu}{2}\left(\|A+C-BD\|_F^2 + \|\delta B \cdot D\|_F^2 \right) - \mu \langle A+C-BD, \delta B \cdot D \rangle \end{aligned}$$ So the key term is $$\begin{aligned}- \mu \langle A+C-BD, \delta B \cdot D \rangle &= -\mu\mathop{\textrm{Tr}}((A+C-BD)^T\delta B\cdot D) \\ &= -\mu\mathop{\textrm{Tr}}(D(A+C-BD)^T\delta B) \\ &= -\mu\langle (A+C-BD)D^T,\delta B\rangle \end{aligned}$$ The gradient is $\mu(BD-A-C)D^T$.
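The derived gradient $\mu(BD-A-C)D^T$ can be checked against central finite differences. The following sketch (dimensions and $\mu$ are arbitrary illustrative choices) compares the closed-form gradient to an entrywise numerical one:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, mu = 4, 3, 0.7
A = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
D = rng.standard_normal((n, n))

# The objective: (mu/2) * ||A - BD + C||_F^2
f = lambda B: 0.5 * mu * np.linalg.norm(A - B @ D + C, 'fro')**2

# Closed-form gradient from the variational argument above
grad = mu * (B @ D - A - C) @ D.T

# Entrywise central finite differences
eps = 1e-6
num = np.zeros_like(B)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(B)
        E[i, j] = eps
        num[i, j] = (f(B + E) - f(B - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-5)
```

Setting this gradient to zero gives $BDD^T = (A+C)D^T$, so when $DD^T$ is invertible the subproblem has the closed-form minimizer $B = (A+C)D^T(DD^T)^{-1}$.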


The squared Frobenius norm can be written as the trace of a product of matrices. You can then use the chain rule together with the following formula: $$ \frac{\partial{\mathrm{tr}(BD)}}{\partial{B}} = D^T $$
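This trace-derivative formula is also easy to verify numerically. A minimal finite-difference check (shapes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 4))
D = rng.standard_normal((4, 3))

# f(B) = tr(BD); its gradient with respect to B should be D^T
f = lambda B: np.trace(B @ D)

eps = 1e-6
num = np.zeros_like(B)
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        E = np.zeros_like(B)
        E[i, j] = eps
        num[i, j] = (f(B + E) - f(B - E)) / (2 * eps)

assert np.allclose(num, D.T, atol=1e-5)
```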