How to get the gradients with respect to a matrix?


I found a formula

$$ \mathbb{G} = \frac{\lambda_1}{2} tr(WW^\mathrm{ T }) + \frac{\lambda_2}{2} tr(W{\Omega}^{-1} W^\mathrm{ T })$$

where $W$ is an $n \times m$ matrix and $\Omega$ is an $m \times m$ matrix.

And the gradient of $\mathbb{G}$ with respect to $W$ is

$$ \frac{\partial{\mathbb{G}}}{\partial{W}}=W(\lambda_1 I_m + \lambda_2 \Omega^{-1})$$

where $I_m$ is the $m \times m$ identity matrix.

How can I derive $\frac{\partial{\mathbb{G}}}{\partial{W}}$ when $W$ is a matrix?

Best answer

As I mentioned in the comment, the Gâteaux derivative works well for this.

I will show this for the first part $G(W)=\operatorname{tr}(WW^\top)$.


With a direction $δW$ and a step size $ε>0$, we have: \begin{align*} \operatorname{tr}((W+εδW)(W+εδW)^\top) &= \operatorname{tr}(WW^\top + εδWW^\top + εWδW^\top + ε^2δWδW^\top) \\ &=\operatorname{tr}(WW^\top) + ε\operatorname{tr}(δWW^\top) + ε\operatorname{tr}(WδW^\top) + ε^2\operatorname{tr}(δWδW^\top) \end{align*}

Hence we have: $$G(W+εδW)-G(W) = ε\operatorname{tr}(δWW^\top) + ε\operatorname{tr}(WδW^\top) + ε^2\operatorname{tr}(δWδW^\top)$$

and $$\frac{G(W+εδW)-G(W)}{ε} = \operatorname{tr}(δWW^\top) + \operatorname{tr}(WδW^\top) + ε\operatorname{tr}(δWδW^\top)$$

and with $ε→0$ it follows: $$\lim_{ε→0}\frac{G(W+εδW)-G(W)}{ε} = \operatorname{tr}(δWW^\top) + \operatorname{tr}(WδW^\top) = 2\operatorname{tr}(WδW^\top) =: D(G,δW)$$ The two traces are equal because $\operatorname{tr}(A)=\operatorname{tr}(A^\top)$.
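The limit above can be checked numerically. The following is a minimal sketch in plain Python, assuming an arbitrary $2 \times 3$ example matrix and direction (the values and the helper names `trace_abt` and `difference_quotient` are illustrative, not part of the original answer). It prints the difference quotient for shrinking $ε$ next to the predicted value $2\operatorname{tr}(WδW^\top)$.

```python
def trace_abt(A, B):
    """Return tr(A B^T), which equals the sum of entrywise products of A and B."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def G(W):
    """G(W) = tr(W W^T), i.e. the sum of squared entries of W."""
    return trace_abt(W, W)

def difference_quotient(W, dW, eps):
    """(G(W + eps*dW) - G(W)) / eps for a direction dW and step size eps."""
    shifted = [[w + eps * d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]
    return (G(shifted) - G(W)) / eps

# Arbitrary example values (assumptions made for illustration only).
W = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]
dW = [[0.3, -1.0, 2.0], [1.0, 4.0, -0.5]]

predicted = 2 * trace_abt(W, dW)  # the claimed limit D(G, dW)
for eps in (1e-1, 1e-3, 1e-5):
    print(eps, difference_quotient(W, dW, eps), predicted)
```

The gap between the two printed values should shrink roughly in proportion to $ε$, which matches the leftover term $ε\operatorname{tr}(δWδW^\top)$.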

Now we know the directional derivative $D(G,δW)$ in the direction of an arbitrary matrix $δW$.

The derivative with respect to the $n \times m$ matrix $W$ is defined entrywise as: $$\frac{∂G}{∂W} = \pmatrix{\frac{∂G}{∂W_{11}} & \frac{∂G}{∂W_{12}} & … & \frac{∂G}{∂W_{1m}}\\\frac{∂G}{∂W_{21}} & \frac{∂G}{∂W_{22}} & … & \frac{∂G}{∂W_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{∂G}{∂W_{n1}} & \frac{∂G}{∂W_{n2}} & … & \frac{∂G}{∂W_{nm}} }$$

with $\frac{∂G}{∂W_{ij}}:=D(G,E_{ij})$, and $E_{ij}$ defined by $[E_{ij}]_{ij}=1$, and 0 everywhere else (canonical basis). This follows the same construction we already know for the gradient of a function. In the same way as $\frac{∂f}{∂x_1}$ is the derivative of $f$ in direction $e_1$, $\frac{∂G}{∂W_{34}}$ is the derivative of $G$ in direction $E_{34}$.

If we plug in $δW=E_{ij}$ we get: $$W^\top E_{ij} = (0|…|0|\underbrace{W^\top_i}_{j}|0|…|0),$$ which has the $i$-th column of $W^\top$ in column $j$ and zeros everywhere else.

Hence, using $\operatorname{tr}(WδW^\top)=\operatorname{tr}(W^\top δW)$, we get: $$D(G,E_{ij})=2\operatorname{tr}(W^\top E_{ij})=2[W^\top]_{ji}=2w_{ij}.$$

Putting everything together yields: $$\frac{∂G}{∂W} = 2W = 2WI_m$$
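The entrywise construction can also be sketched in plain Python. This is an illustration under assumptions: the example matrix and the helper names `basis_matrix` and `gradient_from_directions` are made up here, and a central difference is used in place of the one-sided limit because it is more accurate for a fixed step size. Each entry of the gradient is the directional derivative along $E_{ij}$, and the result is compared with $2W$.

```python
def G(W):
    """G(W) = tr(W W^T), i.e. the sum of squared entries of W."""
    return sum(w * w for row in W for w in row)

def basis_matrix(n, m, i, j):
    """E_ij: the n x m matrix with a 1 at position (i, j) and 0 elsewhere."""
    E = [[0.0] * m for _ in range(n)]
    E[i][j] = 1.0
    return E

def gradient_from_directions(f, W, eps=1e-6):
    """Approximate dG/dW entry by entry as the derivative of f along E_ij."""
    n, m = len(W), len(W[0])
    grad = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            E = basis_matrix(n, m, i, j)
            plus = [[w + eps * e for w, e in zip(rw, re)] for rw, re in zip(W, E)]
            minus = [[w - eps * e for w, e in zip(rw, re)] for rw, re in zip(W, E)]
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)  # central difference
    return grad

W = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]  # arbitrary 2 x 3 example
numeric = gradient_from_directions(G, W)
expected = [[2 * w for w in row] for row in W]  # the claimed result 2W
for row_numeric, row_expected in zip(numeric, expected):
    print(row_numeric, row_expected)
```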

Can you do the rest?
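Once the second term is worked out the same way, the full formula from the question can be sanity-checked numerically. The sketch below is plain Python under stated assumptions: $\Omega^{-1}$ is supplied directly as a symmetric example matrix `omega_inv`, and the values of `lam1`, `lam2`, and `W` are arbitrary. Symmetry matters here: for a non-symmetric $\Omega^{-1}$ the gradient of the second term is $\frac{\lambda_2}{2}W(\Omega^{-1}+\Omega^{-\top})$, which reduces to $\lambda_2 W\Omega^{-1}$ only in the symmetric case.

```python
def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inner(A, B):
    """Frobenius inner product <A, B> = tr(A B^T)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def G_full(W, omega_inv, lam1, lam2):
    """lam1/2 * tr(W W^T) + lam2/2 * tr(W Omega^{-1} W^T)."""
    return 0.5 * lam1 * inner(W, W) + 0.5 * lam2 * inner(matmul(W, omega_inv), W)

def closed_form_gradient(W, omega_inv, lam1, lam2):
    """W (lam1 I_m + lam2 Omega^{-1}); assumes Omega^{-1} is symmetric."""
    WA = matmul(W, omega_inv)
    return [[lam1 * w + lam2 * wa for w, wa in zip(rw, rwa)] for rw, rwa in zip(W, WA)]

def numerical_gradient(f, W, eps=1e-6):
    """Central-difference approximation of df/dW, one entry at a time."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            plus = [row[:] for row in W]
            minus = [row[:] for row in W]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)
    return grad

# Arbitrary example data (assumptions): W is 2 x 3, omega_inv is symmetric 3 x 3.
W = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]
omega_inv = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]]
lam1, lam2 = 0.7, 1.3

numeric = numerical_gradient(lambda M: G_full(M, omega_inv, lam1, lam2), W)
exact = closed_form_gradient(W, omega_inv, lam1, lam2)
max_error = max(abs(a - b) for ra, rb in zip(numeric, exact) for a, b in zip(ra, rb))
print("max abs difference:", max_error)
```

A small maximum difference supports the closed form for this example; it is a check, not a proof.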