Is this matrix differentiation derivation correct?


I am trying to find the derivative of the following quantity, but in my result the dimensions don't match. Any input would be helpful.

$$\frac{\partial}{\partial w}\left(\left\|F(w)-\beta D^{\top} Z\right\|_{F}^{2}\right)$$

$$ \begin{equation} \begin{split} \frac{\partial}{\partial w}\left(\left\|F(w)-\beta D^{\top} Z\right\|_{F}^{2}\right) =& \, \frac{\partial}{\partial w}\left(\left\|F(w)-\beta D^{\top} Z\right\|_{F}^{2}\right)\\ =& \, \frac{\partial}{\partial w}\left(Tr\left[ (F(w)-\beta D^{\top} Z) \, (F(w)-\beta D^{\top} Z)^{\top}\right] \right)\\ =& \, \frac{\partial}{\partial w}\left(Tr\left[ \left(F(w)\,F(w)^{\top}\right)- 2\,F(w)\left( \beta D^{\top} Z\right)^{\top} + \left(\beta D^{\top} Z Z^{\top} D \beta^{\top} \right) \right] \right) \\ =& \, \frac{\partial}{\partial w}Tr\left[ F(w)\,F(w)^{\top}\right] - 2 \, C_1 \frac{\partial}{\partial w} Tr\left[\,F(w)\left( \beta D^{\top} Z\right)^{\top} \right]\\ =& \, \left(\frac{\partial}{\partial w}F(w)\right)\,F(w)^{\top} - 2 \, C_1 \left(\frac{\partial}{\partial w} \,F(w) \right)\left( \beta D^{\top} Z\right)^{\top} \\ =& \, \left(\frac{\partial}{\partial w}F(w)\right)\, \left(F(w) - \left( \beta D^{\top} Z\right)\right)^{\top} \\ =& \, \left(\frac{\partial}{\partial w}F(w)\right)\, \left(F(w)^{T} - Z^{\top}D \beta^{T}\right) \end{split} \end{equation} $$

Is the above derivation correct? I think the dimensions of the gradient don't match. The dimensions are $$ D_{d \times m}, Z_{d \times l}, F(w)_{n \times l}, w_{d \times n}, \beta_{n \times m} $$
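As a quick sanity check on the shapes and on the trace expansion used in the first lines of the derivation, here is a small NumPy sketch. The sizes and random matrices are hypothetical, and `F` is just a random $n\times l$ stand-in for $F(w)$, since only its shape matters at this step. It confirms that $\beta D^\top Z$ is $n\times l$ like $F(w)$, and that $\|F-\beta D^\top Z\|_F^2$ equals both the trace form and its three-term expansion, so those steps hold.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
d, m, l, n = 4, 3, 5, 2
rng = np.random.default_rng(0)

D = rng.standard_normal((d, m))
Z = rng.standard_normal((d, l))
beta = rng.standard_normal((n, m))
F = rng.standard_normal((n, l))   # stand-in for F(w); only its shape matters here

M = beta @ D.T @ Z                # (n x m)(m x d)(d x l) -> n x l
A = F - M

# ||A||_F^2, Tr(A A^T), and the expanded trace should all agree.
frob_sq = np.linalg.norm(A, "fro") ** 2
trace_form = np.trace(A @ A.T)
expanded = np.trace(F @ F.T) - 2 * np.trace(F @ M.T) + np.trace(M @ M.T)

print(M.shape)  # (2, 5), i.e. n x l
print(np.isclose(frob_sq, trace_form), np.isclose(frob_sq, expanded))  # True True
```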

The gradient with respect to $w$ should have the same dimensions as $w$, namely $d \times n$, but my result comes out as a tensor. Note that $F(w)$ contains the softmax probabilities for $Z$ predicted using $w$.
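The tensor that shows up is expected: $\partial F/\partial w$ differentiates each of the $n\times l$ entries of $F(w)$ with respect to each of the $d\times n$ entries of $w$, so it carries four indices. The NumPy sketch below makes this concrete. It assumes a hypothetical model $F(w)=\mathrm{softmax}(w^\top Z)$ applied column by column (the question only says that $F(w)$ holds softmax probabilities for $Z$ predicted using $w$, so the exact form is a guess) and estimates the Jacobian by central finite differences.

```python
import numpy as np

def softmax_cols(S):
    # Column-wise softmax: each column of S becomes a probability vector.
    S = S - S.max(axis=0, keepdims=True)  # shift for numerical stability
    E = np.exp(S)
    return E / E.sum(axis=0, keepdims=True)

def F(w, Z):
    # HYPOTHETICAL model: class scores w^T Z (n x l), softmax over the n classes.
    return softmax_cols(w.T @ Z)

def jacobian_fd(w, Z, eps=1e-6):
    # Gamma[i, j, k, q] = dF_ij / dw_kq, estimated by central finite differences.
    F0 = F(w, Z)
    Gamma = np.zeros(F0.shape + w.shape)
    for k in range(w.shape[0]):
        for q in range(w.shape[1]):
            E = np.zeros_like(w)
            E[k, q] = eps
            Gamma[:, :, k, q] = (F(w + E, Z) - F(w - E, Z)) / (2 * eps)
    return Gamma

d, l, n = 4, 5, 3
rng = np.random.default_rng(1)
w = rng.standard_normal((d, n))
Z = rng.standard_normal((d, l))

Gamma = jacobian_fd(w, Z)
print(F(w, Z).shape)  # (3, 5)       -> n x l
print(Gamma.shape)    # (3, 5, 4, 3) -> n x l x d x n
```

Multiplying this four-index object by an $n\times l$ matrix as if it were an ordinary matrix product is what makes the shapes go wrong; it has to be contracted over its first two indices.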

Best answer:

$\def\p#1#2{\frac{\partial #1}{\partial #2}}$For typing convenience, define the matrix variables
$$\eqalign{
M &= \beta D^TZ \\
W &= w &&\big({\rm Uppercase\,for\,matrices}\big) \\
A &= (F-M) \quad&\implies\quad&dA = dF \\
}$$
You haven't told us anything about the function $F(W)$, so I'll assume you don't need help calculating its fourth-order tensor gradient
$$\eqalign{
\Gamma_{ijk\ell} &= \p{F_{ij}}{W_{k\ell}} \\
}$$

The double-dot product notation for the trace of matrices
$$\eqalign{
A:B &= \sum_{i=1}^m \sum_{j=1}^n A_{ij}B_{ij} \;=\; {\rm Tr}(AB^T) \\
A:A &= \big\|A\big\|_F^2 \\
}$$
is easily extended to higher-order tensors, i.e.
$$\eqalign{
Y &= \Gamma:X \quad&\implies\quad Y_{ij} = \sum_{k=1}^p \sum_{\ell=1}^q \Gamma_{ijk\ell}X_{k\ell} \\
V &= Y:\Gamma \quad&\implies\quad V_{k\ell} = \sum_{i=1}^m \sum_{j=1}^n Y_{ij}\Gamma_{ijk\ell} \\
}$$

Write the trace using the above notation, then calculate its differential and gradient.
$$\eqalign{
\phi &= A:A \\
d\phi &= 2A:dA = 2A:dF = 2A:\Gamma:dW \\
\p{\phi}{W} &= 2A:\Gamma \\
}$$
In components, $\big(\p{\phi}{W}\big)_{k\ell} = 2\sum_{i=1}^{n}\sum_{j=1}^{l} A_{ij}\Gamma_{ijk\ell}$. Since $A$ is $n\times l$ and $\Gamma$ is $n\times l\times d\times n$, the contraction $A:\Gamma$ sums over the first two indices of $\Gamma$ and leaves a $d\times n$ matrix, so the gradient of the trace (with respect to $W$) has the same dimensions as $W$.
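A NumPy sketch that checks $\p{\phi}{W} = 2A:\Gamma$ numerically. It reuses a hypothetical model $F(W)=\mathrm{softmax}(W^\top Z)$, column by column (an assumption; the actual $F$ is not given), estimates $\Gamma$ by finite differences, contracts it with $A$ using `np.einsum`, and compares the result with a finite-difference gradient of $\phi$. For this particular $F$, the column-wise softmax Jacobian $\mathrm{diag}(p)-pp^T$ also yields a closed form: with $P=F(W)$ and $G = P\odot A - P\odot(\mathbf{1}\mathbf{1}^T(P\odot A))$, one gets $d\phi = 2G:(dW^TZ) = 2ZG^T:dW$, which the sketch checks as well.

```python
import numpy as np

def softmax_cols(S):
    # Column-wise softmax (HYPOTHETICAL model, as assumed above).
    S = S - S.max(axis=0, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=0, keepdims=True)

def F(W, Z):
    # HYPOTHETICAL: F(W) = softmax over the n rows of W^T Z, one column per sample.
    return softmax_cols(W.T @ Z)

def phi(W, Z, M):
    # phi = A:A = ||F(W) - M||_F^2
    A = F(W, Z) - M
    return np.sum(A * A)

def jacobian_fd(W, Z, eps=1e-6):
    # Gamma[i, j, k, q] = dF_ij / dW_kq (central differences).
    F0 = F(W, Z)
    Gamma = np.zeros(F0.shape + W.shape)
    for k in range(W.shape[0]):
        for q in range(W.shape[1]):
            E = np.zeros_like(W)
            E[k, q] = eps
            Gamma[:, :, k, q] = (F(W + E, Z) - F(W - E, Z)) / (2 * eps)
    return Gamma

d, m, l, n = 4, 3, 5, 3
rng = np.random.default_rng(2)
W = rng.standard_normal((d, n))
Z = rng.standard_normal((d, l))
D = rng.standard_normal((d, m))
beta = rng.standard_normal((n, m))
M = beta @ D.T @ Z                      # n x l

A = F(W, Z) - M
Gamma = jacobian_fd(W, Z)

# dphi/dW = 2 A:Gamma, contracting A against the first two indices of Gamma.
grad_tensor = 2 * np.einsum("ij,ijkq->kq", A, Gamma)

# Closed form for this particular F: G = P*A - P * (column sums of P*A).
P = F(W, Z)
G = P * A - P * np.sum(P * A, axis=0, keepdims=True)
grad_closed = 2 * Z @ G.T               # (d x l)(l x n) -> d x n

# Direct finite-difference gradient of the scalar phi, for comparison.
eps = 1e-6
grad_fd = np.zeros_like(W)
for k in range(d):
    for q in range(n):
        E = np.zeros_like(W)
        E[k, q] = eps
        grad_fd[k, q] = (phi(W + E, Z, M) - phi(W - E, Z, M)) / (2 * eps)

print(grad_tensor.shape)  # (4, 3), same as W
print(np.allclose(grad_tensor, grad_fd, atol=1e-5),
      np.allclose(grad_closed, grad_fd, atol=1e-5))
```

For a real model one would normally not materialize $\Gamma$, which has $n\cdot l\cdot d\cdot n$ entries; a closed form like the one above, or automatic differentiation, applies the contraction $A:\Gamma$ without building the tensor.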