Jacobian with respect to a matrix input


I'm following the link below in my attempt to code a neural network:

http://cedric.cnam.fr/vertigo/Cours/ml2/tpDeepLearning1.html

I came across the following notation:

$\frac{\partial \mathcal{L}}{\partial \mathbf{W}}=\frac{1}{N} \mathbf{X}^{T}\left(\hat{\mathbf{Y}}-\mathbf{Y}^{*}\right)=\frac{1}{N} \mathbf{X}^{T} \mathbf{\Delta}^{\mathbf{y}}$
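For what it's worth, the formula can be checked numerically. The sketch below assumes the conventions of the linked course: $\mathbf{X}$ is $N\times(n+1)$ (inputs with a bias column), $\mathbf{W}$ is $(n+1)\times d$, $\mathbf{Y}^*$ is $N\times d$ one-hot targets, and $\mathcal{L}$ is the mean cross-entropy of $\mathrm{softmax}(\mathbf{X}\mathbf{W})$; all names here are my own, not from the course code.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, d = 5, 3, 4
X = rng.normal(size=(N, n + 1))            # inputs with bias column
W = rng.normal(size=(n + 1, d))
Ystar = np.eye(d)[rng.integers(0, d, N)]   # one-hot targets Y*

def softmax(S):
    # row-wise softmax, shifted for numerical stability
    E = np.exp(S - S.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def loss(W):
    # mean cross-entropy over the N examples
    Yhat = softmax(X @ W)
    return -np.mean(np.sum(Ystar * np.log(Yhat), axis=1))

# analytic gradient from the formula: (1/N) X^T (Yhat - Y*)
grad = X.T @ (softmax(X @ W) - Ystar) / N

# finite-difference gradient, entry by entry
eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-5))  # True: the two gradients agree
```

The two-sided finite difference agrees with the closed form to within the step-size error, which is a quick sanity check on the transpose placement.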

  • where $W$ is a matrix...

I suppose $\frac{\partial \mathcal{L}}{\partial \mathbf{W}}$ is a Jacobian.

  1. How can we calculate the Jacobian of a function whose input is a matrix? For example, for $S: W \in \mathbb{R}^{d\times (n+1)} \mapsto WX \in \mathbb{R}^d$, what is $J_S(W)$?

  2. If we can find such an expression, does the chain rule hold? For instance, can we say that $J_{H_y\circ\hat{y} \circ S}(W) = J_{H_y}(\hat{y}( S(W)))\,J_{\hat{y}}( S(W))\,J_{S}(W)$?

In my particular case:

$\hat{y}:s=(s_1,...,s_{d}) \in \mathbb{R}^{d} \to \left(\frac{e^{s_1}}{\sum\limits_{j=1}^{d} e^{s_j}} ,\frac{e^{s_2}}{\sum\limits_{j=1}^{d} e^{s_j}} ,...,\frac{e^{s_{d}}}{\sum\limits_{j=1}^{d} e^{s_j}}\right) \in \mathbb{R}^{ d}$

$H_y:q \in \mathbb{R}^{d} \to -\sum\limits_{c=1}^{d} y_c \log(q_c) \in \mathbb{R}$
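For these two maps the Jacobians happen to have well-known closed forms, so the chain rule in question 2 can be tested directly. The sketch below assumes $q = \hat{y}(s)$, uses $J_{\hat{y}}(s) = \mathrm{diag}(q) - qq^T$ and $J_{H_y}(q) = -y/q$ (both standard results for softmax and cross-entropy, not taken from the linked course), and checks the product against a finite-difference derivative:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
s = rng.normal(size=d)
y = np.eye(d)[2]  # one-hot label, class index chosen arbitrarily

def softmax(s):
    e = np.exp(s - s.max())  # shifted for numerical stability
    return e / e.sum()

q = softmax(s)

# Jacobian of softmax: J_ij = q_i (delta_ij - q_j), i.e. diag(q) - q q^T
J_softmax = np.diag(q) - np.outer(q, q)

# Jacobian (row vector) of H_y(q) = -sum_c y_c log(q_c)
J_H = -y / q

# chain rule: d H_y(softmax(s)) / ds = J_H @ J_softmax, which simplifies to q - y
chain = J_H @ J_softmax
print(np.allclose(chain, q - y))  # True

# finite-difference check of the same derivative
eps = 1e-6
def H(s):
    return -(y * np.log(softmax(s))).sum()
num = np.array([(H(s + eps * np.eye(d)[k]) - H(s - eps * np.eye(d)[k])) / (2 * eps)
                for k in range(d)])
print(np.allclose(num, q - y, atol=1e-5))  # True
```

The simplification $J_{H_y}(q)\,J_{\hat{y}}(s) = q - y$ (using $\sum_c y_c = 1$) is exactly the $\mathbf{\Delta}^{\mathbf{y}}$ factor appearing row-wise in the gradient formula above.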

Thanks.