Let
- $x$ a row vector (fixed) of size $(1,784)$.
- $W$ a matrix of size $(784,10)$.
- $b$ a row vector of size $(1,10)$.
- $\hat{s}:(W,b) \in R^{784 \times 10} \times R^{1 \times 10} \to xW+b \in R^{1 \times 10}$.
- $\hat{y}:s=(s_1,\dots,s_{10}) \in R^{1\times 10} \to \left(\frac{e^{s_1}}{\sum\limits_{j=1}^{10} e^{s_j}}, \frac{e^{s_2}}{\sum\limits_{j=1}^{10} e^{s_j}}, \dots, \frac{e^{s_{10}}}{\sum\limits_{j=1}^{10} e^{s_j}}\right) \in R^{1 \times 10}$
- $H:(p,q)\in R^{1 \times 10} \times R^{1 \times 10} \to -\sum\limits_{c=1}^{10} p_c \log(q_c) $
- $y_{c^*}=(0,\dots,0,1,0,\dots,0) \in R^{1 \times 10}$ a row vector (fixed) whose components are all zero except the $c^*$-th, which equals $1$ (i.e. $(y_{c^*})_l = \delta_{l c^*}$)
- $H_{c^*}:q\in R^{1 \times 10} \to - \log(q_{c^*}) \in R $
We are interested in $\mathcal{L}:(W,b) \to H_{c^*} \circ\hat{y} \circ \hat{s} (W,b)$.
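For concreteness, here is a minimal NumPy sketch of these definitions with random data (the max-subtraction in the softmax is a standard numerical-stability trick, not part of the definition; the class index is 0-based here):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((1, 784))   # fixed input row vector
W = rng.standard_normal((784, 10))  # weight matrix
b = rng.standard_normal((1, 10))    # bias row vector
c_star = 3                          # true class index (0-based)

s = x @ W + b                                    # s_hat(W, b), shape (1, 10)
y_hat = np.exp(s - s.max())                      # shift by max for stability
y_hat = y_hat / y_hat.sum()                      # softmax, shape (1, 10)
loss = -np.log(y_hat[0, c_star])                 # H_{c*}(y_hat), a scalar
```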
Could you give me a detailed computation of $\dfrac{\partial \mathcal{L}}{\partial W}(W,b)$ and $\dfrac{\partial \mathcal{L}}{\partial b}(W,b)$?
Thanks in advance!
Let's begin with the partial derivative with respect to $W$. From the chain rule we look for: $\partial_W \mathcal{L}(W, b) = \partial_y H_{c^*}(\hat{y}(\hat{s}(W, b))) \circ \partial_s \hat{y}(\hat{s}(W, b)) \circ \partial_W \hat{s}(W, b)$. So we have to compute three Jacobians: $\partial_y H_{c^*}(y)$, $\partial_s \hat{y}(s)$ and $\partial_W \hat{s}(W, b)$.
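For bookkeeping (using the convention that the Jacobian of a map from $R^n$ to $R^m$ is an $m \times n$ matrix), the shapes involved are:

$$\partial_y H_{c^*} \in R^{1 \times 10}, \qquad \partial_s \hat{y} \in R^{10 \times 10},$$

while $\partial_W \hat{s}$ is a linear map from $R^{784 \times 10}$ to $R^{1 \times 10}$, so it is not naturally a matrix at all; written as one after flattening $W$ into a vector of length $7840$, it would have size $10 \times 7840$.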
For example, we know from calculus that, since $H_{c^*}(q) = -\log(q_{c^*})$ depends on $q_{c^*}$ only,

$$\partial_q H_{c^*}(q) = \left(0,\dots,0,-\frac{1}{q_{c^*}},0,\dots,0\right) \in R^{1 \times 10},$$

with the nonzero entry in position $c^*$.
But what about the other Jacobians? The problem is indeed harder to handle than what I imagined!
Since the function $\hat{s}$ takes a matrix as input, I am not sure that we can compute a Jacobian matrix (or we have to decide beforehand that the input matrix is in fact a vector, and choose a convention to flatten it, e.g. stacking its rows one after another...).
We also have to be careful about multiplying on the left or on the right, since the vectors here are row vectors (and not column vectors).
Given the two questions above, I am not so sure that we can compute the partial Jacobian of $\mathcal{L}$ by simply multiplying the three previous Jacobian matrices, as we do in simpler problems!
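For what it's worth, the closed form that the full chain-rule computation is known to yield for softmax + cross-entropy is $\partial_b \mathcal{L} = \hat{y} - y_{c^*}$ and $\partial_W \mathcal{L} = x^\top (\hat{y} - y_{c^*})$. This is stated here without derivation, but it can at least be checked numerically against central finite differences (a sketch, with random data and a 0-based class index):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((1, 784))
W = rng.standard_normal((784, 10)) * 0.01   # small weights
b = np.zeros((1, 10))
c_star = 3

def loss(W, b):
    """L(W, b) = H_{c*}(softmax(x W + b))."""
    s = x @ W + b
    e = np.exp(s - s.max())          # shift for numerical stability
    y_hat = e / e.sum()
    return -np.log(y_hat[0, c_star])

# analytic gradients from the claimed closed form
s = x @ W + b
e = np.exp(s - s.max())
y_hat = e / e.sum()
y = np.zeros((1, 10)); y[0, c_star] = 1.0
grad_b = y_hat - y                   # shape (1, 10)
grad_W = x.T @ (y_hat - y)           # shape (784, 10)

# central finite-difference check on a few random entries of W
eps = 1e-6
for _ in range(5):
    i, j = rng.integers(784), rng.integers(10)
    E = np.zeros_like(W); E[i, j] = eps
    fd = (loss(W + E, b) - loss(W - E, b)) / (2 * eps)
    assert abs(fd - grad_W[i, j]) < 1e-5
```

The check passing does not resolve how to organize the three Jacobians, but it does confirm what the final answer must be.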