Derivative of vector functions


I am struggling with the derivative of the function in part f). Does anyone know how to solve it?

Exercise here


There are 2 best solutions below


Ultimately, this question comes down to a long and complicated application of the chain rule. $$ \frac{\partial}{\partial W_{ij}}\log(1 + \exp(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b))) = \\ (1 + \exp(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b)))^{-1} \frac{\partial}{\partial W_{ij}}(1 + \exp(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b))), $$

$$ \frac{\partial}{\partial W_{ij}}(1 + \exp(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b))) = \\ \exp(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b)) \frac{\partial}{\partial W_{ij}}(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b)), $$

$$ \frac{\partial}{\partial W_{ij}}(-y\mathbf w^T \hat \sigma (W \mathbf x + \mathbf b)) = -y\mathbf w^T \left(\hat \sigma' (W \mathbf x + \mathbf b)\odot\frac{\partial}{\partial W_{ij}}(W \mathbf x + \mathbf b)\right),\\ \frac{\partial}{\partial W_{ij}}(W \mathbf x + \mathbf b) = x_j\,\mathbf e_i, $$ since only the $i$th entry of $W\mathbf x + \mathbf b$ depends on $W_{ij}$. The last factor therefore collapses the dot product to a single term, giving $-y\,w_i\,\sigma'\!\big((W\mathbf x + \mathbf b)_i\big)\,x_j$. Here, $\hat \sigma'$ denotes entrywise application of $\sigma'$, the derivative of $\sigma$; $\odot$ is the entrywise product; $\mathbf e_i$ is the $i$th standard basis vector; and $x_j$ denotes the $j$th entry of $\mathbf x$.
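The chain-rule result above is easy to sanity-check numerically. The sketch below assumes $\sigma$ is the logistic function (as the preceding exercises suggest) and uses arbitrary small shapes; all variable names are illustrative, not from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4                       # hidden size d, input size n (arbitrary)
W = rng.normal(size=(d, n))
b = rng.normal(size=d)
w = rng.normal(size=d)
x = rng.normal(size=n)
y = 1.0

def sigma(z):
    # Logistic function, applied entrywise.
    return 1.0 / (1.0 + np.exp(-z))

def loss(W):
    # log(1 + exp(-y * w^T sigma(Wx + b)))
    return np.log1p(np.exp(-y * (w @ sigma(W @ x + b))))

# Analytic component derivative from the chain rule:
#   d loss / dW_ij = e^u / (1 + e^u) * (-y) * w_i * sigma'(z_i) * x_j,
# where z = Wx + b and u = -y * w^T sigma(z).
z = W @ x + b
s = sigma(z)
u = -y * (w @ s)
grad = (np.exp(u) / (1.0 + np.exp(u))) * (-y) * (w * s * (1 - s))[:, None] * x[None, :]

# Central finite-difference check of a single component.
eps = 1e-6
i, j = 1, 2
Wp, Wm = W.copy(), W.copy()
Wp[i, j] += eps
Wm[i, j] -= eps
numeric = (loss(Wp) - loss(Wm)) / (2 * eps)
print(abs(numeric - grad[i, j]))   # should be tiny
```

If the analytic formula were off by a sign or an index, the finite-difference value would disagree immediately.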


Let's use a variable naming convention where an uppercase Latin letter is a matrix, a lowercase Latin letter is a vector, and a lowercase Greek letter is a scalar.

Denote the derivative of the scalar function $\sigma(\zeta)$ as $$\eqalign{ \sigma' = \frac{d\sigma}{d\zeta} \\ }$$ When these scalar functions are applied elementwise on a vector $z$, they produce vector results $$s=\sigma(z),\qquad s'=\sigma'(z)$$ In such situations, it's more convenient to work with the differential quantity $$\eqalign{ ds &= s'\odot dz \\ }$$ The $\odot$ symbol represents the elementwise/Hadamard product, but this can be eliminated in favor of multiplication by a diagonal matrix
$$\eqalign{ ds &= {\rm Diag}(s')\,dz \;=\; S'\,dz \\ }$$ Define some new variables in accordance with our naming convention. $$\eqalign{ \gamma &= y \\ z &= Wx+b \quad&\implies dz = dW\,x \\ s &= \sigma(z) \\ s' &= \sigma'(z) &\implies ds = S'\,dz \\ \beta &= -\gamma{\rm w}^Ts &\implies d\beta = -\gamma{\rm w}^Tds \\ \alpha &= e^\beta &\implies d\alpha = \alpha\,d\beta \\ }$$ Write the function in terms of these new variables and calculate the differential.
Several changes of variables lead ultimately to the gradient. $$\eqalign{ \lambda &= \log(1+\alpha) \\ d\lambda &= e^{-\lambda}\,d\alpha \\ &= e^{-\lambda}\alpha\,d\beta \\ &= -e^{-\lambda}\alpha\gamma\;{\rm w}^Tds \\ &= -e^{-\lambda}\alpha\gamma\;{\rm w}^TS'\,dz \\ &= -e^{-\lambda}\alpha\gamma\;{\rm w}^TS'\,dW\,x \\ &= -e^{-\lambda}\alpha\gamma\; {\rm Trace}\big({\rm w}^TS'\,dW\,x\big) \\ &= -e^{-\lambda}\alpha\gamma\; {\rm Trace}\big(x{\rm w}^TS'\,dW\big) \\ \frac{\partial\lambda}{\partial W} &= -e^{-\lambda}\alpha\gamma\;\big(x{\rm w}^TS'\big)^T \\ &= -e^{-\lambda}\alpha\gamma\;S'{\rm w}x^T \\ }$$ Or, if you prefer a component equation, $$\eqalign{ \frac{\partial\lambda}{\partial W_{ij}} &= -e^{-\lambda}\alpha\gamma\;\sum_{k=1}^d {S'}_{ik}{\rm w}_kx_j \\ &= -e^{-\lambda}\alpha\gamma\;\sigma'(z_i){\rm w}_ix_j \\ &= -e^{-\lambda}\alpha\gamma\; \sigma'\!\left(\sum_{k=1}^dW_{ik}x_k+b_i\right){\rm w}_ix_j \\ }$$ which looks rather unwieldy compared to the matrix result.

NB: From the preceding exercises, it looks like $\sigma$ is the logistic function whose derivative is $\,\sigma' = (\sigma - \sigma^2).\,$ This allows the matrix solution to be completely specified $$\eqalign{ \frac{\partial\lambda}{\partial W} = &-e^{-\lambda}\alpha\gamma\;\left(S-S^2\right){\rm w}x^T \\ &{\rm where}\;\; S = {\rm Diag}(s) \\ }$$ but makes the component equation even more of a mess.
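The matrix result can be checked against finite differences. The sketch below assumes the logistic $\sigma$ from the NB (so $\sigma' = \sigma - \sigma^2$) and arbitrary small shapes; names follow the answer's conventions ($\gamma$, $\alpha$, $\lambda$, $S'$) but the dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4                       # arbitrary sizes for the check
W = rng.normal(size=(d, n))
b = rng.normal(size=d)
w = rng.normal(size=d)
x = rng.normal(size=n)
gamma = 1.0                       # gamma = y

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def lam(W):
    # lambda = log(1 + exp(-gamma * w^T sigma(Wx + b)))
    return np.log1p(np.exp(-gamma * (w @ sigma(W @ x + b))))

z = W @ x + b
s = sigma(z)
sprime = s * (1 - s)                     # sigma' = sigma - sigma^2, entrywise
alpha = np.exp(-gamma * (w @ s))         # alpha = e^beta
lam0 = np.log1p(alpha)

# Matrix gradient: dlambda/dW = -e^{-lambda} * alpha * gamma * S' w x^T,
# where S' = Diag(sprime), so S' w x^T = outer(sprime * w, x).
grad = -np.exp(-lam0) * alpha * gamma * np.outer(sprime * w, x)

# Full finite-difference gradient for comparison.
eps = 1e-6
num = np.zeros_like(W)
for i in range(d):
    for j in range(n):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (lam(Wp) - lam(Wm)) / (2 * eps)

print(np.max(np.abs(grad - num)))   # should be tiny
```

Note that $e^{-\lambda}\alpha = \alpha/(1+\alpha)$, so the scalar prefactor is just the logistic of $\beta$; the code keeps the answer's form for direct comparison.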