As stated elsewhere, the derivative of the sigmoid is $\sigma(x)(1-\sigma(x))$. I would like to verify that I am applying this correctly when taking a partial derivative of a sigmoid function. Suppose we have $f(x) = \sigma(w_1 x + b_1)$ and the goal is to take the partial derivative of $f$ with respect to $w_1$. We have:
- $\frac{\partial f}{\partial w_1} = \frac{\partial}{\partial w_1}\,\sigma(w_1 x + b_1)$
- Based on the known derivative of the sigmoid, can we rewrite this with the chain rule as $$\sigma(w_1 x + b_1)\,\bigl(1-\sigma(w_1 x + b_1)\bigr)\,\frac{\partial}{\partial w_1}(w_1 x + b_1)\;?$$
- If so, then proceeding: $\sigma(w_1 x + b_1)\,\bigl(1-\sigma(w_1 x + b_1)\bigr)\,(1\cdot x + 0)$
- Thus the final answer is: $\sigma(w_1 x + b_1)\,\bigl(1-\sigma(w_1 x + b_1)\bigr)\,x$
Is this reasoning correct? Thanks!
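One way to sanity-check the steps above is to compare the proposed closed form against a central finite difference. The following is only a minimal sketch: the helper names (`sigmoid`, `f`, `df_dw1`, `central_difference`), the step size `h`, and the sample values of `w1`, `x`, and `b1` are all assumptions chosen for illustration, not part of the original question.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def f(w1, x, b1):
    """f = sigma(w1 * x + b1), treated as a function of w1 with x, b1 fixed."""
    return sigmoid(w1 * x + b1)


def df_dw1(w1, x, b1):
    """Proposed closed form: sigma(z) * (1 - sigma(z)) * x with z = w1*x + b1."""
    s = sigmoid(w1 * x + b1)
    return s * (1.0 - s) * x


def central_difference(w1, x, b1, h=1e-6):
    """Numerical estimate of the partial derivative with respect to w1."""
    return (f(w1 + h, x, b1) - f(w1 - h, x, b1)) / (2.0 * h)


# Arbitrary sample point (assumed values, for illustration only).
w1, x, b1 = 0.7, -1.3, 0.2
analytic = df_dw1(w1, x, b1)
numeric = central_difference(w1, x, b1)
print(analytic, numeric)
```

If the two printed numbers agree to many digits, that is consistent with the chain-rule result, although a numerical check at one point is evidence rather than a proof.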
$ \def\p{\partial}\def\s{\sigma} \def\E{{\cal E}}\def\F{{\cal F}}\def\G{{\cal G}} \def\LR#1{\left(#1\right)} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} $Define the following vector and matrix variables

$$\eqalign{ W &= w_1 &&b = b_1 \\ y &= Wx+b &\qiq &dy = dW\,x\\ f &= \s(y) &\qiq &F = \Diag f \\ }$$

Then calculate the differential and gradient of $f$

$$\eqalign{ df &= \LR{F-F^2} dy \\ &= \LR{F-F^2} dW\,x \\ &= \LR{F-F^2}\star x:dW \\ \grad fW &= \LR{F-F^2}\star x \\ }$$

where $(\star)$ and $(:)$ denote the dyadic and Frobenius products, i.e.

$$\eqalign{ &A:B \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ &\E = F\star x \qiq \E_{ijk} = F_{ij}x_{k} \\ }$$

Note that $\LR{\grad fW}$ is a vector-by-matrix gradient, and therefore a third-order tensor.
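Perhaps it's better to write the gradient in component notation: $$\eqalign{ \grad{f_i}{W_{jk}} &= \LR{F_{ij}-F_{ij}^2} x_k \\ }$$ Since $F$ is diagonal, this vanishes unless $i=j$. In the scalar case all indices equal $1$ and the expression reduces to $\sigma(w_1x+b_1)\,(1-\sigma(w_1x+b_1))\,x$, which agrees with the result in the question.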
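The component formula can also be checked numerically. The sketch below is an illustration under stated assumptions: it uses plain Python lists, treats $W$ as an $m\times n$ matrix with $x\in\mathbb{R}^n$ and $b\in\mathbb{R}^m$, and the function names (`forward`, `analytic_grad`, `numeric_grad`) and the sample values are hypothetical. It builds the third-order tensor $G_{ijk} = \delta_{ij}\,f_i(1-f_i)\,x_k$, which is what $(F_{ij}-F_{ij}^2)\,x_k$ amounts to for diagonal $F$, and compares it with central differences.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, x, b):
    """f = sigma(W x + b), applied elementwise; returns a list of length m."""
    return [
        sigmoid(sum(W[i][k] * x[k] for k in range(len(x))) + b[i])
        for i in range(len(b))
    ]


def analytic_grad(W, x, b):
    """G[i][j][k] = (F_ij - F_ij^2) x_k with F = Diag(f), i.e. zero unless i == j."""
    f = forward(W, x, b)
    m, n = len(b), len(x)
    return [
        [[f[i] * (1.0 - f[i]) * x[k] if i == j else 0.0 for k in range(n)]
         for j in range(m)]
        for i in range(m)
    ]


def numeric_grad(W, x, b, h=1e-6):
    """Central-difference estimate of d f_i / d W_jk for every (i, j, k)."""
    m, n = len(b), len(x)
    G = [[[0.0] * n for _ in range(m)] for _ in range(m)]
    for j in range(m):
        for k in range(n):
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[j][k] += h
            W_minus[j][k] -= h
            f_plus = forward(W_plus, x, b)
            f_minus = forward(W_minus, x, b)
            for i in range(m):
                G[i][j][k] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return G


# Arbitrary sample data (assumed values): m = 2 outputs, n = 3 inputs.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.6]]
x = [1.0, -2.0, 0.5]
b = [0.1, -0.3]

G_analytic = analytic_grad(W, x, b)
G_numeric = numeric_grad(W, x, b)
max_err = max(
    abs(G_analytic[i][j][k] - G_numeric[i][j][k])
    for i in range(len(b))
    for j in range(len(b))
    for k in range(len(x))
)
print("max abs difference:", max_err)
```

The off-diagonal blocks ($i\ne j$) come out as zero in both versions because $f_i$ depends only on row $i$ of $W$, which is the content of $F$ being diagonal.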