$\frac{d}{d\mathbf{w}}\,X\,{\rm sig}(X^T\mathbf{w})=X\,{\rm diag}\big({\rm sig}(X^T\mathbf{w})\odot (1-{\rm sig}(X^T\mathbf{w}))\big)X^T$
I would like to know the steps to obtain the result above. Thank you. Here ${\rm sig}$ is the element-wise sigmoid function, and I am using the denominator-layout convention.
The derivative of the logistic function $\,s={\rm sig}(y),\,$ considered as an ordinary (scalar) function, is
$$\frac{ds}{dy} = s-s^2 \quad\implies\quad ds = (s-s^2)\,dy$$

Applying the function element-wise to a vector argument $\;(y=X^Tw)\;$ yields vector results
$$\eqalign{ s &= {\rm sig}(y) \\ ds &= (s-s\odot s)\odot dy \\ }$$

The element-wise (Hadamard) products can be replaced by multiplication with a diagonal matrix, i.e.
$$\eqalign{ S &= {\rm Diag}(s) \\ ds &= (S-S^2)\,dy \\ }$$

Now apply the above to the function in the question, using $dy = X^T\,dw$, and calculate its gradient:
$$\eqalign{ f &= Xs \\ df &= X\,ds \\ &= X(S-S^2)\,dy \\ &= X(S-S^2)X^T\,dw \\ &= XS(I-S)X^T\,dw \\ \frac{\partial f}{\partial w} &= XS(I-S)X^T \\ }$$

Since $S(I-S) = {\rm Diag}\big(s\odot(1-s)\big)$, this agrees with the expression in the question. The matrix is also symmetric, so the numerator-layout and denominator-layout results coincide.
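As a sanity check on the derivation, the following sketch compares $XS(I-S)X^T$ with a central finite-difference Jacobian of $f(w)=X\,{\rm sig}(X^Tw)$ for a random $X$ and $w$. It uses plain Python (no NumPy) so it runs anywhere; the helper names (`analytic_jacobian`, `numeric_jacobian`, etc.), the matrix sizes, and the step size `h` are arbitrary illustrative choices, not part of the derivation.

```python
import math
import random


def sig(t):
    """Scalar logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def f(X, w):
    """f(w) = X sig(X^T w), with sig applied element-wise."""
    s = [sig(t) for t in matvec(transpose(X), w)]
    return matvec(X, s)


def analytic_jacobian(X, w):
    """X S (I - S) X^T, where S = Diag(sig(X^T w))."""
    s = [sig(t) for t in matvec(transpose(X), w)]
    d = [si * (1.0 - si) for si in s]  # diagonal of S(I - S)
    n, m = len(X), len(X[0])
    return [[sum(X[i][k] * d[k] * X[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]


def numeric_jacobian(X, w, h=1e-6):
    """Central differences: J[i][j] approximates d f_i / d w_j."""
    n = len(w)
    J = [[0.0] * n for _ in range(len(X))]
    for j in range(n):
        wp, wm = list(w), list(w)
        wp[j] += h
        wm[j] -= h
        fp, fm = f(X, wp), f(X, wm)
        for i in range(len(X)):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J


random.seed(0)
n, m = 3, 5  # X is n x m, so w has length n and s has length m
X = [[random.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
w = [random.gauss(0.0, 1.0) for _ in range(n)]

A = analytic_jacobian(X, w)
N = numeric_jacobian(X, w)
max_err = max(abs(A[i][j] - N[i][j]) for i in range(n) for j in range(n))
print("max |analytic - numeric| =", max_err)
```

The printed difference should be small, on the order of the finite-difference error. Because the analytic matrix is symmetric, it reads the same whether interpreted in numerator or denominator layout.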