I tried taking the derivative of a neural network's sigmoid activation below, but I am getting a slightly different answer and I'm not sure why. I am trying to follow this blog's derivation: https://selbydavid.com/2018/01/09/neural-network/
I would like to take the derivative of the following with respect to $W_{out}$
$\hat y = \sigma(HW_{out}) $
where $\sigma$ is the sigmoid function $\sigma(x) = \frac{1}{1+e^{-x}}$, applied element-wise.
Note: $H$ is an $n \times 6$ matrix and $W_{out}$ is a $6 \times 1$ vector, so $\hat y$ is an $n \times 1$ vector. This led me to expect the derivative w.r.t. $W_{out}$ to also be $n \times 1$.
After trying to calculate the derivative $\frac{\partial}{\partial W_{out}} \sigma(HW_{out})$, I ended up with:
$\frac{\partial}{\partial W_{out}} \sigma(HW_{out}) = \sigma(HW_{out})(1-\sigma(HW_{out}))H$
However, the correct answer should've been:
$\frac{\partial}{\partial W_{out}} \sigma(HW_{out}) = H^T\sigma(HW_{out})(1-\sigma(HW_{out}))$
I don't really understand where the $H^T$ came from. I would greatly appreciate it if someone could walk me through this step by step. If it helps, I can post my hand-written derivation.
Let $h(W)=HW$, so $Dh = H$. The chain rule for gradients says $\nabla(f\circ h)(W) = (Dh)^T\,\nabla f(h(W))$, hence $$ \nabla(\sigma\circ h) = (Dh)^T \nabla\sigma = H^T \nabla\sigma, $$ where $\nabla\sigma$ stands for the element-wise derivative $\sigma(HW_{out})(1-\sigma(HW_{out}))$. The transpose appears because the gradient is the transpose of the Jacobian: the Jacobian of the composition is (row-vector convention) $\nabla\sigma^T H$, and transposing it reverses the order of the factors.
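One way to see this concretely: take the scalar function $f(W)=\sum_i \sigma(HW)_i$, whose gradient is exactly $H^T\,[\sigma(HW)\odot(1-\sigma(HW))]$ (a $6 \times 1$ vector, not $n \times 1$), and compare it against a finite-difference approximation. A minimal NumPy sketch (the shapes follow the question; the random data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
H = rng.normal(size=(n, 6))   # n x 6, as in the question
W = rng.normal(size=(6, 1))   # 6 x 1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Scalar function f(W) = sum of sigmoid(HW); its gradient is 6 x 1.
def f(W):
    return sigmoid(H @ W).sum()

# Analytic gradient from the H^T form in the answer above.
s = sigmoid(H @ W)
analytic = H.T @ (s * (1 - s))          # shape (6, 1)

# Central finite differences for comparison.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    Wp = W.copy(); Wp[i] += eps
    Wm = W.copy(); Wm[i] -= eps
    numeric[i] = (f(Wp) - f(Wm)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Note that $\sigma(HW)(1-\sigma(HW))$ is an element-wise product of vectors here, not a matrix product, which is why `s * (1 - s)` rather than `@` appears in the code.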