I have a very simple function (a neural network, actually) whose derivative I want to determine.
Let $\mathbf{x}\in\mathbb{R}^{1\times n}$, $\mathbf{W}=[W_1 \cdots W_i \cdots W_m]\in\mathbb{R}^{n\times m}$, and let $\sigma$ be the sigmoid function ($\sigma(x)=\frac{1}{1+\exp(-x)}$).
Let $\mathbf{z}=\sigma(\mathbf{x}\cdot \mathbf{W})$ (applied element-wise), and let $\mathbf{y}=\text{softmax}(\mathbf{z})$, so that $y_i=\frac{\exp(z_i)}{\sum_k \exp(z_k)}$ for $i=1,\ldots,m$. What I am trying to derive is $\frac{\partial E}{\partial W_{ik}}$ (where $W_{ik}$ denotes the $k$-th component of the column $W_i$), where
$$E=-\sum_{j=1}^m t_j\log y_j$$
and $t_1,\ldots,t_m$ is a probability distribution.
My result is that
$$\frac{\partial E}{\partial W_{ik}}=\frac{\partial E}{\partial z_i}\frac{\partial z_i}{\partial{W_{ik}}}$$
The first term $\frac{\partial E}{\partial z_i}$ works out to $y_i-t_i$ (this should be correct, since $\sum_j t_j=1$), while the second term should clearly be $\sigma'(\mathbf{x}\cdot W_i)\,x_k=\sigma(\mathbf{x}\cdot W_i)\bigl(1-\sigma(\mathbf{x}\cdot W_i)\bigr)x_k$.
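For reference, here is the chain-rule step behind the first term, using $\frac{\partial y_j}{\partial z_i}=y_j(\delta_{ij}-y_i)$ and the fact that $\sum_{j} t_j=1$:

$$\frac{\partial E}{\partial z_i}=-\sum_{j=1}^m \frac{t_j}{y_j}\frac{\partial y_j}{\partial z_i}=-\sum_{j=1}^m t_j(\delta_{ij}-y_i)=y_i\sum_{j=1}^m t_j-t_i=y_i-t_i.$$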
However, when I compare my analytic derivative against a numerical gradient check, my result is always off. Does anyone see a mistake?
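For concreteness, here is a minimal NumPy sketch of the gradient check I have in mind (the random $\mathbf{x}$, $\mathbf{W}$, $\mathbf{t}$ and the shapes are just placeholders, not my actual code). It computes the analytic gradient $\frac{\partial E}{\partial W_{ik}}=(y_i-t_i)\,\sigma'(\mathbf{x}\cdot W_i)\,x_k$ and compares it entry-wise against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

x = rng.standard_normal((1, n))   # input row vector, shape (1, n)
W = rng.standard_normal((n, m))   # weight matrix, columns W_1 .. W_m
t = rng.random(m)
t /= t.sum()                      # target probability distribution

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(W):
    """E = -sum_j t_j log y_j with z = sigma(x.W), y = softmax(z)."""
    z = sigmoid(x @ W)                    # shape (1, m)
    y = np.exp(z) / np.exp(z).sum()       # softmax over z
    return -np.sum(t * np.log(y))

# Analytic gradient: dE/dW[k, i] = (y_i - t_i) * sigma'(x.W_i) * x_k
z = sigmoid(x @ W)
y = np.exp(z) / np.exp(z).sum()
grad_analytic = x.T @ ((y - t) * z * (1 - z))   # outer product, shape (n, m)

# Central-difference numerical gradient, one entry at a time
eps = 1e-6
grad_numeric = np.zeros_like(W)
for k in range(n):
    for i in range(m):
        Wp, Wm = W.copy(), W.copy()
        Wp[k, i] += eps
        Wm[k, i] -= eps
        grad_numeric[k, i] = (loss(Wp) - loss(Wm)) / (2 * eps)

# Max entry-wise discrepancy between the two gradients
print(np.max(np.abs(grad_analytic - grad_numeric)))
```

If the analytic formula is right, the printed difference should be near machine precision; a discrepancy concentrated in particular rows or columns would point to an indexing mix-up between the row index $k$ and the column index $i$.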