Derivation of partial derivative of cost function with respect to weights in backpropagation algorithm


I am studying machine learning from Andrew Ng's Machine Learning course on Coursera. I am stuck on understanding the math behind backpropagation.

Here is an image of the backpropagation algorithm from his course. I am able to follow the derivation of the $\delta$ terms from his course notes, but the derivation of $\Delta^{(l)}=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T}$ is not given.
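In case the image does not render here, this is the update as I would implement it for a single training example (a NumPy sketch; the layer sizes, the `sigmoid` activation, and the weight shapes are my own assumptions, not taken from the course slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Made-up sizes: 3 inputs, 4 hidden units, 2 outputs.
Theta1 = rng.standard_normal((4, 3 + 1))   # maps a1 (with bias) -> z2
Theta2 = rng.standard_normal((2, 4 + 1))   # maps a2 (with bias) -> z3

# Accumulators, one per weight matrix.
Delta1 = np.zeros_like(Theta1)
Delta2 = np.zeros_like(Theta2)

x = rng.standard_normal(3)
y = np.array([1.0, 0.0])

# Forward pass (prepending the bias unit 1 to each activation).
a1 = np.concatenate(([1.0], x))
z2 = Theta1 @ a1
a2 = np.concatenate(([1.0], sigmoid(z2)))
z3 = Theta2 @ a2
a3 = sigmoid(z3)

# Backward pass: the delta terms from the course notes.
delta3 = a3 - y
delta2 = (Theta2.T @ delta3)[1:] * sigmoid(z2) * (1 - sigmoid(z2))

# The step I am asking about: Delta^{(l)} += delta^{(l+1)} (a^{(l)})^T
Delta2 += np.outer(delta3, a2)
Delta1 += np.outer(delta2, a1)

print(Delta1.shape, Delta2.shape)   # (4, 4) (2, 5)
```

The shapes line up with the corresponding $\Theta^{(l)}$ matrices, which is why I believe the outer product is the right reading of $\delta^{(l+1)}(a^{(l)})^{T}$.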

My questions:

1. What is the meaning of $\Delta^{(l)}$, and how is $\Delta^{(l)}=\Delta^{(l)}+\delta^{(l+1)}(a^{(l)})^{T}$ derived?

2. What is the meaning of $\require{enclose}\enclose{horizontalstrike}{D_{i,j}^{(l)}}$, and why is it $\enclose{horizontalstrike}{D_{i,j}^{(l)}:=\dfrac{1}{m}(\Delta_{i,j}^{(l)}+\lambda\Theta_{i,j}^{(l)})}$ if $\enclose{horizontalstrike}{j\ne0}$ and $\enclose{horizontalstrike}{D_{i,j}^{(l)}:=\dfrac{1}{m}\Delta_{i,j}^{(l)}}$ if $\enclose{horizontalstrike}{j=0}$?

3. Why is $\require{enclose}\enclose{horizontalstrike}{D_{i,j}^{(l)}=\dfrac{\partial J(\Theta)}{\partial \Theta_{i,j}^{(l)}}}$?

EDIT: After referring to this answer on stats.stackexchange.com, I now understand that $D_{i,j}^{(l)}$ is the gradient of the cost with respect to the weight $\Theta_{i,j}^{(l)}$, averaged over the whole training set. So the only part I still do not understand is: why do we add $\delta^{(l+1)}(a^{(l)})^{T}$ to $\Delta^{(l)}$?
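To make the remaining question concrete: numerically, accumulating $\delta^{(l+1)}(a^{(l)})^{T}$ over the training set and then dividing by $m$ is exactly the sample mean of the per-example outer products (a NumPy sketch; the random arrays are stand-ins for the $\delta^{(l+1)}$ and $a^{(l)}$ vectors, and the sizes are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 10                                  # number of training examples
deltas = rng.standard_normal((m, 2))    # stand-ins for delta^{(l+1)}, one row per example
acts = rng.standard_normal((m, 5))      # stand-ins for a^{(l)}, one row per example

# Accumulation as in the algorithm: Delta += delta (a)^T for each example.
Delta = np.zeros((2, 5))
for i in range(m):
    Delta += np.outer(deltas[i], acts[i])

# Dividing by m gives the average of the per-example outer products.
D_unreg = Delta / m
mean_of_grads = np.mean([np.outer(deltas[i], acts[i]) for i in range(m)], axis=0)
print(np.allclose(D_unreg, mean_of_grads))   # True
```

So the accumulation appears to be just a running sum of per-example gradient contributions; what I am asking for is the derivation that each contribution equals $\delta^{(l+1)}(a^{(l)})^{T}$.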

Please give answers with mathematical derivations.