I am learning about back-prop right now and in the course I am taking, the iterative back-propagation algorithm is given by:
$$\delta^{(L)}=\hat{y}^{(i)}-y^{(i)}$$ $$\delta^{(l)}=(\Theta^{(l)})^T\delta^{(l+1)}\circ \operatorname{sigmoid}'(z^{(l)})$$ $$\Delta^{(l)}=\sum^{m}_{i=1}\delta^{(l+1),(i)}\,(a^{(l),(i)})^T$$ (assuming vanilla gradient descent)
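To make my understanding concrete, here is a minimal NumPy sketch of those formulas as I read them (the layer sizes and data are made up for illustration; biases are deliberately omitted, which is exactly the gap I am asking about):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Made-up shapes: 3 inputs, 4 hidden units, 2 outputs, m = 5 examples.
rng = np.random.default_rng(0)
m = 5
Theta1 = rng.standard_normal((4, 3))  # layer-1 weights (no bias column here)
Theta2 = rng.standard_normal((2, 4))  # layer-2 weights
X = rng.standard_normal((3, m))       # inputs, one column per example
Y = rng.random((2, m))                # targets

# Forward pass
z2 = Theta1 @ X
a2 = sigmoid(z2)
z3 = Theta2 @ a2
y_hat = sigmoid(z3)

# Backward pass, following the formulas above
delta3 = y_hat - Y                                # delta^(L) = y_hat - y
delta2 = (Theta2.T @ delta3) * sigmoid_prime(z2)  # delta^(l)

# Accumulated gradients, summed over the m examples (columns)
Delta2 = delta3 @ a2.T  # gradient w.r.t. Theta2
Delta1 = delta2 @ X.T   # gradient w.r.t. Theta1
```

With the output deltas defined as $\hat{y}-y$ (i.e. a cross-entropy loss on sigmoid outputs), `Delta1` and `Delta2` match a finite-difference gradient check, but nothing here produces a gradient for the biases.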
However, I was never clear on how back-propagation handles the bias nodes. I have generally treated (perhaps wrongly) each bias unit as an extra node with no incoming connections: its "activation" is fixed at 1, and the outgoing weights control its contribution, which is how I assumed the network learns the bias. But when working out $\delta$ in such a network, which is not fully interconnected, there is a dimension mismatch between $\delta^{(l+1)}$ (of length $s_{l+1}$) and $(\Theta^{(l)})^T$.
I am assuming it is better to simply add the layer's bias vector to the activations, but I was unable to find formulas for computing the gradient with respect to this additional term (since, under this interpretation, it does not depend on the weights).
If someone could outline how to update the biases within the framework described above, that would be great. Thanks.