The squared error function in a neural network is defined as the square of (target - output). This number is always non-negative because of the squaring.
On the other hand, the derivatives of the most common activation functions, such as the sigmoid or ReLU, are zero or greater; they are never negative.
My question is: how can we update weights in the negative direction (if needed) when using the sigmoid or ReLU with the squared error function? It seems you could never push a weight down toward zero or below, assuming the weights are randomly initialized above zero.
Consider a toy loss function: $$L(w)=(\sigma(wx)-y)^2$$
Its derivative is: $$L'(w)=2(\sigma(wx)-y)\cdot\sigma'(wx)\cdot x$$
Although $\sigma'$ is nonnegative, the other factors $(\sigma(wx)-y)$ and $x$ can each be positive or negative, so $L'(w)$ itself can have either sign. Updating in the direction of $-L'(w)$ can therefore either increase or decrease $w$, including pushing it below zero.
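Here is a minimal numeric sketch of this (assuming a scalar weight, the logistic sigmoid, and a made-up input/target pair; the function names are just for illustration). With $x=1$ and $y=0$, the factor $(\sigma(wx)-y)$ is positive, so the gradient is positive and plain gradient descent drives the weight below zero even though it starts above zero:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, x, y):
    """Derivative of L(w) = (sigmoid(w*x) - y)^2 with respect to w."""
    s = sigmoid(w * x)
    # sigma'(wx) = s * (1 - s) is nonnegative, but (s - y) and x carry signs
    return 2.0 * (s - y) * s * (1.0 - s) * x

w, x, y, lr = 0.5, 1.0, 0.0, 1.0   # weight starts above zero, target is 0
print(grad(w, x, y))               # positive gradient (~0.29)

for _ in range(100):
    w -= lr * grad(w, x, y)        # gradient descent step
print(w)                           # w has been driven below zero
```

The opposite case, $y=1$, gives a negative gradient, so the update increases $w$ instead: the sign of the step comes entirely from $(\sigma(wx)-y)$ and $x$, not from $\sigma'$.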