The squared error function in a neural network is defined as the square of (target - output). This number is always non-negative because of the squaring.
On the other hand, the derivatives of the most common activation functions, such as the sigmoid or ReLU, are zero or greater; they are never negative.
My question is: how can we update weights in the negative direction (if needed) when using the sigmoid or ReLU with the squared error function? It seems you could never push a weight down toward zero or below, assuming the weights are randomly initialized above zero.
Consider a toy loss function: $$L(w)=(\sigma(wx)-y)^2$$
Its derivative is: $$L'(w)=2(\sigma(wx)-y)\cdot\sigma'(wx)\cdot x$$
Although $\sigma'$ is nonnegative, the other factors $(\sigma(wx)-y)$ and $x$ can each be positive or negative, so $L'(w)$ itself can have either sign. Updating in the direction of $-L'(w)$ can therefore either increase or decrease $w$, including pushing it below zero.
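Here is a minimal numeric sketch of this (assuming a scalar weight, the logistic sigmoid, and a made-up input/target pair; the function names are just for illustration). With $x=1$ and $y=0$, the factor $(\sigma(wx)-y)$ is positive, so the gradient is positive and plain gradient descent drives the weight below zero even though it starts above zero:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, x, y):
    """Derivative of L(w) = (sigmoid(w*x) - y)^2 with respect to w."""
    s = sigmoid(w * x)
    # sigma'(wx) = s * (1 - s) is nonnegative, but (s - y) and x carry signs
    return 2.0 * (s - y) * s * (1.0 - s) * x

w, x, y, lr = 0.5, 1.0, 0.0, 1.0   # weight starts above zero, target is 0
print(grad(w, x, y))               # positive gradient (~0.29)

for _ in range(100):
    w -= lr * grad(w, x, y)        # gradient descent step
print(w)                           # w has been driven below zero
```

The opposite case, $y=1$, gives a negative gradient, so the update increases $w$ instead: the sign of the step comes entirely from $(\sigma(wx)-y)$ and $x$, not from $\sigma'$.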