Derivative of Binary Cross Entropy is always negative


I'm trying to work through the derivatives in the backpropagation algorithm.

Given the loss function $$L(\hat y_i, y_i) = -\sum_i y_i \log (\hat y_i),$$ where $\hat y_i = \sigma(z)$ and $z = Wx + b$, find the weight gradient $\frac{dL}{dw}$.

$$\frac{dL}{dw} = \frac{dL}{d\hat y_i}\frac{d\hat y_i}{dz}\frac{dz}{dw}$$

Now, the last two factors are always positive (given $x > 0$), since the derivative of the sigmoid is strictly positive (it lies in $(0, \tfrac14]$).
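Written out explicitly (these are just the standard derivatives of the sigmoid and of the linear term, in the scalar case):

$$\frac{d\hat y_i}{dz} = \sigma(z)\bigl(1 - \sigma(z)\bigr) = \hat y_i (1 - \hat y_i), \qquad \frac{dz}{dw} = x.$$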

As for the $\frac{dL}{d\hat y_i}$ factor, it seems to always be negative: $$\frac{dL}{d\hat y_i} = -\sum_i \frac{y_i}{\hat y_i}$$
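A quick numerical sanity check of the sign (a minimal sketch for the scalar case; the values of $x$, $b$ and the choice $y = 1$ are illustrative assumptions, and the analytic gradient is compared against a finite-difference estimate):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    # L = -y * log(sigmoid(w*x + b)), the single-term loss above
    return -y * math.log(sigmoid(w * x + b))

# Illustrative scalar example: y = 1, x > 0
x, b, y = 2.0, 0.1, 1.0
eps = 1e-6

for w in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    # Central finite-difference estimate of dL/dw
    grad = (loss(w + eps, x, b, y) - loss(w - eps, x, b, y)) / (2 * eps)
    # Chain rule: (-y/yhat) * yhat*(1-yhat) * x = -y * (1 - yhat) * x
    analytic = -y * (1 - sigmoid(w * x + b)) * x
    print(f"w={w:+.1f}  dL/dw≈{grad:+.6f}  analytic={analytic:+.6f}")
```

For every $w$ tried the gradient comes out negative, which matches the observation above.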

Now this doesn't make intuitive sense, because what is the point of having a learning rate $\lambda$, if the weight update is always in the same direction?

Given random weight initialisations, $w$ can only get bigger.

$$w_i = w_i - \lambda \frac{dL}{dw}$$