What is the derivative of binary cross entropy loss w.r.t. the input of the sigmoid function?


I want to compute the derivative of binary cross entropy loss w.r.t. the input of the sigmoid function, and was wondering if there's a closed-form expression. I've seen derivations of binary cross entropy loss with respect to model weights/parameters (derivative of cost function for Logistic Regression) as well as derivations of the sigmoid function w.r.t. its input (Derivative of sigmoid function $\sigma (x) = \frac{1}{1+e^{-x}}$), but nothing that combines the two. I would greatly appreciate any help with this.

There's also a post that computes the derivative of categorical cross entropy loss w.r.t. pre-softmax outputs (Derivative of Softmax loss function). I am looking for something similar in the binary case (perhaps the softmax result specializes to the binary case, but I'm not sure).

Best Answer

Use properties of logarithms to simplify as much as possible before taking the derivative.

Let $0 \leq p \leq 1$. We want to compute the derivative of the function \begin{align} L(u) &= -p \log(\sigma(u)) - (1-p)\log(1 - \sigma(u)) \\ &= -p\log\left( \frac{e^u}{1+e^u} \right) - (1-p) \log\left( \frac{1}{1+e^u} \right) \\ &= -p\left(u - \log(1+e^u)\right) + (1-p)\log(1+e^u) \\ &= -pu +\log(1 + e^u). \end{align}

Look how much $L(u)$ simplified! Sigmoid and binary cross-entropy are a match made in heaven.
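Here's a quick numerical sanity check of that simplification (a minimal NumPy sketch; the variable names and random test points are just for illustration):

```python
import numpy as np

# Check numerically that
#   -p*log(sigmoid(u)) - (1-p)*log(1 - sigmoid(u))  ==  -p*u + log(1 + exp(u))
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
u = rng.normal(size=5)       # arbitrary pre-sigmoid inputs (logits)
p = rng.uniform(size=5)      # arbitrary target probabilities in [0, 1]

original   = -p * u * 0 - p * np.log(sigmoid(u)) - (1 - p) * np.log(1 - sigmoid(u))
simplified = -p * u + np.log1p(np.exp(u))

print(np.allclose(original, simplified))   # True
```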

It is now easy to take the derivative of $L$: $$ L'(u) = -p + \frac{e^u}{1+e^u} = \sigma(u) - p. $$

This formula has a nice interpretation. If the predicted probability $\sigma(u)$ agrees perfectly with the ground truth probability $p$, then the derivative of $L$ is $0$ — suggesting that we do not need to make any change to the value of $u$.
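If you want to convince yourself of both the formula and this interpretation, a finite-difference gradient check works well (again a minimal NumPy sketch, assuming the same $L$ and $\sigma$ as above):

```python
import numpy as np

# Compare L'(u) = sigmoid(u) - p against a central finite difference of L(u).
sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
L = lambda u, p: -p * u + np.log1p(np.exp(u))

u, p, eps = 0.7, 0.3, 1e-6
analytic = sigmoid(u) - p
numeric  = (L(u + eps, p) - L(u - eps, p)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)      # True

# When the prediction matches the target exactly, the gradient vanishes:
u_star = np.log(p / (1 - p))               # sigmoid(u_star) == p
print(abs(sigmoid(u_star) - p) < 1e-12)    # True, so L'(u_star) is essentially 0
```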