I saw the following result: $$ \dfrac{\mathrm{d}}{\mathrm{d}x} \left( \log\left( \dfrac{1}{1+\mathrm{e}^{-x}} \right) \right) = \dfrac{1}{\mathrm{e}^x+1} $$ What are the intermediary steps for obtaining this result?
Obtaining derivative of log of sigmoid function
14.4k views. Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 6 best solutions below.
On
You just have to use the Chain Rule.
$\alpha = 1+e^{-x}$
$\beta = \alpha^{-1}$
$\frac{d\,log(\beta)}{d\,x} = \frac{d\,log(\beta)}{d\,\beta}\,\frac{d\,\beta}{d\,x} = \frac{d\,log(\beta)}{d\,\beta}\,\frac{d\,\alpha^{-1}}{d\,\alpha}\,\frac{d\,\alpha}{d\,x} = \left(\frac{1}{\beta}\right)\,\left(-\frac{1}{\alpha^2}\right)\,\left(-e^{-x}\right) = \frac{e^{-x}}{1+e^{-x}} = \boxed{\frac{1}{e^x + 1}}$
You don't have to worry about signs, because $\alpha$ and $\beta$ are always strictly positive.
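The chain-rule factors above can also be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`chain_rule_derivative`, `closed_form`, `central_difference`) and the sample points are chosen here for illustration and are not part of the answer.

```python
import math


def log_sigmoid(x: float) -> float:
    """log(1 / (1 + e^{-x})), written directly from the definition."""
    return math.log(1.0 / (1.0 + math.exp(-x)))


def chain_rule_derivative(x: float) -> float:
    """Multiply the three chain-rule factors used in the answer."""
    alpha = 1.0 + math.exp(-x)       # alpha = 1 + e^{-x}
    beta = 1.0 / alpha               # beta = alpha^{-1}
    dlog_dbeta = 1.0 / beta          # d log(beta) / d beta
    dbeta_dalpha = -1.0 / alpha**2   # d alpha^{-1} / d alpha
    dalpha_dx = -math.exp(-x)        # d alpha / d x
    return dlog_dbeta * dbeta_dalpha * dalpha_dx


def closed_form(x: float) -> float:
    """The boxed result, 1 / (e^x + 1)."""
    return 1.0 / (math.exp(x) + 1.0)


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Arbitrary sample points, chosen only for illustration.
for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
    print(x, chain_rule_derivative(x), closed_form(x),
          central_difference(log_sigmoid, x))
```

The three printed values in each row should agree up to floating-point and finite-difference error.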
On
The base of the logarithm depends on the field: log(x) usually means the base-2 log in computer science, the base-e log in mathematical analysis, and the base-10 log in logarithm tables.
In the most general form, the derivative of $y = \log_b\left(\frac{1}{1+e^{-x}}\right)$ is:
$$\frac{dy}{dx} = \frac{1}{\ln(b)\,(1+e^{x})}$$
Of course, if the function refers to the natural logarithm, then $b = e$, and the derivative is:
$$\frac{dy}{dx} = \frac{1}{\ln(e)\,(1+e^{x})}$$
Since $\ln(e) = 1$, this becomes:
$$\frac{dy}{dx} = \frac{1}{1+e^{x}}$$
The natural logarithm of the sigmoid function appears mostly in neural networks. The activation function is evaluated in the feedforward step, whereas its derivative is evaluated in backpropagation. The derivative of the natural log of the sigmoid is also simpler than for other bases, because the $\ln(b)$ factor equals 1.
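The general-base formula follows from the change-of-base identity. A short sketch, writing $\sigma(x) = \frac{1}{1+e^{-x}}$ (the symbol $\sigma$ is introduced here for brevity) and using the natural-log result from the question:

$$\log_b \sigma(x) = \frac{\ln \sigma(x)}{\ln b} \quad\Rightarrow\quad \frac{d}{dx}\log_b \sigma(x) = \frac{1}{\ln b}\cdot\frac{d}{dx}\ln \sigma(x) = \frac{1}{\ln(b)\,(1+e^{x})}$$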
On
Bottom line:
$$\frac{d}{dx}log(\frac{1}{1+e^{-x}}) = 1 - \frac{1}{1+e^{-x}} = \frac{1}{e^{x} + 1}$$
For the other part of BCE (Binary Cross Entropy):
$$\frac{d}{dx}log(1 - \frac{1}{1+e^{-x}}) = -\frac{1}{1+e^{-x}}$$
For multivariate case:
Suppose that our features are $x_1, x_2, ..., x_n$ and the weights of the model are $w_0, w_1, w_2, ..., w_n$, where $w_0$ is bias, such that we want to differentiate
$$log(\frac{1}{1+e^{-(w_0 + w_1x_1 + ... + w_nx_n)}})$$
For convenience, we will define a constant feature $x_0 = 1$, then rewrite the same expression as
$$log(\frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}})$$
Then,
$$\frac{\partial}{\partial w_i}log(\frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}}) = x_i\frac{1}{e^{(w_0x_0 + w_1x_1 + ... + w_nx_n)} + 1}$$
When we derive the other part of BCE loss:
$$log(1 - \frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}})$$
Then,
$$\frac{\partial}{\partial w_i}log(1 - \frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}}) = -x_i\frac{1}{1 + e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}}, \quad i = 0, 1, ..., n$$
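Both partial-derivative formulas can be compared against finite differences. This is a minimal sketch using only the standard library; the helper names and the feature and weight values are hypothetical, and `x[0] = 1` plays the role of the constant feature $x_0$.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(w, x) -> float:
    """z = w_0 x_0 + ... + w_n x_n, with x[0] == 1 as the bias feature."""
    return sum(wi * xi for wi, xi in zip(w, x))


def log_f(w, x) -> float:
    """log(sigmoid(z))."""
    return math.log(sigmoid(linear(w, x)))


def log_one_minus_f(w, x) -> float:
    """log(1 - sigmoid(z))."""
    return math.log(1.0 - sigmoid(linear(w, x)))


def grad_log_f(w, x):
    """Analytic partials of log(sigmoid(z)): x_i / (e^z + 1)."""
    z = linear(w, x)
    return [xi / (math.exp(z) + 1.0) for xi in x]


def grad_log_one_minus_f(w, x):
    """Analytic partials of log(1 - sigmoid(z)): -x_i * sigmoid(z)."""
    z = linear(w, x)
    return [-xi * sigmoid(z) for xi in x]


def numeric_grad(func, w, x, h: float = 1e-6):
    """Central-difference approximation of each partial derivative in w."""
    grads = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grads.append((func(up, x) - func(down, x)) / (2.0 * h))
    return grads


# Hypothetical example values; x[0] = 1 is the constant bias feature.
x = [1.0, 0.5, -2.0, 3.0]
w = [0.1, -0.4, 0.3, 0.2]

print(grad_log_f(w, x), numeric_grad(log_f, w, x))
print(grad_log_one_minus_f(w, x), numeric_grad(log_one_minus_f, w, x))
```

Each analytic gradient should match its numeric counterpart closely, including the bias component at index 0.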
In more detail, with all the steps:
It is much easier than it might look at first sight, so try to enjoy the ride...
The Sigmoid function is $$f = \frac{1}{1+e^{-x}}$$
So its derivative is (by the quotient rule): $$f' = \frac{e^{-x}}{(1+e^{-x})^2}$$
But this expression can be written as:
$$f' = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1 + e^{-x} - 1}{(1+e^{-x})^2} = \frac{1 + e^{-x}}{(1+e^{-x})^2} - \frac{1}{(1+e^{-x})^2} = \frac{1}{(1+e^{-x})} - \frac{1}{(1+e^{-x})^2}$$
But notice that
$$\frac{1}{(1+e^{-x})} = f , \frac{1}{(1+e^{-x})^2} = f^2$$
So we actually get
$$f' = f - f^2 = f(1-f)$$
Now apply $log$ to the sigmoid. Let us define:
$$g = log(f)$$
So, by the chain rule, $$g' = \frac{f'}{f}$$
But we already know that $f' = f(1-f)$, so we get:
$$g' = \frac{f'}{f} = \frac{f(1-f)}{f} = 1-f$$
So for sigmoid function $f$, the derivative of $g = log(f)$ is simply $1-f$:
$$g' = [log(\frac{1}{1+e^{-x}})]' = 1 - \frac{1}{1+e^{-x}}$$
If you want to simplify it even more, you can do this:
$$g' = 1 - \frac{1}{1+e^{-x}} = \frac{1+e^{-x}-1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}} = \frac{1}{e^{x} + 1}$$
Similarly, we can compute the derivative of the other BCE term:
$$h = log(1 - \frac{1}{1+e^{-x}}) = log(1 - f)$$
So, $$h' = [log(1-f)]' = \frac{-f'}{1-f} = \frac{-f(1-f)}{1-f} = -f = -\frac{1}{1+e^{-x}}$$
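As a sketch of how the two results combine, assume the usual per-example binary cross-entropy with a label $t \in \{0, 1\}$ (the symbols $t$ and $L$ are introduced here for illustration and do not appear above):

$$L = -\left[\,t\,log(f) + (1-t)\,log(1-f)\,\right]$$

$$\frac{dL}{dx} = -\left[\,t(1-f) + (1-t)(-f)\,\right] = -\left[\,t - tf - f + tf\,\right] = f - t$$

Under that assumption, the derivative of the loss with respect to $x$ is simply the sigmoid output minus the label.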
I didn't include the steps for the partial derivatives, but they are very similar to the above steps.
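For reference, a sketch of how those partial-derivative steps might go, writing $z = w_0x_0 + w_1x_1 + ... + w_nx_n$ (the symbol $z$ is introduced here for brevity) and applying the chain rule to $g' = 1 - f$ and $h' = -f$:

$$\frac{\partial z}{\partial w_i} = x_i$$

$$\frac{\partial}{\partial w_i}log(f(z)) = (1 - f(z))\,\frac{\partial z}{\partial w_i} = x_i\frac{1}{e^{z} + 1}$$

$$\frac{\partial}{\partial w_i}log(1 - f(z)) = -f(z)\,\frac{\partial z}{\partial w_i} = -x_i\frac{1}{1+e^{-z}}$$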
On
We may also rearrange the equation and differentiate implicitly with respect to $x$:
$$ y = \ln\left( \frac{1}{1 + e^{-x}} \right) \quad \Rightarrow \quad e^y = \frac{1}{1 + e^{-x}} \quad \Rightarrow \quad e^y \cdot (1 + e^{-x}) = 1 $$
[this creates no difficulties since the denominator in the ratio is never zero];
$$ \frac{d}{dx} \left[ e^y \cdot (1 + e^{-x}) \right] = \frac{d}{dx} \left[ 1 \right] \quad \Rightarrow \quad e^y \cdot y' \cdot (1 + e^{-x}) + e^y \cdot (- e^{-x}) = 0 $$
$$ \Rightarrow \quad e^y \cdot y' \cdot (1 + e^{-x}) = e^y \cdot e^{-x} \quad \Rightarrow \quad y' \cdot (1 + e^{-x}) = e^{-x} $$
[the factor $e^y$ may be "divided out", as it is also never zero]
$$ \Rightarrow \quad y' = \frac{e^{-x}}{1 + e^{-x}} = \frac{e^{-x}}{1 + e^{-x}} \cdot \frac{e^x}{e^x} = \frac{1}{e^x + 1} \;. $$
Hint:
First, notice that $$ \begin{align} \dfrac{1}{1+e^{-x}} = \dfrac{\mathrm{e}^{x} \cdot 1}{\mathrm{e}^{x} \cdot 1 + \mathrm{e}^{x} \cdot e^{-x}} = \dfrac{\mathrm{e}^{x}}{\mathrm{e}^{x} + 1} \;. \end{align} $$
Second, notice that $$ \begin{align} \ln\left( \dfrac{\mathrm{e}^{x}}{\mathrm{e}^{x} + 1} \right) = \ln\left( \mathrm{e}^{x}\right) - \ln\left( \mathrm{e}^{x} + 1 \right) = x - \ln\left( \mathrm{e}^{x} + 1 \right) \;. \end{align} $$
So, we have $$ \begin{align} \dfrac{\mathrm{d}}{\mathrm{d}x} \ln\left( \dfrac{1}{1+\mathrm{e}^{-x}} \right) &= \dfrac{\mathrm{d}}{\mathrm{d}x} \left( x - \ln\left( \mathrm{e}^{x} + 1 \right) \right) = \dfrac{\mathrm{d}x}{\mathrm{d}x} - \dfrac{\mathrm{d}\ln\left( \mathrm{e}^{x} + 1 \right)}{\mathrm{d}x} \;. \end{align} $$
Can you go on from here using the chain rule?
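For readers who want to check their work, here is one way the remaining step could go (a sketch, not part of the original hint). The chain rule gives

$$ \begin{align} \dfrac{\mathrm{d}}{\mathrm{d}x} \ln\left( \mathrm{e}^{x} + 1 \right) = \dfrac{\mathrm{e}^{x}}{\mathrm{e}^{x} + 1} \;, \end{align} $$

so that

$$ \begin{align} \dfrac{\mathrm{d}}{\mathrm{d}x} \ln\left( \dfrac{1}{1+\mathrm{e}^{-x}} \right) = 1 - \dfrac{\mathrm{e}^{x}}{\mathrm{e}^{x} + 1} = \dfrac{1}{\mathrm{e}^{x} + 1} \;. \end{align} $$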