How does the rearrangement of the logistic regression derivative work?


I am trying to understand the derivative of the logistic regression loss function described by Dan Jurafsky in his book Speech and Language Processing (Section 5.8 of the third-edition draft).

I can follow most of his reasoning; I only have trouble with the step from equation 5.41 to 5.42:

Rearranging the terms after taking the derivative of the log, he goes from (5.41)

$$\frac{\partial L_{CE}}{\partial w_j} = -\left[\frac{y}{\sigma(w \cdot x + b)}\,\frac{\partial}{\partial w_j}\sigma(w \cdot x + b) + \frac{1-y}{1-\sigma(w \cdot x + b)}\,\frac{\partial}{\partial w_j}\bigl(1-\sigma(w \cdot x + b)\bigr)\right]$$

to (5.42)

$$\frac{\partial L_{CE}}{\partial w_j} = -\left[\frac{y}{\sigma(w \cdot x + b)} - \frac{1-y}{1-\sigma(w \cdot x + b)}\right]\frac{\partial}{\partial w_j}\sigma(w \cdot x + b)$$

I guess what I don't understand is how he factors out the derivative term, because $\frac{\partial}{\partial w_j}\sigma(w \cdot x + b)$ is not the same as $\frac{\partial}{\partial w_j}\left(1-\sigma(w \cdot x + b)\right)$.

Accepted answer:

$\frac{\partial }{\partial w_j} 1=0$, so

$\frac{\partial }{\partial w_j} \left(1- \sigma(w \cdot x + b) \right) = -\frac{\partial }{\partial w_j} \left( \sigma(w \cdot x + b) \right)$.

If you distribute the leading negative sign on the RHS of 5.42 and apply the equation above, you'll see that it is the same as 5.41.
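A quick numerical sanity check can make this concrete. The sketch below (the input values and helper functions are my own illustration, not from the book) verifies with finite differences that $\frac{\partial}{\partial w_j}(1-\sigma) = -\frac{\partial}{\partial w_j}\sigma$, and that the unfactored and factored forms of the gradient therefore agree:

```python
import math

def sigma(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def z_of(w, x, b):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Arbitrary example values, chosen only for this check
w, x, b, j, eps = [0.5, -1.2], [2.0, 0.3], 0.1, 0, 1e-6

# Central finite differences w.r.t. w_j
wp = w.copy(); wp[j] += eps
wm = w.copy(); wm[j] -= eps
ds   = (sigma(z_of(wp, x, b)) - sigma(z_of(wm, x, b))) / (2 * eps)
d1ms = ((1 - sigma(z_of(wp, x, b))) - (1 - sigma(z_of(wm, x, b)))) / (2 * eps)

# d/dw_j (1 - sigma) is the negative of d/dw_j sigma
assert abs(d1ms + ds) < 1e-9

s = sigma(z_of(w, x, b))
for y in (0.0, 1.0):
    # Unfactored form (5.41): each log term differentiated separately
    unfactored = -(y / s * ds + (1 - y) / (1 - s) * d1ms)
    # Factored form (5.42): the common derivative pulled out
    factored = -(y / s - (1 - y) / (1 - s)) * ds
    assert abs(unfactored - factored) < 1e-9
```

The sign flip from $\frac{\partial}{\partial w_j}(1-\sigma) = -\frac{\partial}{\partial w_j}\sigma$ is exactly what turns the $+$ between the two terms in 5.41 into the $-$ inside the brackets of 5.42.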