I am trying to understand the derivative of the logistic regression loss function described by Dan Jurafsky in his book Speech and Language Processing (Draft for Third Edition Chapter 5.8).
I can follow most of his reasoning; my only difficulty is the step from equation 5.41 to 5.42:
Rearranging the terms after taking the derivative of the log
I guess what I don't understand is how he factors out the derivative, because $\frac{\partial}{\partial w_j}\sigma(w \cdot x + b)$ is not the same as $\frac{\partial}{\partial w_j}\left(1-\sigma(w \cdot x + b)\right)$.
$\frac{\partial }{\partial w_j} 1=0$, so
$\frac{\partial }{\partial w_j} \left(1- \sigma(w \cdot x + b) \right) = -\frac{\partial }{\partial w_j} \left( \sigma(w \cdot x + b) \right)$.
If you distribute the leading negative sign on the RHS of 5.42 and use the equation above, you'll see that the two expressions are the same.
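You can also convince yourself numerically. The sketch below (my own check, not from the book) compares central finite differences of $\sigma(w \cdot x + b)$ and $1-\sigma(w \cdot x + b)$ with respect to one weight $w_j$, and also against the analytic gradient $\sigma(z)(1-\sigma(z))\,x_j$; the two finite differences agree up to sign:

```python
import numpy as np

def sigma(z):
    # Logistic (sigmoid) function.
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary example values (assumptions for this check, not from the book).
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
b = 0.5
j, h = 1, 1e-6  # coordinate to perturb, finite-difference step

def w_perturbed(delta):
    wp = w.copy()
    wp[j] += delta
    return wp

# Central finite difference of d/dw_j sigma(w.x + b)
d_sigma = (sigma(w_perturbed(h) @ x + b)
           - sigma(w_perturbed(-h) @ x + b)) / (2 * h)

# Central finite difference of d/dw_j (1 - sigma(w.x + b))
d_one_minus = ((1 - sigma(w_perturbed(h) @ x + b))
               - (1 - sigma(w_perturbed(-h) @ x + b))) / (2 * h)

# Analytic gradient via sigma'(z) = sigma(z) * (1 - sigma(z))
z = w @ x + b
analytic = sigma(z) * (1 - sigma(z)) * x[j]

print(np.isclose(d_one_minus, -d_sigma))  # the two differ only in sign
print(np.isclose(d_sigma, analytic))
```

Since the constant 1 contributes nothing to the derivative, the second finite difference is exactly the negation of the first, which is what lets the common factor be pulled out between 5.41 and 5.42.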