I'm trying to learn gradient descent for logistic regression. This is my logistic loss function (with labels $y_i \in \{-1, +1\}$):
$$\arg\min_w\;\sum_{i=1}^n\log\left(1+\exp(-y_iw^Tx_i)\right)$$
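To make the objective concrete, here is how I evaluate this loss in code (a minimal sketch, assuming NumPy, a feature matrix `X` of shape `(n, d)`, labels `y` in $\{-1,+1\}$, and weights `w` of shape `(d,)`; `np.logaddexp(0, z)` computes $\log(1+e^z)$ without overflow):

```python
import numpy as np

def logistic_loss(w, X, y):
    """L(w) = sum_i log(1 + exp(-y_i * w^T x_i)) for labels y_i in {-1, +1}."""
    margins = y * (X @ w)                        # y_i * w^T x_i, one per sample
    return np.sum(np.logaddexp(0.0, -margins))   # log(1 + exp(-margin)), overflow-safe
```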
I've calculated the partial derivative $\frac{\partial L}{\partial w}$ of the above function for use in the gradient descent update step, and I just want to know if it's correct:
\begin{align} \frac{\partial L}{\partial w} &= \frac{\partial}{\partial w}\sum_{i=1}^n\log\left(1+\exp(-y_iw^Tx_i)\right) \\ &= \sum_{i=1}^n \frac{\frac{\partial}{\partial w}\left(1+\exp(-y_iw^Tx_i)\right)}{1+\exp(-y_iw^Tx_i)} \\ &= \sum_{i=1}^n \frac{0+\exp(-y_iw^Tx_i)(-x_iy_i)}{1+\exp(-y_iw^Tx_i)} \\ &= \sum_{i=1}^n -\exp(-y_iw^Tx_i)(x_iy_i) + \frac{-\exp(-y_iw^Tx_i)(x_iy_i)}{\exp(-y_iw^Tx_i)} \\ &= \sum_{i=1}^n -\exp(-y_iw^Tx_i)(x_iy_i) - (x_iy_i) \\ &= \sum_{i=1}^n -\left(\exp(-y_iw^Tx_i)(x_iy_i) + (x_iy_i)\right) \\ &= \sum_{i=1}^n -(x_iy_i)\left(\exp(-y_iw^Tx_i)+1\right) \end{align}
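To sanity-check whichever formula is right, I also compare the analytic gradient against a central-difference approximation (a sketch; `logistic_loss` is the helper above, and any candidate gradient function can be checked against it):

```python
import numpy as np

def numerical_grad(loss_fn, w, eps=1e-6):
    """Central-difference approximation of dL/dw, one coordinate at a time."""
    g = np.zeros_like(w, dtype=float)
    for j in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        g[j] = (loss_fn(w + step) - loss_fn(w - step)) / (2 * eps)
    return g

# Hypothetical usage: the analytic and numerical gradients should agree closely.
# rng = np.random.default_rng(0)
# X, y, w = rng.normal(size=(5, 3)), np.array([1., -1., 1., 1., -1.]), rng.normal(size=3)
# print(numerical_grad(lambda v: logistic_loss(v, X, y), w))
```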
Gradient update, with learning rate $\eta$ and the gradient evaluated at the current $w$:
$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}$$
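In full, the update loop I have in mind looks like this (a sketch; `grad_fn` stands for whichever gradient implementation is used, and `learning_rate` and `n_steps` are hypothetical choices):

```python
import numpy as np

def gradient_descent(grad_fn, w0, learning_rate=0.1, n_steps=1000):
    """Plain batch gradient descent: repeatedly apply w <- w - eta * dL/dw."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w -= learning_rate * grad_fn(w)   # gradient evaluated at the current w
    return w
```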
Not quite: the fraction $\frac{a}{1+b}$ cannot be split into $a + \frac{a}{b}$, so the simplification after the quotient step is invalid. Starting again from the loss and differentiating with respect to a single component $w_j$:
$$L(w) = \sum_{i=1}^n \log\left( 1+\exp(-y_iw^Tx_i)\right)$$
\begin{align} \frac{\partial L}{\partial w_j} &= \sum_{i=1}^n \frac{0+\exp(-y_iw^Tx_i)\frac{\partial }{\partial w_j}(-y_iw^Tx_i)}{1+\exp(-y_iw^Tx_i)} \\ &= \sum_{i=1}^n \frac{\exp(-y_iw^Tx_i)}{1+\exp(-y_iw^Tx_i)} \cdot \frac{\partial}{\partial w_j}(-y_iw^Tx_i)\\ &=\sum_{i=1}^n \frac{\exp(-y_iw^Tx_i)}{1+\exp(-y_iw^Tx_i)} \cdot (-y_ix_{ij})\\ &=\sum_{i=1}^n \left( 1+\exp(y_iw^Tx_i)\right)^{-1} \cdot (-y_ix_{ij}) \end{align}
where the last line follows from multiplying the numerator and denominator by $\exp(y_iw^Tx_i)$. Stacking the components $j = 1,\dots,d$ gives the gradient vector to plug into the update step.
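In code, this gradient vectorizes to a few lines (a sketch assuming NumPy/SciPy and the same `X`, `y`, `w` shapes as in the question; the identity $\frac{1}{1+e^{z}} = \operatorname{expit}(-z)$ keeps the computation numerically stable for large margins):

```python
import numpy as np
from scipy.special import expit   # expit(z) = 1 / (1 + exp(-z))

def logistic_grad(w, X, y):
    """dL/dw_j = sum_i (1 + exp(y_i * w^T x_i))^{-1} * (-y_i * x_{ij})."""
    margins = y * (X @ w)          # y_i * w^T x_i, one per sample
    coef = -y * expit(-margins)    # (1 + exp(margin))^{-1} * (-y_i)
    return X.T @ coef              # sum_i coef_i * x_i
```

Comparing this against the `numerical_grad` check from the question is a quick way to confirm the derivation.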