I am in a machine learning class and am very confused by deriving this partial derivative. I know how to derive the derivative of a sigmoid function, but I do not know how to derive the log base sigmoid with respect to $w$. I am trying to get the overall derivative with respect to $w$.
$$(1 - y_i)\log(1 - \sigma(w^T x_i)) + y_i \log \sigma(w^T x_i)$$
For example, if we are using the chain rule here, I do not understand $$y_i \log \sigma(w^T x_i)$$
as this is not taking the log of the sigmoid function but using the sigmoid function as a base. How do I derive this?
The end answer is: $$x_i[-\sigma(w^T x_i) + y_i]$$
Why do you say this is "log base sigmoid"? What would that even mean? I don't think that is the correct interpretation here. I think it is indeed just the log of the sigmoid function.
You'll need to use the chain rule here. Also keep in mind that $\sigma'(x)=\sigma(x)(1-\sigma(x))$. So we have:
$$ \nabla_w\left[(1-y_i)\log(1-\sigma(w^Tx_i))+y_i\log(\sigma(w^Tx_i))\right] = (1-y_i)\frac{1}{1-\sigma(w^Tx_i)}\left(-\sigma'(w^Tx_i)\right)x_i + y_i \frac{1}{\sigma(w^Tx_i)}\sigma'(w^Tx_i)x_i $$
Here I've used the fact that $\frac{d}{dx}\log(x)=\frac{1}{x}$, and $\nabla_w (w^Tx_i)=x_i$. Now if we substitute $\sigma'(w^Tx_i)=\sigma(w^Tx_i)(1-\sigma(w^Tx_i))$ we have: $$-(1-y_i)\sigma(w^Tx_i)x_i+y_i(1-\sigma(w^Tx_i))x_i$$ which simplifies to $$(y_i-\sigma(w^Tx_i))x_i.$$
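If you want to convince yourself the closed-form gradient is right, you can compare it against a finite-difference approximation. Here is a minimal sketch in NumPy; the weight vector, example $x_i$, and label $y_i$ are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y):
    # the per-example log-likelihood term from the question
    s = sigmoid(w @ x)
    return (1 - y) * np.log(1 - s) + y * np.log(s)

def grad(w, x, y):
    # the closed-form gradient derived above: (y_i - sigma(w^T x_i)) x_i
    return (y - sigmoid(w @ x)) * x

# hypothetical example values
w = np.array([0.5, -0.3])
x = np.array([1.2, 0.7])
y = 1.0

# central finite differences, one coordinate of w at a time
eps = 1e-6
numeric = np.array([
    (loss(w + eps * e, x, y) - loss(w - eps * e, x, y)) / (2 * eps)
    for e in np.eye(len(w))
])

print(np.allclose(numeric, grad(w, x, y), atol=1e-6))  # True
```

If the two disagree, a sign error like the one in the $\log(1-\sigma(\cdot))$ term of the chain rule is the usual culprit.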