Gradient computation, result verification


I am having trouble computing the gradient of the function $$ L(w) = -\dfrac{1}{N}\sum\limits_{n=1}^N \left[ y_{n}\log\left( \sigma(w^{T}x_{n}) \right) + (1 - y_{n})\log\left( 1-\sigma(w^{T}x_{n}) \right) \right] $$ where $\sigma$ is the sigmoid function $\sigma(x) = \dfrac{1}{1 + \mathrm{e}^{-x}}$.
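For experimenting with the formula, here is a minimal NumPy sketch of $L(w)$. It assumes the samples $x_n$ are stacked as the rows of a matrix `X` and the labels $y_n \in \{0, 1\}$ are in a vector `y`; the names `sigmoid` and `loss` are hypothetical, and this is a direct transcription that is not numerically stable for very large $|w^T x_n|$.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    """Average cross-entropy L(w); row n of X is x_n, y[n] is the label y_n in {0, 1}.

    Direct transcription of the formula: log(s) and log(1 - s) can overflow to -inf
    when w^T x_n is very large in magnitude.
    """
    s = sigmoid(X @ w)  # s[n] = sigma(w^T x_n)
    return -np.mean(y * np.log(s) + (1 - y) * np.log(1 - s))
```

A quick sanity check: at $w = 0$ every $\sigma(w^T x_n) = 1/2$, so $L(0) = \log 2$ for any data.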

My attempt gives the following gradient with respect to $w$: $$ \nabla_{w}L(w) = -\dfrac{1}{N}\sum\limits_{n=1}^N \left[ x_{n}y_{n} - x_{n}\sigma(w^{T}x_{n}) \right] . $$
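The gradient formula can be transcribed two ways: as the literal sum over samples (with both terms inside the sum) and in vectorized form $-\frac{1}{N} X^T\left(y - \sigma(Xw)\right)$, where the rows of `X` are the $x_n$. The sketch below assumes NumPy; `grad` and `grad_loop` are hypothetical names, and the two should agree.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def grad_loop(w, X, y):
    """Literal sum: -(1/N) * sum_n [x_n y_n - x_n sigma(w^T x_n)]."""
    N = X.shape[0]
    g = np.zeros_like(w, dtype=float)
    for x_n, y_n in zip(X, y):
        g += x_n * y_n - x_n * sigmoid(w @ x_n)  # both terms belong to the sum
    return -g / N

def grad(w, X, y):
    """Vectorized form: -(1/N) X^T (y - sigma(Xw))."""
    N = X.shape[0]
    return -X.T @ (y - sigmoid(X @ w)) / N
```

The vectorized version avoids the Python loop, which matters once $N$ is large.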

I computed it with the chain rule: first the derivative of $\log$ evaluated at $\sigma(w^{T}x_{n})$, then the derivative of its inner function $\sigma$, and then the derivative of the inner function of $\sigma$, namely $w^{T}x_{n}$. The second term of the sum is handled the same way.
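Written out for a single sample, the chain-rule steps described above give the following, using $\sigma'(z) = \sigma(z)[1-\sigma(z)]$ and $\nabla_w (w^T x_n) = x_n$:

$$
\begin{aligned}
\nabla_w \log\sigma(w^T x_n) &= \frac{1}{\sigma(w^T x_n)}\,\sigma(w^T x_n)[1-\sigma(w^T x_n)]\,x_n = [1-\sigma(w^T x_n)]\,x_n,\\
\nabla_w \log\left(1-\sigma(w^T x_n)\right) &= \frac{-1}{1-\sigma(w^T x_n)}\,\sigma(w^T x_n)[1-\sigma(w^T x_n)]\,x_n = -\sigma(w^T x_n)\,x_n.
\end{aligned}
$$

Weighting by $y_n$ and $1-y_n$ and adding gives $y_n[1-\sigma(w^T x_n)]x_n - (1-y_n)\sigma(w^T x_n)x_n = x_n y_n - x_n\sigma(w^T x_n)$, the summand in the gradient above.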

Can you spot any obvious mistakes? Or am I perhaps not allowed to treat the vectors $w$ and $x_n$ like ordinary scalar variables?

Thanks in advance.

Best answer

An easy way to check is to compute the gradient component by component. Recall that $$ \frac{\partial}{\partial x}\sigma(x) = \sigma(x)[1-\sigma(x)] \qquad\text{and}\qquad \frac{\partial}{\partial w_i} w^Tx_j = x_{ji}, $$ where $x_{ji}$ denotes the $i$th component of $x_j$.

Then the $i$th component of the gradient is: \begin{align} \frac{\partial}{\partial w_i} L(w) &= \frac{-1}{N}\sum_j \left[ y_j\frac{\sigma(w^Tx_j)[1-\sigma(w^Tx_j)]x_{ji}}{\sigma(w^Tx_j)} +(1-y_j)\frac{(-1)\sigma(w^Tx_j)[1-\sigma(w^Tx_j)]x_{ji}}{1-\sigma(w^Tx_j)} \right]\\ &= \frac{-1}{N}\sum_j \left[ y_j[1-\sigma(w^Tx_j)]x_{ji} - (1-y_j)\sigma(w^Tx_j)x_{ji} \right]\\ &= \frac{-1}{N}\sum_j \left[ y_jx_{ji} - \sigma(w^Tx_j)x_{ji} \right], \end{align} where the last step uses the cancellation of the two $y_j\sigma(w^Tx_j)x_{ji}$ terms.

Therefore, combining the components into one vector: $$ \nabla L(w) = \frac{-1}{N}\sum_j x_j[y_j - \sigma(w^Tx_j)], $$ which matches the result in the question.
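The componentwise argument can also be checked numerically with a central finite-difference gradient check, a standard technique that is not part of the original answer. The sketch below assumes NumPy, random synthetic data, and the hypothetical helper names `loss`, `grad`, and `numerical_grad`; it also checks the identity $\sigma'(z) = \sigma(z)[1-\sigma(z)]$ the same way.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    """Average cross-entropy L(w); row n of X is x_n."""
    s = sigmoid(X @ w)
    return -np.mean(y * np.log(s) + (1 - y) * np.log(1 - s))

def grad(w, X, y):
    """Analytic gradient -(1/N) sum_j x_j [y_j - sigma(w^T x_j)]."""
    return -X.T @ (y - sigmoid(X @ w)) / X.shape[0]

def numerical_grad(f, w, h=1e-6):
    """Central differences: component i is (f(w + h e_i) - f(w - h e_i)) / (2h)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

# Synthetic data (hypothetical): 50 samples in 4 dimensions, labels in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.integers(0, 2, size=50).astype(float)
w = rng.normal(size=4)

analytic = grad(w, X, y)
numeric = numerical_grad(lambda v: loss(v, X, y), w)
print("max gradient discrepancy:", np.max(np.abs(analytic - numeric)))

# Same idea for the scalar identity sigma'(z) = sigma(z) [1 - sigma(z)].
z = np.linspace(-5.0, 5.0, 11)
h = 1e-6
sigma_prime_fd = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print("max sigma' discrepancy:", np.max(np.abs(sigma_prime_fd - sigmoid(z) * (1 - sigmoid(z)))))
```

If the analytic gradient is right, both discrepancies should be tiny, limited only by the finite-difference step and floating-point rounding; a sign or missing-term error would show up as a discrepancy of the same order as the gradient itself.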