Taking the gradient of $f(w) = \sum^n_{i=1}\log(e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i})$ w.r.t. $w$


My work:

First, we experiment by taking the derivative w.r.t. $w_1$:

$\frac{\partial f(w)}{\partial w_1} = \frac{\partial }{\partial w_1}\sum^n_{i=1}\log(e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i}) = \sum^n_{i=1}\frac{\partial }{\partial w_1}\log(e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i}) = \frac{1}{e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i}}[x_1e^{w^Tx_i-y_i}-x_1e^{y_i-w^Tx_i}]$

The last equality follows from the chain rule and the identity $$\frac{d}{dw}\ln w = \frac{1}{w}.$$

Therefore, $\nabla f(w) = \begin{bmatrix} \frac{1}{e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i}}[x_1e^{w^Tx_i-y_i}-x_1e^{y_i-w^Tx_i}] \\ \frac{1}{e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i}}[x_2e^{w^Tx_i-y_i}-x_2e^{y_i-w^Tx_i}] \\ \vdots \\ \frac{1}{e^{w^Tx_i - y_i}+e^{y_i - w^Tx_i}}[x_ne^{w^Tx_i-y_i}-x_ne^{y_i-w^Tx_i}] \end{bmatrix}$

However, I have a nagging feeling that it shouldn't be this complicated. Can someone please verify? Thank you!

You forgot the summation sign in the last step. Also, there are $n$ data points, and $n$ need not be the length of $w$, so the gradient components should not run over the same index $i$ as the sum.

Using the identity $e^z + e^{-z} = 2\cosh z$ with $z = w^Tx_i - y_i$,

$$f(w) = \sum_{i=1}^n \ln (2\cosh (w^Tx_i - y_i )) = n \ln 2 + \sum_{i=1}^n \ln (\cosh(w^Tx_i-y_i))$$

$$\frac{\partial f}{\partial w_j}=\sum_{i=1}^n \frac{x_{ij}\sinh(w^Tx_i-y_i)}{\cosh(w^Tx_i-y_i)}=\sum_{i=1}^n x_{ij}\tanh(w^Tx_i-y_i)$$
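As a sanity check (not part of the original derivation), the closed-form gradient $\sum_{i=1}^n x_i \tanh(w^Tx_i - y_i)$ can be compared against central finite differences. The sizes $n = 5$ and $d = 3$ below are arbitrary demo choices:

```python
import numpy as np

# Arbitrary demo dimensions: n data points x_i in R^d, targets y_i, weights w.
rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))   # row i is the data point x_i
y = rng.normal(size=n)
w = rng.normal(size=d)

def f(w):
    # f(w) = sum_i log(exp(w^T x_i - y_i) + exp(y_i - w^T x_i))
    z = X @ w - y
    return np.sum(np.log(np.exp(z) + np.exp(-z)))

# Closed-form gradient from the answer: sum_i x_i * tanh(w^T x_i - y_i)
grad = X.T @ np.tanh(X @ w - y)

# Central finite differences, one coordinate at a time
eps = 1e-6
num = np.array([(f(w + eps * np.eye(d)[j]) - f(w - eps * np.eye(d)[j])) / (2 * eps)
                for j in range(d)])

print(np.allclose(grad, num, atol=1e-5))  # True: the formulas agree
```

The vectorized form `X.T @ np.tanh(X @ w - y)` is exactly the component-wise formula stacked into a single matrix product, which also makes clear that the gradient has the length of $w$, not $n$.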