My work:
First, we experiment by taking the derivative w.r.t. $w_1$:
$\frac{\partial f(w)}{\partial w_1} = \frac{\partial }{\partial w_1}\sum^n_{i=1}\log(e^{(w^Tx_i - y_i)}+e^{(y_i - w^Tx_i)}) = \sum^n_{i=1}\frac{\partial }{\partial w_1}\log(e^{(w^Tx_i - y_i)}+e^{(y_i - w^Tx_i)}) = \frac{1}{e^{(w^Tx_i - y_i)}+e^{(y_i - w^Tx_i)}}[x_1e^{(w^Tx_i-y_i)}-x_1e^{(y_i-w^Tx_i)}]$
I got the last equality from the chain rule and the identity $$\frac{d}{dw}\ln w = \frac{1}{w}.$$
Therefore, $\nabla f(w) = \begin{bmatrix} \frac{1}{e^{(w^Tx_i - y_i)}+e^{(y_i - w^Tx_i)}}[x_1e^{(w^Tx_i-y_i)}-x_1e^{(y_i-w^Tx_i)}] \\ \frac{1}{e^{(w^Tx_i - y_i)}+e^{(y_i - w^Tx_i)}}[x_2e^{(w^Tx_i-y_i)}-x_2e^{(y_i-w^Tx_i)}] \\ \vdots \\ \frac{1}{e^{(w^Tx_i - y_i)}+e^{(y_i - w^Tx_i)}}[x_ne^{(w^Tx_i-y_i)}-x_ne^{(y_i-w^Tx_i)}] \end{bmatrix}$
However, I have a nagging feeling that it shouldn't be this complicated. Can someone please verify? Thank you!
You forgot the summation sign. There are $n$ data points, and $n$ need not equal the length of $w$.
$$f(w) = \sum_{i=1}^n \ln (2\cosh (w^Tx_i - y_i )) = n \ln 2 + \sum_{i=1}^n \ln (\cosh(w^Tx_i-y_i))$$
$$\frac{\partial f}{\partial w_j}=\sum_{i=1}^n \frac{x_{ij}\sinh(w^Tx_i-y_i)}{\cosh(w^Tx_i-y_i)}=\sum_{i=1}^n x_{ij}\tanh(w^Tx_i-y_i)$$
Here $x_{ij}$ denotes the $j$-th component of $x_i$, so in vector form $\nabla f(w) = \sum_{i=1}^n x_i\tanh(w^Tx_i-y_i)$.
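If you want to double-check the $\tanh$ form numerically, a central finite-difference comparison is a quick sanity test. The sketch below is only an illustration: the function names (`f`, `grad_f`, `numerical_grad`), the random test data, and the sizes `n = 5`, `d = 3` are all my own assumptions, chosen so that the number of data points differs from the length of $w$.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(a_k * b_k for a_k, b_k in zip(a, b))


def f(w, X, y):
    """f(w) = sum_i log(exp(w^T x_i - y_i) + exp(y_i - w^T x_i))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        r = dot(w, x_i) - y_i
        total += math.log(math.exp(r) + math.exp(-r))
    return total


def grad_f(w, X, y):
    """Closed form: component j is sum_i x_ij * tanh(w^T x_i - y_i)."""
    g = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        t = math.tanh(dot(w, x_i) - y_i)
        for j, x_ij in enumerate(x_i):
            g[j] += x_ij * t
    return g


def numerical_grad(w, X, y, eps=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    g = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        g.append((f(w_plus, X, y) - f(w_minus, X, y)) / (2 * eps))
    return g


# Hypothetical test data: n data points, d weights, with n != d on purpose.
random.seed(0)
n, d = 5, 3
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(d)]

analytic = grad_f(w, X, y)
numeric = numerical_grad(w, X, y)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
print("largest componentwise difference:", max_err)
```

The gradient has `d` components (one per weight), each of which is a sum over all `n` data points, which is exactly where the summation sign matters.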