We have $$f(w) = \ln(1+\exp(-w^Tx))$$
Here, $w$ and $x$ are $d \times 1$ vectors.
What's the Hessian of this function?
I got gradient as $$\frac{-x\exp(-w^Tx)}{1+\exp(-w^Tx)}$$
then hessian as $$\frac{(1+\exp(-w^Tx))(x^2 \exp(-w^Tx)) - (-x\exp(-w^Tx))^2}{(1+\exp(-w^Tx))^2}$$
Is this correct? Why does it seem like the Hessian is a scalar? Also, I'm confused about the dimensions because how do we do $x^2$ if $x$ is $d \times 1$? Or should this be interpreted as $x^Tx$?
It's helpful to think about this in terms of the matrix representation of the problem. Your gradient looks fine, but think about what it should look like as a column vector. If we take $$ g(w) \;\; =\;\; -\frac{\exp(-w^Tx)}{1+\exp(-w^Tx)}, $$
then the gradient is $$ \text{grad} f \;\; =\;\; \left [ \begin{array}{c} g(w)x_1 \\ \vdots \\ g(w) x_d \\ \end{array} \right ]. $$
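To convince yourself that the column-vector form $g(w)x$ is right, you can compare it against a central finite difference of $f$. A quick NumPy sketch (the variable names and random test point are my own, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w = rng.normal(size=d)
x = rng.normal(size=d)

def f(w):
    # f(w) = ln(1 + exp(-w^T x))
    return np.log1p(np.exp(-w @ x))

# analytic gradient: g(w) * x, with g(w) = -exp(-w^T x) / (1 + exp(-w^T x))
s = np.exp(-w @ x)
grad = (-s / (1 + s)) * x

# central finite-difference gradient, one coordinate at a time
eps = 1e-6
num = np.array([
    (f(w + eps * np.eye(d)[i]) - f(w - eps * np.eye(d)[i])) / (2 * eps)
    for i in range(d)
])
assert np.allclose(grad, num, atol=1e-6)
```

The agreement confirms that the scalar $g(w)$ simply multiplies the vector $x$ entrywise, so the gradient is $d \times 1$ as expected.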
When computing the Hessian, you construct a $d\times d$ matrix whose $(i,j)$ entry is $\frac{\partial}{\partial w_j}\left[g(w)x_i\right]$:
$$ \text{Hess}f \;\; =\;\; \left [ \begin{array}{cccc} \frac{\partial}{\partial w_1} g(w)x_1 & \frac{\partial}{\partial w_2} g(w)x_1 & \ldots & \frac{\partial}{\partial w_d} g(w)x_1 \\ \frac{\partial}{\partial w_1} g(w)x_2 & \frac{\partial}{\partial w_2} g(w)x_2 & \ldots & \frac{\partial}{\partial w_d} g(w)x_2 \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial}{\partial w_1} g(w)x_d & \frac{\partial}{\partial w_2} g(w)x_d & \ldots & \frac{\partial}{\partial w_d} g(w)x_d \\ \end{array} \right ]. $$
Hopefully this is enough to get you started.
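Working out those partial derivatives, each entry picks up the same scalar factor times $x_i x_j$, so "$x^2$" should indeed be read as the outer product $x x^T$, giving a $d \times d$ (rank-one) Hessian. A sketch checking this numerically against finite differences of the gradient (with my own variable names):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
w = rng.normal(size=d)
x = rng.normal(size=d)

def grad(w):
    # gradient g(w) * x from above
    s = np.exp(-w @ x)
    return (-s / (1 + s)) * x

# differentiating g(w) x_i w.r.t. w_j gives a scalar times x_i x_j,
# i.e. the scalar exp(-w^T x) / (1 + exp(-w^T x))^2 times the outer
# product x x^T -- this is the d x d matrix, not a scalar
s = np.exp(-w @ x)
hess = (s / (1 + s) ** 2) * np.outer(x, x)

# central finite differences of the gradient, column by column
eps = 1e-6
num = np.column_stack([
    (grad(w + eps * np.eye(d)[j]) - grad(w - eps * np.eye(d)[j])) / (2 * eps)
    for j in range(d)
])
assert np.allclose(hess, num, atol=1e-5)
```

Note that the Hessian is symmetric and positive semidefinite (the scalar factor is positive and $x x^T \succeq 0$), which is why this logistic loss is convex in $w$.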