I'm trying to train a model using Newton's method and need the Hessian of:
$$f(w) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^n\log(1 + \exp(-y_iw^Tx_i))$$
I believe I've found the $j^\text{th}$ partial to be:
$$\frac{\partial f(w)}{\partial w_j} = w_j + C \sum_{i=1}^n -x_{ij}y_i \frac{\exp(-y_iw^Tx_i)}{1 + \exp(-y_iw^Tx_i)}$$
And so the gradient is just the vector of these partials for $w_1,\ldots, w_d$. To derive the Hessian I figured I would have to find $\frac{\partial^2 f(w)}{\partial w_j^2}$ and $\frac{\partial^2 f(w)}{\partial w_j \, \partial w_k}$; so far I have:
Let $$M_i = \exp(-y_iw^Tx_i)$$
$$\frac{\partial^2 f(w)}{\partial w_j^2} = 1 + C\sum_{i=1}^n x_{ij}^2y_i^2 \frac{M_i}{(1+M_i)^2}$$
$$\frac{\partial^2 f(w)}{\partial w_j \, \partial w_k} = C\sum_{i=1}^n x_{ij} x_{ik} y_i^2 \frac{M_i}{(1+M_i)^2}$$
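To sanity-check these formulas, here's a quick NumPy sketch on made-up placeholder data (`X` is $n \times d$ with rows $x_i$, `y` holds $\pm 1$ labels; none of these names or values come from my actual problem) that compares the analytic gradient against central finite differences:

```python
import numpy as np

def f(w, X, y, C):
    # f(w) = 0.5*||w||^2 + C * sum_i log(1 + exp(-y_i w^T x_i))
    M = np.exp(-y * (X @ w))
    return 0.5 * (w @ w) + C * np.sum(np.log1p(M))

def grad(w, X, y, C):
    # df/dw_j = w_j + C * sum_i -x_ij * y_i * M_i / (1 + M_i)
    M = np.exp(-y * (X @ w))
    return w + C * (X.T @ (-y * M / (1.0 + M)))

def hess(w, X, y, C):
    # H_jk = [j == k] + C * sum_i x_ij * x_ik * y_i^2 * M_i / (1 + M_i)^2
    n, d = X.shape
    M = np.exp(-y * (X @ w))
    r = y**2 * M / (1.0 + M) ** 2
    H = np.eye(d)
    for j in range(d):
        for k in range(d):
            H[j, k] += C * np.sum(X[:, j] * X[:, k] * r)
    return H

# Random placeholder data, purely for checking the formulas.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.choice([-1.0, 1.0], size=20)
w = rng.normal(size=3)
C = 0.5

# Central finite differences of f should match grad if the algebra is right.
eps = 1e-6
g_fd = np.array([(f(w + eps * e, X, y, C) - f(w - eps * e, X, y, C)) / (2 * eps)
                 for e in np.eye(3)])
print(np.max(np.abs(grad(w, X, y, C) - g_fd)))  # small if the formulas match
```

The same central-difference trick applied to `grad` checks `hess` entry by entry.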
This looks right-ish, but I can't figure out how to pull a matrix out of it. It looks something like $(X^TX + I)y^2M$, where $M$ is some combination of my $M_i$'s, but I don't think that's correct.
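Writing out a general $(j,k)$ entry, my best guess at the pattern (where $D$ is a hypothetical diagonal matrix I'm introducing, rather than a product of $y^2$ and $M$) is that the sum factors as

$$\frac{\partial^2 f(w)}{\partial w_j \, \partial w_k} = [j = k] + C \sum_{i=1}^n x_{ij}\, d_i\, x_{ik}, \qquad d_i = y_i^2 \frac{M_i}{(1+M_i)^2},$$

which would make the whole thing $H = I + C\, X^T D X$ with $D = \operatorname{diag}(d_1,\ldots,d_n)$. Is that the right way to read it?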