I am trying to implement Newton's method for a strictly convex problem in $N$ dimensions, with the following objective function:
$$F(x)\equiv \|g - Hx\|_2^2 + \frac 1 2 \lambda \|x\|_2^2 $$
where $g = Hx$ is my forward model and $\lambda = 10^{-5}$, with
$g\equiv M$ dimensional input vector
$H\equiv M\times N$ Translation Matrix
$x\equiv N$ dimensional output vector
Newton's Method is defined as:
$$x(k+1) = x(k) - (\nabla_x^2F(x))^{-1} \nabla_xF(x) $$ or $$x(k+1) = x(k) - (Hess(F(x)))^{-1} Gradient(F(x)) $$
I am having trouble figuring out the gradient and Hessian of my objective function. I know that the following is true for a similar equation.
If $$F(x) \equiv \|g - Hx\|_2^2 $$
Then
$$ \nabla_xF(x) = 2H^THx - 2H^Tg$$
$$\nabla_x^2F(x) = 2H^TH$$
However, I haven't been able to figure out how the $\lambda$ term affects the gradient and Hessian.
Rewrite your objective function with vectors and matrices, i.e.,
\begin{align}
F(x) &= \left\|g-Hx\right\|_2^2+\frac{\lambda}{2}\left\|x\right\|_2^2\\
&= \left(g-Hx\right)^{\top}\left(g-Hx\right)+\frac{\lambda}{2}x^{\top}x\\
&= x^{\top}\left(H^{\top}H+\frac{\lambda}{2}I\right)x-x^{\top}H^{\top}g-g^{\top}Hx+g^{\top}g.
\end{align}
Therefore, the gradient reads
$$ \nabla_xF(x)=\left(2H^{\top}H+\lambda I\right)x-2H^{\top}g, $$
while the Hessian becomes
$$ \nabla_x^2F(x)=2H^{\top}H+\lambda I. $$
You may see that $\lambda$ serves as a regularization term here: if $H^{\top}H$ is ill-conditioned (or even singular), adding $\left(\lambda/2\right)I$ shifts its eigenvalues away from zero, so the Hessian $2H^{\top}H+\lambda I$ is safely invertible. That is only the numerical aspect, and further pre-conditioning may be necessary as you go deeper into it. Personally, I find the $\lambda$ term has more practical meaning than numerical: this Tikhonov (ridge) penalty helps prevent over-fitting.
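As a quick sanity check on these formulas, here is a minimal NumPy sketch (with hypothetical sizes and random stand-ins for your actual $H$ and $g$). Because $F$ is quadratic, its second-order Taylor expansion is exact, so a single Newton step from any starting point lands on the minimizer:

```python
import numpy as np

# Hypothetical problem sizes; H and g are random stand-ins for real data.
rng = np.random.default_rng(0)
M, N = 8, 5
H = rng.standard_normal((M, N))
g = rng.standard_normal(M)
lam = 1e-5

def gradient(x):
    # grad F(x) = (2 H^T H + lambda I) x - 2 H^T g
    return (2 * H.T @ H + lam * np.eye(N)) @ x - 2 * H.T @ g

def hessian():
    # Hess F(x) = 2 H^T H + lambda I  (constant, since F is quadratic)
    return 2 * H.T @ H + lam * np.eye(N)

# One Newton step: x1 = x0 - Hess^{-1} grad, computed via a linear solve
# rather than an explicit matrix inverse.
x0 = np.zeros(N)
x1 = x0 - np.linalg.solve(hessian(), gradient(x0))

# The gradient vanishes at the minimizer.
print(np.linalg.norm(gradient(x1)))  # ~0
```

Note that `np.linalg.solve` is preferred over forming $(\nabla_x^2 F)^{-1}$ explicitly; for large $N$ you would typically use a Cholesky factorization or an iterative solver instead.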