Gradient and Hessian for use in Newton's Method with regularization term

1.2k Views Asked by At

I am trying to implement newton's method on a strictly convex n-dimensional set. with the following objective function:

$$F(x)\equiv \|g - Hx\|_2^2 + \frac 1 2 \lambda \|x\|_2^2 $$

Where $g= Hx$ is my forward model and $\lambda = 1 e -5$

$g\equiv M$ dimensional input vector

$H\equiv M\times N$ Translation Matrix

$x\equiv N$ dimensional output vector

Newton's Method is defined as:

$$x(k+1) = x(k) - (\nabla_x^2F(x))^{-1} \nabla_xF(x) $$ or $$x(k+1) = x(k) - (Hess(F(x)))^{-1} Gradient(F(x)) $$

I am having trouble figuring out the Gradient and Hessian terms of my objective function. I know that the following is true for a similar equation.

If $$F(x) \equiv \|g - Hx\|_2^2 $$

Then

$$ \nabla_xF(x) = 2H^THx - 2H^Tg$$ $$\nabla_x^2F(x) = 2H^TH$$ I haven't been able to figure out how the $\lambda$ term affects the gradient and hessian.

2

There are 2 best solutions below

2
On BEST ANSWER

Rewrite your objective function with vectors and matrices, i.e., \begin{align} F(x)&=\left\|g-Hx\right\|_2^2+\frac{\lambda}{2}\left\|x\right\|_2^2\\ &=\left(g-Hx\right)^{\top}\left(g-Hx\right)+\frac{\lambda}{2}x^{\top}x\\ &=x^{\top}\left(H^{\top}H+\frac{\lambda}{2}I\right)x-x^{\top}H^{\top}g-g^{\top}Hx+g^{\top}g. \end{align} Therefore, the gradient reads $$ \nabla_xF(x)=\left(2H^{\top}H+\lambda I\right)x-2H^{\top}g, $$ while the Hessian becomes $$ \nabla_x^2F(x)=2H^{\top}H+\lambda I. $$

You may see that $\lambda$ serves as a regularity term here: if $H^{\top}H$ is ill-conditioned, $H^{\top}H+\left(\lambda/2\right)I$ would help figure this out somehow. Of course, this is just at the numerical aspect, and further pre-conditionings are necessary as you go deep into it. Personally, the additional $\lambda$ term has more practical meanings than numerical ones, as it helps prevent the unfriendly over-fittings.

0
On

The new question has one extra term and you should just include their derivative and sum to what you already have.

\begin{align} \nabla F(x) &= 2H^THx-2H^Tg + \nabla \left( \frac12 \lambda \|x\|^2 \right)\\ &=2H^THx-2H^Tg +\lambda x\\ \end{align}

\begin{align} \nabla^2 F(x) &= 2H^TH + \nabla(\lambda x) \\ &=2 H^TH+ \lambda I \end{align}