About the well definition of the gradient descent method


Consider this type of gradient method:

Let $J\colon \mathbb R^p\to \mathbb R$ be a $C^1$ elliptic function, i.e.

$$\exists \alpha>0,\quad \forall x,y\in \mathbb R^p,\quad \langle \nabla J(x)-\nabla J(y),x-y\rangle\geqslant \alpha\vert\vert x-y\vert\vert ^2.$$

I have then seen defined a gradient method as:

$$\begin{cases} u_0\in \mathbb R^p \\ u_{n+1}=u_n-\rho(u_n)\nabla J(u_n)\end{cases}$$

where $\rho(u_n)$ satisfies

$$\frac{\partial}{\partial \rho}J(u_n-\rho \nabla J(u_n))=0.$$

I don't understand why $\rho(u_n)$ would be well-defined in this situation.

I have to show that the real function $\rho\mapsto J(u_n-\rho \nabla J(u_n))$ admits a unique minimum on $\mathbb R$, but I don't see why this is true.
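To see the method in action, here is a minimal sketch on an assumed quadratic $J(x)=\tfrac12 x^\top Ax-b^\top x$ with $A$ symmetric positive definite (which is elliptic with $\alpha=\lambda_{\min}(A)$); for such a $J$, the optimal step $\rho(u_n)$ has the closed form $g^\top g/(g^\top Ag)$ with $g=\nabla J(u_n)$. The matrix $A$ and vector $b$ below are made-up illustration data:

```python
import numpy as np

# Assumed quadratic example: J(x) = 0.5 x^T A x - b^T x, grad J(x) = A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])

def grad_J(x):
    return A @ x - b

u = np.zeros(2)                           # u_0
for _ in range(50):
    g = grad_J(u)
    if np.linalg.norm(g) < 1e-12:         # u is already the minimizer
        break
    rho = (g @ g) / (g @ (A @ g))         # exact minimizer of rho -> J(u - rho g)
    u = u - rho * g                       # u_{n+1} = u_n - rho(u_n) grad J(u_n)

print(u)                                  # approaches the solution of A x = b
```

The iterates converge to the unique minimizer of $J$, i.e. the solution of $Ax=b$.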


Best answer:

Under your hypothesis, we can show that $J$ is strictly convex and coercive.

Indeed:

$$J(v)-J(u)=\int_0^1 \langle \nabla J(u+t(v-u))|v-u\rangle dt$$

Now write $\langle \nabla J(u+t(v-u))|v-u\rangle=\langle \nabla J(u)|v-u\rangle+\langle \nabla J(u+t(v-u))-\nabla J(u)|v-u\rangle$. This gives:

$$J(v)-J(u) = \langle \nabla J(u)|v-u\rangle +\int_0^1 \frac 1 t\langle \nabla J(u+t(v-u))-\nabla J(u)|t(v-u)\rangle\, dt$$

Now, since $J$ is elliptic (applied at the points $u+t(v-u)$ and $u$), the integrand is at least $\alpha t\|v-u\|^2$, and integrating over $t\in[0,1]$ yields:

$$J(v)-J(u) \geq \langle \nabla J(u)|v-u\rangle + \frac \alpha 2 \|v-u\|^2$$

Taking $v \neq u$ gives $J(v)-J(u) > \langle \nabla J(u)|v-u\rangle$, which is one characterization of strict convexity.

Taking $u=0$ gives $J(v) \geq J(0) -\| \nabla J(0)\|\,\|v\|+\frac \alpha 2\|v\|^2 \underset{\|v\| \to \infty}\longrightarrow \infty$, so $J$ is coercive.

Since $J$ is strictly convex on $\mathbb{R}^p$, its restriction to any line is strictly convex as well. Hence, provided $\nabla J(u_n)\neq 0$, the map $\rho\mapsto J(u_n-\rho \nabla J(u_n))$ is strictly convex. (If $\nabla J(u_n)=0$, then $u_n$ is already the minimizer and the iteration stops.)

So if a minimum exists, it is unique. And since $J$ is coercive, $J(u_n-\rho\nabla J(u_n))\to\infty$ as $|\rho|\to\infty$ whenever $\nabla J(u_n)\neq 0$, so that minimum exists.
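The conclusion can be illustrated directly: for an assumed quadratic $J$ as before, the one-dimensional function $\varphi(\rho)=J(u_n-\rho\,\nabla J(u_n))$ has its unique minimizer at the closed-form $\rho^\ast=g^\top g/(g^\top Ag)$, and a grid search over $\varphi$ lands at the same point. The iterate $u_n$ below is made-up illustration data:

```python
import numpy as np

# Sample phi(rho) = J(u - rho g) on a grid and check that its minimum
# sits at the closed-form stationary point rho* = g.g / (g.A.g).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def J(x):
    return 0.5 * x @ (A @ x) - b @ x

u = np.array([2.0, 2.0])                  # some iterate u_n with grad J(u_n) != 0
g = A @ u - b                             # grad J(u_n)
rho_star = (g @ g) / (g @ (A @ g))        # unique stationary point of phi

rhos = np.linspace(-2.0, 2.0, 4001)
phis = np.array([J(u - r * g) for r in rhos])
rho_grid = rhos[np.argmin(phis)]
print(rho_star, rho_grid)                 # grid minimizer close to rho*
```

The strict convexity and coercivity of $\varphi$ guarantee that this minimizer is the only one on all of $\mathbb R$.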