Fixed Point Iteration for the Gradient of Smooth Convex Function


Given a convex and smooth function $ f \left( x \right) : \mathbb{R}^{n} \to \mathbb{R} $ whose gradient has the form:

$$ \nabla f \left( x \right) = x - g \left( x \right) $$

where $ g \left( x \right) : \mathbb{R}^{n} \to \mathbb{R}^{n} $ is also smooth (in each component).

One way to find a stationary point of $ f \left( x \right) $ is to find a fixed point of $ g \left( x \right) $: at a fixed point the gradient vanishes, so the point must be stationary.

Does the setting above guarantee the convergence of the fixed point iteration on $ g \left( x \right) $? If not, what constraints should be imposed to guarantee convergence?
I had thought so, since Newton's Method is guaranteed to work on $ f \left( x \right) $, yet I am not sure about it.

I read about the convergence of Fixed Point Iteration in Elementary Numerical Analysis MATH:3800/CS:3700 Notes, Section 3.4. Yet those results are for general functions. I wonder whether there is an analysis for the case above.
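For concreteness, here is a minimal numerical sketch of the setup (the quadratic $ f $ and all names are my own illustration, not from any reference): for $ f \left( x \right) = \frac{1}{2} {x}^{T} A x - {b}^{T} x $ with the eigenvalues of $ A $ inside $ \left( 0, 2 \right) $, the map $ g \left( x \right) = x - \nabla f \left( x \right) $ contracts and the iteration reaches the stationary point $ {A}^{-1} b $:

```python
import numpy as np

# Illustrative quadratic: f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b
# and g(x) = x - grad f(x) = (I - A) x + b.  The eigenvalues of A lie in
# (0, 2) (~0.64 and ~1.36 here), so ||I - A||_2 < 1 and g is a contraction.
A = np.array([[1.2, 0.3],
              [0.3, 0.8]])
b = np.array([1.0, -2.0])

def g(x):
    return x - (A @ x - b)  # x - grad f(x)

x = np.zeros(2)
for _ in range(200):
    x = g(x)                # fixed point iteration x <- g(x)

x_star = np.linalg.solve(A, b)  # the unique stationary point A^{-1} b
print(np.allclose(x, x_star))
```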


Best Answer

Remark: This is the result of a joint discussion with @Ze-NanLi.

The conditions for the fixed point iteration on $ g \left( x \right) $ to converge are:

  1. $ g $ maps some domain $ \mathcal{D} \subseteq \mathbb{R}^{n} $ into itself, namely $ g : \mathcal{D} \to \mathcal{D} $. In the case $ \mathcal{D} = \mathbb{R}^{n} $ this means $ g $ must be finite everywhere: there is no $ {x}_{0} \in \mathbb{R}^{n} $ with $ \left\| g \left( {x}_{0} \right) \right\| = \infty $.
  2. The Jacobian of $ g $ must obey $ {\nabla g \left( x \right)}^{T} \nabla g \left( x \right) \preceq {\rho}^{2} I $ where $ \rho < 1 $ (equivalently, $ {\left\| \nabla g \left( x \right) \right\|}_{2} \leq \rho $ for all $ x \in \mathcal{D} $).

The motivation for (2) is:

$$ \left\| \nabla g \left( x \right) \right\| = \sup_{ \left\| y \right\| = 1} \left\| \nabla g \left( x \right) y \right\| = \sup_{ \left\| y \right\| = 1} \sqrt{ {y}^{T} {\nabla g \left( x \right)}^{T} \nabla g \left( x \right) y} < \sup_{ \left\| y \right\| = 1} \sqrt{ {y}^{T} y } = 1 $$

which, by the Banach Fixed Point Theorem, is the condition for a unique fixed point in $ \mathcal{D} $.
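Condition (2) is straightforward to check numerically. A minimal sketch (the matrix $ A $ is my own assumed example): for the affine map $ g \left( x \right) = \left( I - A \right) x + b $, the Jacobian is the constant matrix $ I - A $, and the condition reduces to a single spectral norm computation:

```python
import numpy as np

# Assumed example: g(x) = (I - A) x + b, so the Jacobian of g is I - A.
# Condition (2) is ||I - A||_2 <= rho < 1, i.e. the largest singular value
# of the Jacobian must stay below 1.
A = np.array([[1.2, 0.3],
              [0.3, 0.8]])
J = np.eye(2) - A               # Jacobian of g (constant for an affine map)
rho = np.linalg.norm(J, 2)      # spectral norm = largest singular value
print(rho < 1)                  # contraction condition holds
```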

Now, since $ \nabla g \left( x \right) $ is symmetric (being $ I $ minus the Hessian of $ f $), two conditions which together imply (2) are:

  1. $ \nabla g \left( x \right) \prec I $.
  2. $ \nabla g \left( x \right) \succ -I $.

Since $ f \left( x \right) $ is convex, $ {H}_{ f \left( \cdot \right) } \left( x \right) \succeq 0 $, where $ {H}_{ f \left( \cdot \right) } $ is the Hessian matrix of $ f \left( \cdot \right) $. Hence:

$$ {H}_{ f \left( \cdot \right) } \left( x \right) = I - \nabla g \left( x \right) \succeq 0 \Rightarrow \nabla g \left( x \right) \preceq I $$

So to obtain the strict inequality in (1) one must demand that $ f \left( \cdot \right) $ be strictly convex (in fact, a uniform bound $ \nabla g \left( x \right) \preceq \left( 1 - m \right) I $ requires strong convexity, $ {H}_{ f \left( \cdot \right) } \left( x \right) \succeq m I $ with $ m > 0 $).

So, in order to guarantee that the iteration on $ g \left( x \right) $ converges to a fixed point, one must demand the following:

  1. The function $ f \left( x \right) $ must be strictly convex.
  2. The function $ g \left( x \right) $ must obey $ \nabla g \left( x \right) \succ -I $.
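Condition (2) is not implied by strict convexity alone, which a one-dimensional counterexample shows (my own assumed example, not from the references): $ f \left( x \right) = \frac{3}{2} {x}^{2} $ is strictly convex, yet $ g \left( x \right) = x - f' \left( x \right) = -2 x $ violates $ \nabla g \left( x \right) \succ -I $, and the iteration blows up:

```python
# Assumed counterexample: f(x) = 1.5 x^2 is strictly convex, but
# g(x) = x - f'(x) = x - 3x = -2x has g'(x) = -2 < -1, violating
# condition (2); each step doubles the magnitude of the iterate.
def g(x):
    return x - 3.0 * x

x = 0.1
for _ in range(20):
    x = g(x)                 # |x| grows by a factor of 2 per step

print(abs(x) > 1e4)          # the iterates diverge despite strict convexity
```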

Alternative Derivation of the Condition $ \nabla g \left( x \right) \succ -I $

Similarly to the analysis of the Newton Method as a fixed point iteration, one could write:

$$ {x}^{k + 1} = {x}^{k} - \left( {x}^{k} - g \left( {x}^{k} \right) \right) $$

By definition $ \nabla f \left( {x}^{k} \right) = {x}^{k} - g \left( {x}^{k} \right) $, so this is exactly Gradient Descent with a step size of $ 1 $.

For a function $ f \left( x \right) $ with an $ L $-Lipschitz continuous gradient:

  1. The Hessian of the function obeys $ {H}_{f} \left( x \right) \preceq L I $.
  2. Gradient descent converges for any step size $ \alpha < \frac{2}{L} $.

From the above we need $ 1 < \frac{2}{L} \Rightarrow L < 2 $, hence $ {H}_{f} \left( x \right) = I - \nabla g \left( x \right) \preceq L I \prec 2 I \Rightarrow \nabla g \left( x \right) \succ -I $, as needed.
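The identification of the iteration with a gradient descent step of size $ 1 $ can be checked directly. A minimal sketch (the quadratic and its matrices are my own assumed example): the fixed point update $ g \left( {x}^{k} \right) $ coincides with $ {x}^{k} - 1 \cdot \nabla f \left( {x}^{k} \right) $, and the Lipschitz constant $ L = {\left\| A \right\|}_{2} $ here is below $ 2 $:

```python
import numpy as np

# Assumed quadratic: f(x) = 0.5 x^T A x - b^T x, grad f(x) = A x - b.
A = np.array([[1.2, 0.3],
              [0.3, 0.8]])
b = np.array([1.0, -2.0])

grad_f = lambda x: A @ x - b
g = lambda x: x - grad_f(x)

x = np.array([0.5, -0.5])
gd_step = x - 1.0 * grad_f(x)      # gradient descent step with alpha = 1
print(np.allclose(gd_step, g(x)))  # identical to the fixed point update
print(np.linalg.norm(A, 2) < 2)    # L < 2, so step size 1 < 2/L converges
```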

Other Answer

We need another condition, $\nabla g(x) \succ -I$, in addition to the function $f$ being strictly convex.

Or, equivalently, the condition $\nabla g(x)^T \nabla g(x) \preceq \rho I$, where $\rho < 1$.

From strict convexity, we know that \begin{equation} \nabla g(x) \prec I \end{equation} holds for any $x$, where $I$ is the identity matrix. Together with $\nabla g(x) \succ -I$ (and since $\nabla g(x) = I - \nabla^2 f(x)$ is symmetric), we obtain \begin{equation} \nabla g(x)^T \nabla g(x) \prec I. \end{equation}

According to the Contraction Mapping Theorem, all we need is $\|\nabla g(x)\| < 1$. By the definition of the matrix norm, we have \begin{equation} \|\nabla g(x)\| = \sup_{\|y\| = 1} \|\nabla g(x) y\| = \sup_{\|y\|=1} \sqrt{y^T\nabla g(x)^T \nabla g(x) y} < \sup_{\|y\|=1} \sqrt{y^Ty}=1. \end{equation}

By the way, we can actually consider the sequence $x, g(x), g(g(x)), \dots$ as a special case of gradient descent. Let the step size of GD be $t=1$. Then we have \begin{equation} x^{k+1} = x^k - (x^k - g(x^k))=g(x^k). \end{equation} To guarantee convergence, we need the step size to satisfy $t < 2/L$, where $\nabla^2 f(x) \preceq L I$. Thus $L < 2$, i.e., \begin{equation} \nabla^2 f(x) = I - \nabla g(x) \prec 2I. \end{equation} Thus $\nabla g(x) \succ -I$.
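The argument is not restricted to quadratics. A minimal nonlinear sketch (my own assumed example): with $g(x) = \frac{1}{2}\cos(x)$ we get $\nabla f(x) = x - \frac{1}{2}\cos(x)$, i.e. $f(x) = \frac{x^2}{2} - \frac{1}{2}\sin(x)$, which is strictly convex since $f''(x) = 1 + \frac{1}{2}\sin(x) \geq \frac{1}{2}$, and $|g'(x)| = \frac{1}{2}|\sin(x)| \leq \frac{1}{2} < 1$, so both conditions hold and the iteration settles at the fixed point:

```python
import math

# Assumed nonlinear example: g(x) = 0.5*cos(x), so f(x) = x**2/2 - 0.5*sin(x)
# is strictly convex (f''(x) = 1 + 0.5*sin(x) >= 0.5) and |g'(x)| <= 0.5 < 1.
x = 0.0
for _ in range(100):
    x = 0.5 * math.cos(x)    # fixed point iteration x <- g(x)

# x now satisfies x = 0.5*cos(x) to machine precision, i.e. grad f(x) = 0.
print(abs(x - 0.5 * math.cos(x)) < 1e-12)
```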