Conjugate Gradient Method Near Exact Line Search


Unlike Newton-type methods, conjugate gradient methods have no natural step-length value $\alpha_k$. Given this, why do we need a near-exact line search in order to expect rapid convergence of conjugate gradient methods?


I am not sure what exactly you mean; the classic CG algorithm does in fact have an optimal step size, which can be derived as follows.

Set up the function $$f(\boldsymbol x) := \frac12 \boldsymbol x^T A \boldsymbol x - \boldsymbol b^T \boldsymbol x = \frac12 \langle \boldsymbol x, A \boldsymbol x\rangle - \langle \boldsymbol x, \boldsymbol b \rangle, \tag1$$ which can be motivated as follows (you can skip this part if you want and go directly to the next section).

Okay, let's say we want to solve the linear system $$A\boldsymbol x=\boldsymbol b \tag2$$ with a symmetric positive definite (s.p.d.) matrix $A$. From very basic multidimensional optimization, one knows that if the Hessian matrix $\nabla^2 f(\boldsymbol x)$ of the objective function $f(\boldsymbol x)$ is positive definite $\forall \: \boldsymbol x$, the objective function is globally strictly convex and thus has exactly one global minimizer. Let's try to leverage the s.p.d. property of $A$: we aim for an objective function such that $$\nabla^2 f(\boldsymbol x) \overset{!}{=} A. \tag3$$ Again from optimization theory for convex functions, we know that we are at the optimum $\boldsymbol x^\star$ if $\nabla f(\boldsymbol x^\star) = \boldsymbol 0$. The optimal $\boldsymbol x^\star$ for our problem should be the one that solves the linear system: $$A\boldsymbol x^\star =\boldsymbol b.\tag4$$ We see that this can be enforced by "integrating/solving the PDE" $\nabla^2 f(\boldsymbol x) = A$ and picking the integration constant in a clever way, namely as $-\boldsymbol b$: $$\nabla f(\boldsymbol x) = A \boldsymbol x - \boldsymbol b. \tag5$$ Again, by clever integration / PDE solving, you can construct the objective function $$f(\boldsymbol x) = \frac12 \boldsymbol x^T A \boldsymbol x - \boldsymbol x^T \boldsymbol b. \tag6$$ Now solving the linear system is equivalent to minimizing $f(\boldsymbol x)$.
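As a quick numerical sanity check (a minimal sketch of my own, not part of the derivation; the variable names are purely illustrative), one can verify that the gradient in equation (5) vanishes exactly at the solution of the linear system:

```python
import numpy as np

# Minimal sanity check: the gradient A x - b from equation (5)
# vanishes at the solution of the linear system A x = b.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite by construction
b = rng.standard_normal(n)

x_star = np.linalg.solve(A, b)   # solve the linear system directly

grad = A @ x_star - b            # gradient of f at x_star, equation (5)
print(np.linalg.norm(grad))     # ~1e-15: the minimizer of f solves A x = b
```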


Now let's continue with optimization-101 material: The first method you learn in numerical optimization is steepest descent, which, in slightly generalized form, can be written as $$ \boldsymbol x^{k+1} = \boldsymbol x^k + \alpha \boldsymbol p^{k}. \label{7} \tag7$$ In the case of steepest descent, $\boldsymbol p^k = -\nabla f (\boldsymbol x^k)$. Let us leave $\boldsymbol p^k \neq \boldsymbol 0 $ general for the moment and turn our attention towards the step-length $\alpha$: $$f\big(\boldsymbol x^{k+1} \big) \overset{\eqref{7}}{=} f\big(\boldsymbol x^k + \alpha \boldsymbol p^{k}\big) = f\big(\boldsymbol x^k \big) + \alpha \big\langle \boldsymbol p^k , \underbrace{A \boldsymbol x^k -\boldsymbol b}_{=:\boldsymbol r^k} \big\rangle + \frac12 \alpha^2 \big\langle \boldsymbol p^k , A \boldsymbol p^k \big\rangle =: g(\alpha). \label{8}\tag8$$ For fixed $\boldsymbol x^k, \boldsymbol p^k$ this is a scalar, convex, univariate function in $\alpha$ with minimum at $\alpha^\star$ such that $g'(\alpha^\star) = 0$, which is clearly the case when $$\alpha^\star \big(\boldsymbol x^k, \boldsymbol p^k \big) =- \frac{ \big \langle \boldsymbol p^k, \boldsymbol r^k \big \rangle }{\big \langle \boldsymbol p^k,A \boldsymbol p^k \big \rangle}. \label{9} \tag9$$ This is valid for any choice of search direction $ \boldsymbol p^k$, so it holds both for steepest descent and for the CG direction.
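To make this concrete, here is a minimal sketch (my own illustration; the function name and its parameters are made up for this example) of the iteration \eqref{7} with the optimal step length \eqref{9}, using the steepest-descent direction $\boldsymbol p^k = -\boldsymbol r^k$:

```python
import numpy as np

def exact_line_search_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Iteration (7) with the optimal step length (9).

    Uses the steepest-descent direction p^k = -r^k; the alpha formula
    itself is valid for any search direction, including the CG one.
    """
    x = x0.copy()
    for _ in range(max_iter):
        r = A @ x - b                        # residual r^k = A x^k - b
        if np.linalg.norm(r) < tol:
            break
        p = -r                               # steepest-descent direction
        alpha = -(p @ r) / (p @ (A @ p))     # optimal step, equation (9)
        x = x + alpha * p
    return x
```

Swapping the line `p = -r` for the conjugate-direction update would turn this into the classic CG algorithm; the step-length line stays exactly the same, since \eqref{9} is valid for any search direction.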