Understanding the gradient descent method as applied to a specific quadratic problem

30 Views Asked by At

The following example of the gradient descent method is given in section 9.3.2 on pp. 469-470 of Boyd and Vandenberghe's Convex Optimization (Cambridge University Press, 2009).

Let $\gamma > 0$, and consider the following objective function on $\mathbb{R}^2$: $$ f(x) = \frac{1}{2}(x_1^2+\gamma x_2^2). $$

According to Boyd and Vandenverghe, applying the gradient descent method with exact line search, starting at $x^{(0)} = \begin{bmatrix}\gamma\\1\end{bmatrix}$, the following closed-form expressions for the iterates $x^{(k)}$ ensue: $$ x_1^{(k)} = \gamma\bigg(\frac{\gamma-1}{\gamma+1}\bigg)^k,\qquad x_2^{(k)} = \bigg(-\frac{\gamma-1}{\gamma+1}\bigg)^k.\tag{*} $$

However, when I try to derive the expression for $x_1^{(1)}$ and $x_2^{(1)}$ I get different expressions than those appearing in $(*)$. Here's my derivation. I'd appreciate it if someone could point out where my error lies.

We can write $f$ as $f(x) = \frac{1}{2}x^TAx$, where $A := \begin{bmatrix}1&0\\0&\gamma\end{bmatrix}$, showing that $f$ is a quadratic form. Hence, $$ \Delta x = -\nabla f(x) = -Ax = \begin{bmatrix}-x_1\\-\gamma x_2\end{bmatrix}. $$

Therefore, \begin{align*} f(x+s\Delta x) &= f(x-sAx)\\ &= f(\begin{bmatrix}x_1(1-s)\\x_2(1-\gamma s)\end{bmatrix})\\ &= \frac{1}{2}\bigg(\big(x_1(1-s)\big)^2 + \gamma\big(x_2(1-\gamma s)\big)^2\bigg)\\ &= \frac{x_1^2(\gamma^3+1)}{2}s^2 -x_1^2(\gamma^2+1)s + \frac{x_1^2(\gamma+1)}{2} \end{align*}

Hence $t = \text{argmin}_{s\geq0}f(x+s\Delta x)$ is the root of the derivative of the function $$ s\mapsto \frac{x_1^2(\gamma^3+1)}{2}s^2 -x_1^2(\gamma^2+1)s + \frac{x_1^2(\gamma+1)}{2}, $$ namely $t = \frac{\gamma^2+1}{\gamma^3+1}$.

Hence, \begin{align*} x^{(1)} &= x^{(0)} + t\Delta x^{(0)}\\ &= \begin{bmatrix}\gamma\\1\end{bmatrix} + \frac{\gamma^2+1}{\gamma^3+1}\begin{bmatrix}-\gamma\\-\gamma\end{bmatrix}\\ &= \begin{bmatrix}\gamma\big(1-\frac{\gamma^2+1}{\gamma^3+1}\big)\\1-\gamma\frac{\gamma^2+1}{\gamma^3+1}\end{bmatrix}. \end{align*}

Therefore, \begin{align*} x_1^{(1)} &= \gamma\big(1-\frac{\gamma^2+1}{\gamma^3+1}\big)\\ &= \gamma\frac{\gamma-1}{\gamma+1}\cdot\frac{\gamma^2}{\gamma^2-\gamma+1}\\ x_2^{(2)}&= 1-\gamma\frac{\gamma^2+1}{\gamma^3+1}\\ &= \frac{\gamma-1}{\gamma+1}\cdot\frac{1}{\gamma^2-\gamma+1}. \end{align*}