Alternative form of Gradient Descent


We define the standard gradient descent as follows:

$$x^{(k+1)} = \arg\min_{x\in\mathbb{R}^n} f(x^{(k)}) + \langle \nabla f(x^{(k)}), x - x^{(k)}\rangle + \frac{1}{2\alpha_k} ||x - x^{(k)}||_2^2$$

Here, $x^{(k)}$ is the $k$-th iterate of gradient descent. Solving the above minimization problem gives $x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)})$, which is the usual gradient descent update.
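A quick numerical sanity check of this claim (a sketch, assuming NumPy/SciPy; the quadratic $f$ and the values of $x^{(k)}$, $\alpha_k$ below are arbitrary example choices): minimize the quadratic model numerically and compare against the closed-form gradient step.

```python
import numpy as np
from scipy.optimize import minimize

# Example objective f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda x: A @ x - b

x_k = np.array([2.0, 0.5])   # current iterate (arbitrary)
alpha_k = 0.1                # step size (arbitrary)

# The model being minimized; f(x_k) is a constant and can be dropped
model = lambda x: grad_f(x_k) @ (x - x_k) + (x - x_k) @ (x - x_k) / (2 * alpha_k)

x_numeric = minimize(model, x_k).x       # numerical argmin of the model
x_closed = x_k - alpha_k * grad_f(x_k)   # closed-form gradient step

print(np.allclose(x_numeric, x_closed, atol=1e-5))  # True
```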

How do we prove that this gradient descent update is equivalent to the following constrained version (with $\alpha_k > 0$):

$$x^{(k+1)} = \arg\min_{x\in\mathbb{R}^n}\langle \nabla f(x^{(k)}), x - x^{(k)}\rangle $$ subject to $$||x - x^{(k)}||_2 \leq \alpha_k||\nabla f(x^{(k)})||_2$$


Best answer:

Here are two more elementary solutions, as requested by the OP.

We need to show that $x=x_k-\alpha_k g_k$ is the optimal solution to the following problem: $$\min_x\ \langle g_k, x\rangle\quad \mbox{s.t. } \|x-x_k\| \le \alpha_k \|g_k\|.\tag{*}$$

Solution 1. Using the Cauchy-Schwarz inequality we have $$|\langle \alpha_k g_k, x-x_k\rangle| \le \alpha_k\|g_k\| \|x-x_k\| \le \alpha_k^2 \|g_k\|^2,$$ which implies $\langle g_k, x - x_k\rangle \ge -\alpha_k \|g_k\|^2$, or equivalently, $\langle g_k, x\rangle \ge \langle g_k, x_k\rangle -\alpha_k \|g_k\|^2$. Equality occurs iff $\alpha_k g_k = x_k - x$ (i.e., Cauchy-Schwarz is tight with the vectors anti-parallel and the constraint is active). We conclude that the optimal solution to $(*)$ is $x=x_k-\alpha_k g_k$. QED

Solution 2. Notice that $$2\langle \alpha_k g_k, x-x_k\rangle + 2\alpha_k^2 \|g_k\|^2 \ge 2\langle \alpha_k g_k, x-x_k\rangle + \alpha_k^2 \|g_k\|^2 + \|x-x_k\|^2 = \|\alpha_k g_k + x-x_k\|^2 \ge 0,$$ with equality iff $\alpha_k g_k = x_k - x$. We conclude that the optimal solution to $(*)$ is $x=x_k-\alpha_k g_k$. QED
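The lower bound from Solution 1 can also be checked numerically (a sketch, assuming NumPy; the values of $g_k$, $x_k$, $\alpha_k$ are arbitrary stand-ins): sample points in the ball $\|x - x_k\| \le \alpha_k\|g_k\|$ and verify that $\langle g_k, x\rangle$ never drops below $\langle g_k, x_k\rangle - \alpha_k\|g_k\|^2$, with equality at the claimed minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
g_k = rng.standard_normal(3)      # stand-in for grad f(x_k)
x_k = rng.standard_normal(3)
alpha_k = 0.5

lower_bound = g_k @ x_k - alpha_k * g_k @ g_k   # RHS of the inequality in Solution 1
radius = alpha_k * np.linalg.norm(g_k)

# Random points inside the feasible ball all respect the bound
for _ in range(1000):
    d = rng.standard_normal(3)
    x = x_k + radius * rng.uniform() * d / np.linalg.norm(d)
    assert g_k @ x >= lower_bound - 1e-12

# Equality is attained exactly at the claimed minimizer
x_star = x_k - alpha_k * g_k
print(np.isclose(g_k @ x_star, lower_bound))  # True
```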

Another answer:

Just find the solution of the latter problem.

Denote $x_k = x^{(k)}$ and $g_k = \nabla f(x^{(k)})$. We need to solve the following convex problem: $$\min\ \langle g_k, x\rangle\quad \mbox{s.t. } \|x-x_k\|^2 \le \alpha_k^2 \|g_k\|^2.$$

If $\|g_k\| = 0$ then obviously $x=x_k$ is optimal. Suppose now that $\|g_k\| > 0$. Slater's condition holds because at $x=x_k$ we have $\|x-x_k\|^2 = 0 < \alpha_k^2 \|g_k\|^2$. Hence, the above problem is equivalent to its KKT system: \begin{align} g_k + 2\lambda(x-x_k) &= 0,\\ \lambda(\|x-x_k\|^2 - \alpha_k^2 \|g_k\|^2) &= 0,\\ \|x-x_k\|^2 &\le \alpha_k^2 \|g_k\|^2,\\ \lambda &\ge 0. \end{align} Since $g_k \neq 0$, the first condition rules out $\lambda = 0$, so $\lambda > 0$. The first condition then gives $x = x_k - \frac{1}{2\lambda}g_k$. Plugging this into the second condition (complementary slackness, with $\lambda > 0$ forcing the constraint to be active) yields $\frac{1}{2\lambda} = \alpha_k$. We conclude that the optimal solution is $x = x_k - \alpha_k g_k$. QED
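The KKT solution above can be verified directly (a sketch, assuming NumPy; the values of $g_k$, $x_k$, $\alpha_k$ are hypothetical examples): with $\lambda = \frac{1}{2\alpha_k}$ and $x = x_k - \alpha_k g_k$, all four conditions hold.

```python
import numpy as np

# Hypothetical example values for the KKT check
g_k = np.array([1.0, -2.0, 0.5])
x_k = np.array([0.3, 0.7, -1.0])
alpha_k = 0.25

lam = 1.0 / (2.0 * alpha_k)      # multiplier from the derivation
x = x_k - alpha_k * g_k          # claimed optimal solution

stationarity = g_k + 2 * lam * (x - x_k)                    # should be the zero vector
slack = (x - x_k) @ (x - x_k) - alpha_k**2 * (g_k @ g_k)    # should be 0 (active constraint)

print(np.allclose(stationarity, 0.0), np.isclose(slack, 0.0), lam > 0)  # True True True
```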