Choose start vector for gradient descent so that it converges with one step


I'm currently looking at an old exam and I have encountered the following task:

$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

Alongside this there's an image with a coordinate system and contour lines, in which the exact solution $(1,1)$ is drawn. The task is to mark a start vector $x_0 \ne (1,1)$ for which the gradient descent method arrives at the exact solution after one step.

I'm not quite sure how to choose a fitting starting vector here. I know that gradient descent converges in general for convex problems like this, but not necessarily within a given number of steps. How would I go about this?


At a high level, a "step" of gradient descent always moves perpendicular to the level set through the current iterate. This property follows directly from the definition of the gradient vector itself: it always points in the direction of steepest ascent, which is normal to the level set. This article provides a good (and relatively informal) overview of the gradient vector and of gradient descent itself (with examples).

Using this knowledge of how gradient descent moves through the search space (each step leaves the current point perpendicular to its level set), we can look for a start vector from which the method converges in a single step. Ask yourself: in which direction will we move from a candidate start vector, and does that direction point straight at the minimum? Equivalently, we want a point at which the vector pointing toward the optimum is orthogonal to the tangent of the current level set.
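To make this concrete (a sketch, assuming a single fixed-step update and writing $x^* = (1,1)^T$ for the minimizer, so that $Ax^* = b$):

$$x_1 - x^* = \bigl(x_0 - \alpha(Ax_0 - b)\bigr) - x^* = (I - \alpha A)(x_0 - x^*).$$

The step lands exactly on $x^*$ precisely when $x_0 - x^*$ is an eigenvector of $A$ with eigenvalue $1/\alpha$. For the given $A$, the eigenvectors are $(1,1)^T$ (eigenvalue $1$) and $(1,-1)^T$ (eigenvalue $3$), so any start vector displaced from $(1,1)$ along one of these directions works, e.g. $x_0 = (2,0)^T$ with step length $\alpha = 1/3$.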

Since the given matrix $A$ is symmetric positive definite, it defines the quadratic form $f(x) = \frac{1}{2}x^T A x - b^T x$, whose minimizer solves $Ax = b$, and we can solve the problem explicitly. The trick is to apply the gradient descent update directly: $$x_{n+1} = x_n - \alpha \nabla f(x_n).$$ We want the algorithm to converge in a single step, so $x_1$ should be the optimal solution and $x_0$ our starting vector. Then we have $$x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = x_0 - \alpha \nabla f(x_0).$$ For this quadratic form, $\nabla f(x_0) = Ax_0 - b$, which turns the equation above into an explicit condition on $x_0$ (the desired starting vector). This post provides an example.
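As a quick numerical sanity check (a sketch using NumPy; the concrete choice $x_0 = (2,0)$ and the exact line-search step length are my own assumptions, not part of the task): starting from a point displaced from $(1,1)$ along an eigenvector of $A$, a single gradient step lands on the exact solution.

```python
import numpy as np

# Quadratic from the question: f(x) = 1/2 x^T A x - b^T x, minimized where A x = b.
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)        # exact solution (1, 1)

# Displace x* along (1, -1), which is an eigenvector of A:
# A @ [1, -1] = [3, -3], i.e. eigenvalue 3.
x0 = x_star + np.array([1.0, -1.0])   # start vector (2, 0)

g = A @ x0 - b                        # gradient of f at x0
alpha = (g @ g) / (g @ (A @ g))       # exact line-search step length for a quadratic
x1 = x0 - alpha * g

print(x1)                             # one step lands on (1, 1)
```

Any other displacement along $(1,-1)$ (or along $(1,1)$) would work equally well; only the step length $\alpha$ changes.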