Rate of Convergence for Gradient Descent (Example)

4.7k Views Asked by At

I am trying to determine the rate of convergence for $f(x,y) = 5x^2 + 5y^2 − xy − 11x + 11y$. Would anyone be able to provide guidance as to how I might go about doing this? Should I select my own initial $x_k$ and constant $a$? What is a general procedure?

1

There are 1 best solutions below

1
On

I am not sure which method you are using, but you can see the different approaches in these Lecture Notes - Section 4.3. Having said that, I will provide a solution using the Convergence Theory - Section 3 and you can get whatever mileage out of it in case you are using one of the other approaches.

We are given:

$$\tag 1 f(x,y) = 5x^2 + 5y^2 − xy − 11x + 11y,$$

and asked to find the Rate of Convergence using the Gradient Descent method.

We can write $f(x, y)$ as a quadratic function in order to use the referenced Convergence Theory as:

$v = \begin{pmatrix} x \\ y \end{pmatrix}, ~A = \begin{pmatrix} 10 & -1 \\ -1 & 10 \end{pmatrix}, ~b = \begin{pmatrix} 11 \\ -11 \end{pmatrix}$

We can now write:

$\tag 2 \displaystyle f(v) = \frac{1}{2} v^T \cdot A \cdot v - v^t \cdot b = (\frac{1}{2}) \begin{pmatrix} x & y \end{pmatrix} \cdot \begin{pmatrix} 10 & -1 \\ -1 & 10 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \end{pmatrix} - \begin{pmatrix} x & y \end{pmatrix} \cdot \begin{pmatrix} 11 \\ -11 \end{pmatrix}$

Note, multiply out $(2)$ and verify it produces $f(x, y)$.

The matrix $A$ produces two eigenvalues $\lambda_1 = 9$ and $\lambda_2 = 11.$

Since $A$ is normal, the condition number is given by:

$K = \displaystyle \frac{\lambda_max}{\lambda_min} = \frac{11}{9}.$

We can now find the global minimizer, $v^*$, such that $Av* = b$.

Solving this, we arrive at $v^* = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$

We can now calculate the minimum as $f(v^*) = -11$ and verify the same result using the original function, $f(x, y) = f(1, -1) = -11.$

The convergence of Steepest Descent is governed by the condition number of the matrix and defined by the constant:

$$\displaystyle \rho = \frac{K-1}{K+1} = \frac{\frac{11}{9} - 1}{\frac{11}{9}+1} = \frac{1}{10} \text{and} ||e^{(k)}||_A \le \rho^k||e^{(0)}||_A$$

So, we have (from theory):

$$\displaystyle 0.5 ||e^{(k)}||^2_A \le 0.5 \rho^{2k}||e^{(0)}||^2_A = 0.5(\frac{1}{100})^k ||e^{(0)}||^2_A = 5.5 \cdot 10^{-2k}$$

This tells us we need six to seven steps to get $\displaystyle 0.5 ||e^{(k)}||^2_A \approx 10^{-11}.$

Now, you should do your iterations and see that that is the number of steps to converge to the solution above.

Additionally, you should compare the convergence to the exact result and make sure you get what the condition number was telling us.