Minimize quadratic function with gradient descent method


Could anyone help me with this question, please?

This came up on an exam, and I didn't know how to solve it.

$A \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix, $b \in \mathbb{R}^{n\times 1}$, and we have $f(x) = \dfrac{1}{2}x^TAx -b^Tx$. Given a point $x^0 \in \mathbb{R}^{n \times 1}$, show that the gradient descent method given by $$ x^{k+1}=x^k -\left(||A||_{2}\right)^{-1} \nabla f(x^k)$$ converges to the global minimum.

Here $|| \cdot||_2$ is the matrix norm induced by the Euclidean norm (the spectral norm).

Here is what I tried:

$|| A||_2 = || A|| = \sup\{||Ax||:||x|| =1 \}$. Since $A$ is symmetric positive definite, it has positive real eigenvalues $0 < \lambda_1 \leq \lambda_2 \leq \cdots\leq \lambda_n$ and there is an orthonormal basis of $\mathbb{R}^n$ formed by eigenvectors $v_1, v_2, \ldots, v_n$ of $A$ associated with the respective eigenvalues $\lambda_i$. Since $||Av_n|| = ||\lambda_n v_n|| = |\lambda_n| \;||v_n|| = \lambda_n$, I get $||A|| = \lambda_n =: \lambda$, and so $||A||^{-1}=\dfrac{1}{\lambda}$. The iteration then becomes $$ x^{k+1}=x^k -\dfrac{1}{\lambda} \nabla f(x^k). $$
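To sanity-check this iteration numerically (my own illustration, not part of the exam; the matrix $A$, vector $b$, and starting point below are made up), one can take a small SPD matrix, use the step size $1/\lambda_n = 1/\Vert A \Vert_2$, and watch the iterates approach $A^{-1}b$:

```python
import numpy as np

# A small symmetric positive definite example (hypothetical data).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# For an SPD matrix, ||A||_2 equals its largest eigenvalue.
# eigvalsh returns eigenvalues in ascending order.
lam_max = np.linalg.eigvalsh(A)[-1]

x = np.zeros(2)              # starting point x^0
for _ in range(500):
    grad = A @ x - b         # ∇f(x) = Ax - b
    x = x - grad / lam_max   # x^{k+1} = x^k - (1/λ) ∇f(x^k)

x_star = np.linalg.solve(A, b)    # the unique minimizer A^{-1} b
print(np.allclose(x, x_star))     # the iterates converge to x*
```

After 500 steps the iterate agrees with $A^{-1}b$ to machine precision, consistent with the claim to be proved.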

I argued that $f$ has a unique local minimum, which is therefore global, and that the minimizer is $x^{*} =A^{-1}b$ (since $\nabla f(x) = Ax - b$).

But I could not show that the sequence above converges to $x^{*}$.

Then I thought I might not need to show convergence to $x^{*}$ directly: it would be enough to show that the sequence converges at all, since the uniqueness of the (global) minimizer would then identify the limit.

But I could not make this work either. It reminded me of a contraction argument. I also tried telescoping the recurrence: $$ x^{k+1} - x^0 = -\dfrac{1}{\lambda}\left( \nabla f(x^0) + \nabla f(x^1) + \cdots +\nabla f(x^k) \right), $$ but I could not move forward.

I also considered $||x^{k+1}-x^{k} || = ||-\frac{1}{\lambda} \nabla f(x^k) || = \frac{1}{\lambda} || \nabla f(x^k) ||$, which again suggested a contraction, but I could not finish that argument either; it would only help if the constant were less than $1$, and I could not guarantee that.

Could you help me? The exam is already over, but I would like to see the solution.

PS: I found this, but I can't understand it: https://en.wikipedia.org/wiki/Modified_Richardson_iteration

There is 1 answer below.

Taylor expanding $ f\big(x^{k+1} \big) = f \Big(x^k - \Vert A \Vert_2^{-1} \nabla f\big(x^k\big) \Big) $ up to second order gives an exact representation (recall that $f$ is quadratic, so the expansion has no remainder): \begin{align} f\big(x^{k+1} \big) &= f \Big(x^k - \Vert A \Vert_2^{-1} \nabla f\big(x^k\big) \Big) \\ &= f\big(x^k \big) + \nabla f\big(x^k \big)^T \Big[ - \Vert A \Vert_2^{-1} \nabla f\big(x^k\big)\Big] + \frac12 \Vert A \Vert_2^{-2} \nabla f\big(x^k\big)^T A \nabla f\big(x^k\big) \\ \Vert A \Vert_2 \Big[f\big(x^{k+1} \big) - f\big(x^k \big) \Big] &= \nabla f\big(x^k \big)^T \Big[ - I \nabla f\big(x^k\big)\Big] + \frac12 \Vert A \Vert_2^{-1} \nabla f\big(x^k\big)^T A \nabla f\big(x^k\big) \\ \Vert A \Vert_2 \Big[f\big(x^{k+1} \big) - f\big(x^k \big) \Big] &= \nabla f\big(x^k \big)^T \bigg[ \frac12 \Vert A \Vert_2^{-1} A - I \bigg] \nabla f\big(x^k\big). \end{align}
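Since $f$ is quadratic, this identity is exact, not just an approximation, which can be checked numerically (my own sketch; the matrix $A$, vector $b$, and iterate $x^k$ below are made up):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])        # hypothetical SPD matrix
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b        # ∇f(x) = Ax - b

norm_A = np.linalg.norm(A, 2)     # spectral norm ||A||_2
xk = np.array([0.7, -1.2])        # an arbitrary iterate x^k
xk1 = xk - grad(xk) / norm_A      # x^{k+1}

# Left- and right-hand sides of the derived identity.
lhs = norm_A * (f(xk1) - f(xk))
rhs = grad(xk) @ (0.5 * A / norm_A - np.eye(2)) @ grad(xk)
print(np.isclose(lhs, rhs))       # agreement to machine precision
```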

Now recall that for $\omega \in \mathbb R$, the matrix $\omega A - I$ has eigenvalues $\omega \lambda - 1$ for $\lambda \in \sigma(A)$. Thus in our case, $\frac12 \Vert A \Vert_2^{-1} A - I$ has eigenvalues $$\frac{\lambda}{2\Vert A \Vert_2} -1 = \frac{\lambda}{2 \lambda_\max } -1 \leq \frac12 - 1 = -\frac12. $$ Since all eigenvalues $\lambda$ of $A$ are real and positive ($A$ is s.p.d.), we now have that all eigenvalues of $\frac12 \Vert A \Vert_2^{-1} A - I$ are negative. Consequently, $\frac12 \Vert A \Vert_2^{-1} A - I$ is negative definite and $$ \nabla f\big(x^k \big)^T \bigg[ \frac12 \Vert A \Vert_2^{-1} A - I \bigg] \nabla f\big(x^k\big) < 0 \: \forall \: \nabla f\big(x^k \big) \neq 0. $$

Thus, $$ f\big(x^{k+1} \big) - f\big(x^k \big) < 0 \: \forall \: k \text{ with } \nabla f\big(x^k \big) \neq 0.$$ In other words, the function value strictly decreases unless the gradient is zero, which happens exactly at the unique minimizer. To conclude convergence, note that all eigenvalues of $\frac12 \Vert A \Vert_2^{-1} A - I$ are at most $-\frac12$, so $$ f\big(x^{k+1} \big) - f\big(x^k \big) \leq -\frac{1}{2\Vert A \Vert_2} \big\Vert \nabla f\big(x^k\big) \big\Vert^2. $$ Since $f$ is bounded below (it attains its minimum at $x^*$), these decreases sum to a finite quantity, forcing $\nabla f\big(x^k\big) = Ax^k - b \to 0$ and hence $x^k \to A^{-1}b = x^*$.
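The strict decrease of $f$ and the convergence to $x^* = A^{-1}b$ can both be observed numerically; a minimal sketch, with an assumed SPD matrix $A$, vector $b$, and starting point of my own choosing:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # assumed SPD example
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

lam_max = np.linalg.eigvalsh(A)[-1]  # ||A||_2 for an SPD matrix
x = np.array([5.0, -5.0])
values = [f(x)]
for _ in range(200):
    x = x - (A @ x - b) / lam_max    # one gradient descent step
    values.append(f(x))

# Away from the minimizer, f decreases strictly at every step
# (checked over the early iterations, before floating-point
# round-off makes consecutive values indistinguishable) ...
print(all(v1 > v2 for v1, v2 in zip(values[:20], values[1:21])))
# ... and the iterates settle at the unique minimizer x* = A^{-1} b.
print(np.allclose(x, np.linalg.solve(A, b)))
```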