Explanation of gradient descent on convex quadratic

31 Views Asked by At

Can someone explain the following: $$f(x) = \frac{1}{2}w^TAw - b^Tw$$

Assume AA is symmetric and invertible, then the optimal solution $w^{\star}$ occurs at

$$w^{\star} = A^{-1}b$$ and $$\nabla f(w) = Aw - b $$

can someone explain how we got $w^{\star}$ and $\nabla f(w)$? I need detailed math overview of that, I forgot most of calc/LA.

Thank you!

1

There are 1 best solutions below

0
On BEST ANSWER

I think you have a minor typo: $f(x)$ should be $f(w)$.

$f$ is a function of several variables $w_1, w_2, \ldots$. If you find the partial derivatives $\partial f / \partial w_i$ for each $i$, you can combine them into a vector to form the gradient $\nabla f$. It would be good for you to do this slowly. It may be helpful to rewrite $f$ as $$f(w) = \frac{1}{2} \sum_i \sum_j A_{ij} w_i w_j - \sum_i b_i w_i.$$ Note that there are "shortcuts" for computing this gradient using matrix calculus, but they won't be particularly helpful if you don't understand what's going on.

Once you have the gradient, any solution to $\nabla f(w) = 0$ is a critical/stationary point. In this case, there is only one, namely $w^\star = A^{-1} b$.