Why is finding the minimum of $\phi(x)={1\over2}x^TAx-x^Tb$ equivalent to finding the solution of $Ax=b$?

110 Views Asked by At

Let $A \in \Bbb R^{n \times n}$ be symmetric and invertible, and $b \in \Bbb R^n$. Why is finding the minimum of $$\phi(x)={1\over2}x^TAx-x^Tb$$ equivalent to finding the solution of $Ax=b$?

$$Ax=b \implies Ax-b=0\nRightarrow\text{?}{1\over2}x^TAx-x^Tb=0\text{ is minimal}$$

The other implication is also obscure to me.

3

There are 3 best solutions below

3
On

Are all eigenvalues of $A$ positive? Otherwise if $A$ has negative eigenvalues, then finding the minimum will NOT be equivalent to finding the solution to $Ax = b$

Assuming that all eigenvalues of $A$ are positive: The partial derivatives of each component of the objective function is 0 at the vector $x$ satisfying $Ax = b$. This is a simple calculation.

6
On

You have: \begin{align} \phi(x+h)-\phi(x)&={1\over2}(x+h)^TA(x+h)-(x+h)^Tb-\left({1\over2}x^TAx-x^Tb \right)\\ &={1\over2}\left( h^T A x+x^T A h\right) -h^T b +{1 \over 2} h^T A h \end{align} so if $x$ is an extremum then $\forall h$: $${1\over2}\left( h^T A x+x^T A h\right) -h^T b ={1\over2}\left( h^T A x+h^T A^T x\right) -h^T b=h^T \left(\frac{A^T+A}{2}x-b \right)=0$$ i.e: $$\left(\frac{A^T+A}{2}x-b \right)=0$$ which gives the desired result if $A$ is symmetric.

For the other implication if $A$ is positive the previous calculation gives the result but if $A$ is not positive this is not true: for example in dimension $1$ with $A=(-1)$ you obtain a maximum and not a minimum.

0
On

If the matrix is positive definite, then the minimum will occur when $\nabla_x \phi(x) = 0$.

Just doing out the math we have

$$ \begin{align*} \nabla_x \phi(x) &= \nabla_x \left(\frac{1}{2} x^TAx - b^Tx\right)\\ &=\frac{1}{2} \nabla_x \left(x^TAx\right) - \nabla_x(b^Tx)\\ &= \frac{1}{2} \left(A +A^T\right)x - b \\ &= Ax -b \qquad // \quad A = A^T\\ \end{align*} $$ Setting this gradient to 0 we have the optimality condition stated. As others have stressed, this only means that $x^\star$ satisfying this condition is a stationary point unless you know $A$ is symmetric

The gradients used in line 3 are common enough that they should probably be committed to memory. The gradient of $b^Tx$ can be derived by writing out the dot product as $\sum_{i=1}^n b_ix_i$. Or you can just remember that it's analagous to the scalar case $\frac{d}{dx} (bx) = b$

The gradient of the quadratic is a little more effort to derive but can be done similarly by writing out the matrix multiplication and just taking partial derivatives $$x^TAx = \sum_{i=1}^n\sum_{j=1}^n x_ix_jA_{ij} = \sum_{i=1}^n A_{ii}x_i^2 + 2\sum_{i=1}^n\sum_{j=i+1}^n A_{ij}x_ix_j$$

The fact that $\nabla_x x^TAx = 2Ax$ when $A$ is symmetric is easily remembered by analogy with the scalar case where $\frac{d}{dx} (ax^2) = 2ax$