How to minimize $f(x) = \|Ax-b\|$

Question

How to minimize $f(x) = \|Ax-b\|$

3.2k Views Asked by Bumbble Comm At 27 Mar 2026 - 6:07

Solve the problem of minimizing $f(x) = ||Ax-b||$. Consider all the cases and interpret geometrically.

If we write

$$\|Ax-b\| = (a_{11}x_1 + \cdots + a_{1n}x_n - b_1)^2 + \cdots + (a_{n1}x_1 + \cdots + a_{nn}x_n - b_1)^2$$

then

$$\frac{\partial \|Ax-b\|}{\partial x_j} = 2(a_{11}x_1 + \cdots + a_{1n}x_n - b_1)a_{1j} + \cdots + (a_{n1}x_1 + \cdots + a_{nn}x_n - b_1)a_{nj}$$

If I try to do $\frac{\partial \|Ax-b\|}{\partial x_j} = 0$ I get nothing useful. For $x$ to be a minimizer, I have to have gradient $0$ and hessian positive definite. If we do the hessian just to see:

$$\frac{\partial^2 \|Ax-b\|}{\partial x_k\partial x_j} = 2a_{1k}a_{1j} + \cdots + 2a_{nk}a_{nj}$$

I see nothing useful here.

I think the geometric interpretation comes from the conditions for the gradient to be $0$ and the hessian to be $>0$, but I don't find these conditions useful.

Any ideas?

Original Q&A

There are 3 best solutions below

Bumbble Comm On 20 Aug 2018 - 11:25

\begin{eqnarray*} \left \lVert Ax-y\right\rVert^{2} &=& x^{T} A^{T} A x -2 y^{T} A x + y^{T} y \end{eqnarray*}

Gradient w.r.t $x$ to $0$ translates to

\begin{eqnarray*} \nabla_{x} \left \lVert Ax-y\right\rVert^{2} &=& 2 A^{T} A x - 2A^{T} y =0 \end{eqnarray*}

yields, the normal equations. The solution is (the well known Least Square).

\begin{equation*} \hat{x}_{LS} = \left(A^{T} A \right)^{-1} A^{T} y \end{equation*}

Geometrically, the least square solution can be viewed as an (orthogonal) projection of the observation $y$ onto the image of $A$.

Bumbble Comm On 22 Aug 2018 - 5:49

Alternative:

In case you want to stick to $\lVert Ax - b \rVert = \mathrm{tr} \left[ \left(\left( Ax - b \right)^T \left( Ax - b \right) \right)^{\frac{1}{2}} \right]$, not the squared $\ell_2$ norm, i.e., $\lVert Ax - b \rVert^2$ (even though the final solution is same), here is an alternative.

We will utilize the following the identities

Trace and Frobenius product relation $$\left\langle A, B \right\rangle={\rm tr}(A^TB)$$ or $$\left\langle A^T, B \right\rangle ={\rm tr}(AB)$$
Cyclic property of Trace/Frobenius product \begin{align} \left\langle A, B C \right\rangle &= \left\langle AC^T, B \right\rangle \\ &= \left\langle B^T A, C \right\rangle \\ &= {\text{etc.}} \cr \end{align}

We obtain the differential first, and then the gradient.

Moreover, you can utilize this trace differential result \begin{align} d\,{\rm tr}(f(M)) &= \left\langle f^\prime(M^T), \ dM \right\rangle ,\\ \end{align} where $M = \left( Ax - b \right)^T \left( Ax - b \right)$ in the considered case and $f(\cdot) = \sqrt(\cdot)$.

So, \begin{align} d\, \lVert Ax - b \rVert &= d\, \mathrm{tr} \left[ \left(\left( Ax - b \right)^T \left( Ax - b \right) \right)^{\frac{1}{2}} \right] \\ &= \left\langle \frac {1} {2} \underbrace{\left[ \left(\left( Ax - b \right)^T \left( Ax - b \right) \right)^{-\frac{1}{2}} \right] }_{= \frac{1}{\lVert Ax - b \rVert}} , \ d\left[ \left(\left( Ax - b \right)^T \left( Ax - b \right) \right) \right] \right\rangle \\ &= \left\langle \frac {1} {2 \lVert Ax - b \rVert}, \ dx^T A^T \left(Ax-b \right) + \left(Ax-b \right)^T A \ dx \right\rangle \\ &= \left\langle \frac {1} {2 \lVert Ax - b \rVert}, \ 2\left(Ax-b \right)^T A \ dx \right\rangle \\ &= \left\langle \frac {A^T \left(Ax-b \right)} {\lVert Ax - b \rVert}, \ \ dx \right\rangle \\ \end{align}

Thus the derivative is $$ \eqalign { \frac { \partial} {\partial x}\lVert Ax - b \rVert &= \frac {A^T \left(Ax-b \right)} {\lVert Ax - b \rVert} . \cr } $$

Now, you can set the derivative to $0$ and the solution is the same as the above, i.e., \begin{align} x = \left(A^T A \right)^{-1} A^T b \ . \end{align}

**Bumbble Comm** · Accepted Answer

To expand one of the comments of your question, the norm minimization problem you are considering is a projection problem. Note that projection of a vector $b$ onto a set $S$ is nothing but finding a point $x\in S$ such that the distance between $b$ and $x$ is minimized. Define the projection function as $\Pi_S:\mathbb R^m\to S$. The projection of $x$ onto $S$ is given by $\Pi_S(x)$: $$ \Pi_s(x)=\arg\min_{y\in S}\|x-y\|. $$ Therefore if $S$ is defined to be the range of $A$, that is $$ S=\{Ax: x\in\mathbb R^n\}. $$ Then the problem of minimizing $\|b-Ax\|$, which is a least square problem, can be understood as projecting $b$ onto the rang of $A$, i.e., to find $\Pi_S(b)$.

The projection is given by a linear operator, say $P$. $P$ is symmetric and moreover by definition the projection of each vector $y\in S$ onto $S$ will be itself: $Py=y$. This means that $PA=A$. As an exercise try to see that the projection satisfies the intuitive idea that $Px-x$ should be orthogonal to each vector in $S$ (Hint: $A(I-P)=0$). See the figure below:

The projection matrix is unique (see why!) and therefore for each $b$ there is a unique point on the range of $A$ as the projection of $b$ namely $Pb$.

The projection into the range of $A$ can be found using Moore-Penrose inverse of $A$. The Moore-Penrose inverse of $A$, $A^\dagger$ satisfies among others: $$ AA^\dagger A=A, A^\dagger AA^\dagger=A^\dagger. $$ and $AA^\dagger$ is symmetric. With small efforts one can see that $P=AA^\dagger$. Therefore $z=AA^\dagger b$ gives you the projection of $b$ onto $S$. So we found the point $z$ in the range of $A$ minimizing the distance to $b$. But what about finding $x$ such that $Ax=z$? Well, in general there is no unique solution however we can see that: $$ x=A^\dagger b\implies Ax=AA^\dagger b=z. $$ Hence, $x=A^\dagger b$ is a solution. Moreover for $v$ in the kernel of $A$, $x+v$ is also a solution. The only way for unique solution is for $A$ to have only zero vector in the kernel.

Then how can we find $A^\dagger$? If $A^TA$ is invertible, $A^\dagger=(A^TA)^{-1}A^T$ as you could see in the other answers. Otherwise, there are other methods like using singular value decomposition to find the Moore-Penrose inverse.

How to minimize $f(x) = \|Ax-b\|$

There are 3 best solutions below

Related Questions in CALCULUS

Related Questions in MULTIVARIABLE-CALCULUS

Related Questions in DERIVATIVES

Related Questions in OPTIMIZATION

Related Questions in MAXIMA-MINIMA

Trending Questions

Popular # Hahtags

Popular Questions