Finding $\mathbf{x}$ that minimizes $\|\mathbf{x}-\mathbf{x_0}\|^2$ subject to minimizing $\|A\mathbf{x}-\mathbf{b}\|^2$


What is a good way to find $\mathbf{x}\in\mathbb{R}^n$ that

  • minimizes $\|A\mathbf{x}-\mathbf{b}\|^2$, and
  • subject to the above, minimizes $\|\mathbf{x}-\mathbf{x_0}\|^2$,

where

  • $A\in\mathbb{R}^{m\times n}$ (a sparse matrix, if it matters),
  • $\textrm{rank}(A) = n-1$, and
  • $m\approx4n$?

There are 3 answers below.

Accepted answer

If $A$ is invertible, then $$ x=A^{-1}b\tag1 $$ uniquely minimizes $\|Ax-b\|^2$. However, in the question $A$ is not square, and not even of full rank, so it is not invertible.

If $A$ is not invertible, compute $P=AI_A$ where the columns of $P$ are a maximal independent subset of the columns of $A$. That is, $I_A$ has a single $1$ per column to pick out a single column from $A$. If the columns of $A$ are independent, then $I_A=I$ and $P=A$. In the question, $A$ has rank $n-1$, so $P$ and $I_A$ will have $n-1$ columns.

Since $$ \delta\|Pu-b\|^2=2\langle\overbrace{\color{#C00}{P^T(Pu-b)}}^{\color{#C00}{=0}},\delta u\rangle\tag2 $$ we have that $$ \begin{align} x_1 &=I_A\color{#C00}{u}\\ &=I_A\color{#C00}{\left(P^TP\right)^{-1}P^Tb}\tag3 \end{align} $$ minimizes $\|Ax-b\|^2$.

Let the columns of $N$ be linearly independent and span the null space of $A$ (i.e. $AN=0$). In the question, the null space has dimension $1$, so $N$ will have one column.

Since $$ \delta\langle x_1+Nu-x_0,x_1+Nu-x_0\rangle=2\langle\overbrace{\color{#C00}{N^T(x_1+Nu-x_0)}}^{\color{#C00}{=0}},\delta u\rangle\tag4 $$ we have that $$ \begin{align} x_2 &=x_1+N\color{#C00}{u}\\ &=x_1+N\color{#C00}{\left(N^TN\right)^{-1}N^T(x_0-x_1)}\\ &=\left(I-N\left(N^TN\right)^{-1}N^T\right)\underbrace{I_A\left(P^TP\right)^{-1}P^T\color{#090}{b}}_{x_1}+N\left(N^TN\right)^{-1}N^T\color{#090}{x_0}\tag5 \end{align} $$ minimizes $\|x-x_0\|^2$ given that $\|Ax-b\|^2$ is minimized.

Note that in the case where $A$ is invertible, we can take $I_A=I$, $P=A$, and $N(N^TN)^{-1}N^T=0$ to get $(1)$ as the result.
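A minimal numerical sketch of this construction (the function name is mine, and `np.linalg.lstsq` stands in for the explicit $I_A\left(P^TP\right)^{-1}P^T$ construction of $x_1$: by $(5)$, any least-squares minimizer $x_1$ yields the same $x_2$, because the projector $I-N\left(N^TN\right)^{-1}N^T$ removes its null-space component):

```python
import numpy as np

def min_dev_lstsq(A, b, x0, tol=1e-10):
    """Among all minimizers of ||Ax - b||^2, return the one closest to x0."""
    # Null-space basis N of A from the SVD (AN = 0); N has a single
    # column when rank(A) = n - 1, as in the question.
    _, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > tol * s[0]))
    N = Vt[r:].T
    # Any least-squares minimizer serves as x1 (lstsq gives the
    # minimum-norm one).
    x1 = np.linalg.lstsq(A, b, rcond=None)[0]
    if N.shape[1] == 0:            # full column rank: x1 is unique
        return x1
    # x2 = x1 + N (N^T N)^{-1} N^T (x0 - x1), eq. (5).
    w = np.linalg.solve(N.T @ N, N.T @ (x0 - x1))
    return x1 + N @ w
```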

Answer 2

Create a cost function that is a linear combination of the two least-squares objectives $$\eqalign{ \phi &= \tfrac{1}{2}\|Ax-b\|^2 + \tfrac{\alpha}{2}\|x-x_0\|^2 \\ }$$ where the parameter $\alpha$ controls the relative importance of the objectives.

Calculate the gradient $$\eqalign{ \frac{\partial\phi}{\partial x} &= A^T(Ax-b) + \alpha(x-x_0) \\ }$$ set it to zero, and solve for the optimal vector for any chosen value of $\alpha$ $$\eqalign{ &(A^TA+\alpha I)x_\alpha = A^Tb+\alpha x_0 \\ &x_\alpha = (A^TA+\alpha I)^{-1}(A^Tb+\alpha x_0) \\ }$$ The limiting values of the solution vector are $$\eqalign{ \lim_{\alpha\to 0^+}x_\alpha = A^+b+(I-A^+A)x_0, \qquad \lim_{\alpha\to\pm\infty}x_\alpha = x_0 \\ }$$ so the small-$\alpha$ limit recovers exactly the two-stage solution: among all minimizers of $\|Ax-b\|^2$, the one closest to $x_0$. (Note that the limit is not simply $A^+b$ unless $x_0$ is orthogonal to the nullspace of $A$: since $A^Tb$ has no nullspace component, the nullspace component of $x_\alpha$ equals that of $x_0$ for every $\alpha>0$.)
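For a single value of $\alpha$, the normal-equation solve above is a one-liner (a numpy sketch; the function name is mine):

```python
import numpy as np

def x_alpha(A, b, x0, alpha):
    # Solve the stationarity condition (A^T A + alpha I) x = A^T b + alpha x0.
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b + alpha * x0)
```

For $\alpha>0$ the system matrix is symmetric positive definite even when $A$ is rank-deficient, so a Cholesky (or, for sparse $A$, conjugate-gradient) solve applies; the growing ill-conditioning as $\alpha\to0^+$ is the usual price of the penalty formulation.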


Based on the feedback in the comments, your approach consists of two steps.

First, writing the general solution to the linear system as $$x = A^+b + Py$$ where $y$ is an arbitrary vector and $P=(I-A^+A)$ is the nullspace projector, whose most important properties are: $$\,P^T=P=P^2,\qquad PA^+=0,\qquad AP=0$$

Then choosing $y$ as the solution of a least-squares problem: $$\eqalign{ \psi &= \|(Py+A^+b)-x_0\|^2 \\ \frac{\partial\psi}{\partial y} &= P^T(Py+A^+b-x_0) \\ &= Py-Px_0 \quad\implies\quad y=x_0 \\ }$$ (any $y$ with $Py=Px_0$ yields the same $x$). And, finally, substituting this back into the general solution $$\eqalign{ x &= A^+b + (I-A^+A)x_0 }$$ The caveat, of course, is that there is no guarantee that $Px_0\ne 0$; if $x_0$ happens to be orthogonal to the nullspace, the correction term vanishes and the result reduces to $x=A^+b$.
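This closed form is easy to check numerically (a numpy sketch; the function name is mine):

```python
import numpy as np

def pinv_solution(A, b, x0):
    # x = A^+ b + (I - A^+ A) x0, with P = I - A^+ A the nullspace projector.
    Ap = np.linalg.pinv(A)
    P = np.eye(A.shape[1]) - Ap @ A
    return Ap @ b + P @ x0
```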

Answer 3

Note that $\ A^TA\ $ is an $\ n\times n\ $ symmetric matrix of rank $\ n-1\ $, $\ A^Tb\ $ is orthogonal to its nullspace, and hence lies in its column space. The linear equations $$ A^TAy=A^Tb $$ therefore have a one dimensional set of solutions $\ y_t= y_0 + t\eta\ $, where $\ \eta\ne0\ $ is a basis for the nullspace of $\ A^TA\ $. Now if $\ x\in\mathbb{R}^n\ $, then \begin{align} \|Ax-b\|^2 &=x^TA^TAx -2x^TA^Tb+b^Tb\\ &=x^TA^TAx -2x^TA^TAy_t+b^Tb\\ &=(x-y_t)^TA^TA(x-y_t)+b^Tb-y_t^TA^TAy_t\\ &=(x-y_t)^TA^TA(x-y_t)+b^Tb-y_0^TA^TAy_0\\ &\ge b^Tb-y_0^TA^TAy_0\ . \end{align} Thus, the minimum value of $\ \|Ax-b\|^2\ $ is $\ b^Tb-y_0^TA^TAy_0\ $, and is achieved if and only if $\ x=y_t\ $ for some real $\ t\ $.

And now the value of $\ y_t\ $ that minimises $\ \|y_t-x_0\|^2=\|y_0+t\eta-x_0\|^2\ $ is just the orthogonal projection of $\ x_0\ $ onto the line $\ x=y_0+t\eta\ $, that is, $$ x=y_0-\frac{(y_0-x_0)^T\eta}{\eta^T\eta}\eta\ . $$
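A numpy sketch of this projection (the function name is mine; `lstsq` supplies one particular solution $y_0$, and the last right singular vector of $A$ supplies $\eta$):

```python
import numpy as np

def closest_minimizer(A, b, x0):
    # One particular solution y0 of A^T A y = A^T b.
    y0 = np.linalg.lstsq(A.T @ A, A.T @ b, rcond=None)[0]
    # eta spans null(A^T A) = null(A): the last right singular vector,
    # assuming rank(A) = n - 1 as in the question.
    eta = np.linalg.svd(A)[2][-1]
    # Orthogonal projection of x0 onto the line y0 + t * eta.
    return y0 - ((y0 - x0) @ eta) / (eta @ eta) * eta
```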