Minimum of $||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2$


Given a matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$ and vectors $\mathbf{b}\in\mathbb{R}^m$ and $\mathbf{y}\in\mathbb{R}^n$, I need to find the vector $\mathbf{x_*}\in\mathbb{R}^n$ such that

$$\mathbf{x_*} = \text{arg}\min_{\mathbf{x}\in\mathbb{R}^n}{||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2}$$

If $\mathbf{y}$ satisfies $\mathbf{Ay=b}$, then $\mathbf{x}_*=\mathbf{y}$. This is the trivial case, so we assume from here on that $\mathbf{Ay\neq b}$.

It's easy to check that $||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2 = \left\Vert\begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}\mathbf{x}-\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} \right\Vert^2$. So, by Lemmas 2.1 and 2.2 in [1] we have that $\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix}\hspace{1em}$where $^\dagger$ denotes the pseudoinverse of a matrix.

I found on the internet an alternative approach: find the vector $\mathbf{x}$ at which the gradient of $||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2$ vanishes. This yields the following system

$$\mathbf{(I+A^TA)x=y+A^Tb}$$

My question is: why is

$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)} $$
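As a quick numerical sanity check of this identity (not part of the original question; it assumes NumPy and a random full-column-rank instance), both expressions can be evaluated and compared:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

# Stacked least-squares system: C x ~ z with C = [A; I], z = [b; y]
C = np.vstack([A, np.eye(n)])
z = np.concatenate([b, y])

x_pinv = np.linalg.pinv(C) @ z                              # pseudoinverse route
x_normal = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)  # gradient route

print(np.allclose(x_pinv, x_normal))
```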


[1] Piziak, R., and P. L. Odell. "Affine projections." Computers & Mathematics with Applications 48.1-2 (2004): 177-190.

3 Answers

Accepted answer:

Here is a shorter answer. Wikipedia mentions that the generalized inverse of a matrix $\begin{bmatrix}\mathbf{A} \\\mathbf{I}\end{bmatrix}$ can be expressed as $\left(\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$, if the inverse exists. This expression simplifies to $\left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\\mathbf{I}\end{bmatrix}^T$. Therefore, $$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\\mathbf{I}\end{bmatrix}^T \begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}. $$
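This simplification can be checked numerically (a sketch assuming NumPy; the stacked matrix $[\mathbf{A};\mathbf{I}]$ always has full column rank, so the inverse in the formula exists):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))

C = np.vstack([A, np.eye(n)])   # C = [A; I] has full column rank
lhs = np.linalg.pinv(C)         # Moore-Penrose pseudoinverse of C
# (C^T C)^{-1} C^T = (A^T A + I)^{-1} C^T
rhs = np.linalg.solve(A.T @ A + np.eye(n), C.T)

print(np.allclose(lhs, rhs))
```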

Answer:

Differentiation gets you to the result quite directly. It's easier to see if you write it with indices. In general, when in doubt, write everything in index notation; it resolves everything (I'm not using the Einstein convention here, to be more explicit, but skipping all the $\sum$ signs makes everything more readable).

$$L=\sum_i (x_i-y_i)^2 + \sum_{i}(\sum_j A_{ij}x_j-b_i)^2$$ For each $k$, the partial derivative must be zero (using $\partial x_i/\partial x_k=\delta_{ik}$): $$0=\frac{\partial L}{\partial x_k}=\sum_i 2(x_i-y_i)\delta_{ik}+\sum_{i}2(\sum_j A_{ij}x_j-b_i)\sum_{j}A_{ij}\delta_{jk}$$ The parts with identities ($\delta$) can be resolved, and everything can be divided by 2: $$0=(x_k-y_k)+\sum_{i}A_{ik}(\sum_j A_{ij}x_j-b_i)$$ This can be rewritten back into matrix form: $$0={\bf x}-{\bf y}+{\bf A}^T ({\bf A x}-{\bf b})$$ Expand and collect the $\bf x$ terms: $$({\bf I}+{\bf A}^T{\bf A}){\bf x}={\bf y}+{\bf A}^T{\bf b}$$ The last step requires an inverse; here ${\bf I}+{\bf A}^T{\bf A}$ is always symmetric positive definite, hence invertible.
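The stationarity condition derived above can be verified numerically (a sketch assuming NumPy): solve the linear system and check that the gradient vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

# Solve (I + A^T A) x = y + A^T b
x = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)

# The gradient x - y + A^T (A x - b) should vanish at the minimizer
grad = x - y + A.T @ (A @ x - b)
print(np.allclose(grad, 0))
```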

The complications of the extended system are not strictly necessary, but once you have the extended representation, the procedure is the same.


Alternatively, here is a less index-heavy way, using $||x||^2=x^Tx$:

$${\bf C}=\begin{bmatrix}{\bf A}\\{\bf I}\end{bmatrix}$$ $${\bf z}=\begin{bmatrix}{\bf b}\\{\bf y}\end{bmatrix}$$ now it's $$L=||{\bf C}{\bf x}-{\bf z}||^2={\bf x}^T {\bf C}^T {\bf C x} - {\bf x}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C x}+{\bf z}^T{\bf z}$$ which you differentiate like before, just leave a "dangling index" where $x$ used to be. The derivative over a vector is a vector, one equation for each component to differentiate over. These are matrices so the order must be preserved, just leave a box where $x$ was (we differentiate first term as a product and last term is a constant): $$\frac{\partial L}{\partial \bf x}={\Box}^T {\bf C}^T {\bf C x}+{\bf x}^T {\bf C}^T {\bf C } \Box- {\Box}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C }\Box$$ You can imagine setting $\Box=(1,0,0,\ldots)$ to get $\partial L/\partial x_1$ and so on ($\Box$ is the direction of differentiation, in $\bf x$ space). The result of this expression is still technically a scalar so you can transpose the ones that have the box in the wrong place ($a^Tb=b^Ta$): $$\frac{\partial L}{\partial \bf x}=2{\Box}^T {\bf C}^T {\bf C x}-2 {\Box}^T{\bf C}^T{\bf z}=2\Box^T({\bf C}^T {\bf C x}-{\bf C}^T{\bf z})$$ For this to be zero, the parentheses must be zero, and you get the usual expression for least squares solution (solving the normal system, as we usually do in linear fitting).

If you simply require ${\bf C}{\bf x}={\bf z}$, trying to make the expression exactly zero, you of course get an overdetermined system without a proper inverse; the pseudoinverse then gives you a least-squares solution, leading to the simple, naive expression $${\bf x}={\bf C}^\dagger{\bf z}$$ which "behind the scenes" solves the normal system with $C^T C$ (or uses an SVD, which amounts to the same thing). This duality between solving the normal system and the extended system appears in every course on linear least squares.
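The equivalence of these three routes (pseudoinverse, SVD-based least squares, and the normal equations) can be checked directly (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

C = np.vstack([A, np.eye(n)])
z = np.concatenate([b, y])

x_pinv = np.linalg.pinv(C) @ z                    # x = C^dagger z
x_lstsq = np.linalg.lstsq(C, z, rcond=None)[0]    # SVD-based least squares
x_normal = np.linalg.solve(C.T @ C, C.T @ z)      # normal equations C^T C x = C^T z

print(np.allclose(x_pinv, x_lstsq) and np.allclose(x_pinv, x_normal))
```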

Answer:

Defining $\mathbf{A}^{\dagger} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ (valid when the matrix has full column rank, as $\begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}$ always does) and performing the indicated algebraic operations, we have

\begin{equation} \begin{split} \begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix}^{\dagger} \begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix} = & \left( \begin{bmatrix} \mathbf{A}^T& \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix} \right)^{-1} \left(\begin{bmatrix} \mathbf{A}^T& \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix} \right) \\ = & \left( \mathbf{A}^T \mathbf{A} + \mathbf{I} \right)^{-1} \left( \mathbf{A}^T \mathbf{b} + \mathbf{y} \right) \end{split} \end{equation}
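The two block products used in this derivation can be confirmed numerically (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

C = np.vstack([A, np.eye(n)])
z = np.concatenate([b, y])

# Block products collapse exactly as in the derivation above
print(np.allclose(C.T @ C, A.T @ A + np.eye(n)))
print(np.allclose(C.T @ z, A.T @ b + y))
```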