Least Squares Solution Using the Pseudoinverse (Derived from the SVD) Is the Minimum Norm Solution - Extension from Vectors to Matrices


Given $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{k \times \ell}$, and $C\in \mathbb{R}^{m \times \ell}$. Show that for $X \in \mathbb{R}^{n \times k}$

$$ {A}^{\dagger} C {B}^{\dagger} = \arg \min_{X} {\left\| C - A X B \right\|}_{F} $$

is the unique solution.

Note: This is an extension of the minimum $2$-norm least squares problem.

Hint: Use the SVDs of $A$ and $B$.

This is a homework problem, but I got stuck. After computing the SVD of both $A$ and $B$, I discard the zero singular values to get a thin SVD. Then I can factor a few things out, but I still don't see an explicit expression taking shape. Could somebody point me in the right direction?

Edit:
1. "F norm": the Frobenius norm.
2. $A^+$ or $A^\dagger$: the Moore–Penrose pseudoinverse.
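
For intuition, here is a quick numerical sanity check of the claim (an illustrative NumPy sketch with arbitrary dimensions and seed, not a proof):

```python
# Sanity check: X = pinv(A) @ C @ pinv(B) attains the minimal Frobenius residual.
import numpy as np

rng = np.random.default_rng(0)
m, n, k, l = 5, 3, 4, 6
A = rng.standard_normal((m, n))
B = rng.standard_normal((k, l))
C = rng.standard_normal((m, l))

X_star = np.linalg.pinv(A) @ C @ np.linalg.pinv(B)
res_star = np.linalg.norm(C - A @ X_star @ B, "fro")

# No random perturbation of the candidate should achieve a smaller residual.
for _ in range(1000):
    X = X_star + 0.1 * rng.standard_normal((n, k))
    assert np.linalg.norm(C - A @ X @ B, "fro") >= res_star - 1e-10
```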

There are 3 answers below.

Best Answer

First, I am not exactly sure what you are asking. I am assuming that you know that the minimal norm solution to the least squares problem $\min_x {1 \over 2} \|Ax-y\|^2$ is given by $x=A^{\dagger} y$, and that the issue is to show that the minimal norm solution to the problem $\min_X {1 \over 2} \| C-AXB\|_F^2$ is given by $A^{\dagger} C B^{\dagger}$.

Second, note that $A^{\dagger} C B^{\dagger}$ is not necessarily the unique solution; however, among all solutions (and it is always a solution), it is the one of minimal norm. For example, if $A=0$ then any $X$ is a solution, so the solution is clearly not unique.
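
To make the non-uniqueness concrete, here is a small NumPy illustration (the rank-deficient setup is my own choice): shifting a minimizer along the null space of $A$ leaves the residual unchanged but increases its norm.

```python
# Non-uniqueness demo: many X achieve the minimal residual, but
# pinv(A) @ C @ pinv(B) has the smallest Frobenius norm among them.
import numpy as np

rng = np.random.default_rng(1)
m, n, k, l = 5, 4, 3, 4
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank 2, so null(A) != {0}
B = rng.standard_normal((k, l))
C = rng.standard_normal((m, l))

X_mn = np.linalg.pinv(A) @ C @ np.linalg.pinv(B)               # minimum norm solution
res_mn = np.linalg.norm(C - A @ X_mn @ B, "fro")

null_dir = np.linalg.svd(A)[2][-1]                             # right singular vector with sigma = 0
X_other = X_mn + np.outer(null_dir, rng.standard_normal(k))    # another minimizer

assert np.isclose(np.linalg.norm(C - A @ X_other @ B, "fro"), res_mn)
assert np.linalg.norm(X_other, "fro") > np.linalg.norm(X_mn, "fro")
```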

One approach is to determine the SVD of the operator $X \mapsto AXB$ with respect to the basis $E_{ij} = e_i e_j^T$ and translate this into the above form.

Note that if we can write $A = \sum_k \sigma_k u_k v_k^T$ (or equivalently, $Ax = \sum_k \sigma_k u_k \langle v_k, x \rangle$ for all $x$), where $\sigma_k \ge 0$ and the $u_k$ and $v_k$ are orthonormal, then we can extract the SVD from this formulation in the usual way. Then $A^{\dagger} = \sum_{\sigma_k \neq 0} {1 \over \sigma_k } v_k u_k^T$ (or equivalently, $A^{\dagger} x = \sum_{\sigma_k \neq 0} {1 \over \sigma_k } v_k \langle u_k, x \rangle$ for all $x$).
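
This rank-one-sum form of $A^{\dagger}$ is easy to verify numerically (a minimal NumPy sketch; the tolerance `tol` is my own choice):

```python
# Build A and A^dagger from rank-one SVD terms and compare with np.linalg.pinv.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = sum_k sigma_k u_k v_k^T
A_rebuilt = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(len(s)))
assert np.allclose(A, A_rebuilt)

# A^dagger = sum over sigma_k != 0 of (1 / sigma_k) v_k u_k^T
tol = 1e-12
A_dag = sum((1 / s[k]) * np.outer(Vt[k], U[:, k]) for k in range(len(s)) if s[k] > tol)
assert np.allclose(A_dag, np.linalg.pinv(A))
```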

So, we look for a similar expansion for the operator $L(X) = AXB$. \begin{eqnarray} AXB &=& (\sum_k \sigma_k(A) u_k(A)v_k(A)^T ) X (\sum_l \sigma_l(B) u_l(B)v_l(B)^T ) \\ &=& \sum_{k,l} \sigma_k(A) \sigma_l(B) u_k(A)v_k(A)^T X u_l(B)v_l(B)^T \\ &=& \sum_{k,l} \sigma_k(A) \sigma_l(B) u_k(A) v_l(B)^T v_k(A)^T X u_l(B) \\ &=& \sum_{k,l} \sigma_k(A) \sigma_l(B) u_k(A) v_l(B)^T \operatorname{tr} (v_k(A)^T X u_l(B)) \\ &=& \sum_{k,l} \sigma_k(A) \sigma_l(B) u_k(A) v_l(B)^T \operatorname{tr} ( u_l(B) v_k(A)^T X ) \\ &=& \sum_{k,l} \sigma_k(A) \sigma_l(B) u_k(A) v_l(B)^T \langle v_k(A) u_l(B)^T , X \rangle \\ \end{eqnarray} Hence $L$ has singular values $\sigma_k(A) \sigma_l(B) $ with left singular vectors $u_k(A) v_l(B)^T$ and right singular vectors $v_k(A) u_l(B)^T$. A quick check shows that the singular vectors are orthonormal.
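
One concrete way to check this numerically: with a column-major $\operatorname{vec}$, the matrix of $L$ is the Kronecker product $B^T \otimes A$, whose singular values are exactly the products $\sigma_k(A)\,\sigma_l(B)$ (an illustrative NumPy sketch; the Kronecker encoding is my own framing):

```python
# Singular values of L(X) = A X B are the products sigma_k(A) * sigma_l(B).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 6))

L = np.kron(B.T, A)                       # vec(A X B) = (B^T kron A) vec(X), column-major

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
products = np.sort(np.outer(sA, sB).ravel())[::-1]

assert np.allclose(np.linalg.svd(L, compute_uv=False), products)
```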

Hence $L^{\dagger}$ has the representation \begin{eqnarray} L^{\dagger}(X) &=& \sum_{\sigma_k(A) \sigma_l(B) \neq 0} {1 \over \sigma_k(A) \sigma_l(B)} u_k(A) v_l(B)^T \langle v_k(A) u_l(B)^T , X \rangle \\ &=& \sum_{\sigma_k(A) \sigma_l(B) \neq 0} {1 \over \sigma_k(A) \sigma_l(B)} u_k(A) v_l(B)^T \operatorname{tr} ( u_l(B) v_k(A)^T X ) \\ &=& \sum_{\sigma_k(A) \sigma_l(B) \neq 0} {1 \over \sigma_k(A) \sigma_l(B)} u_k(A) v_l(B)^T v_k(A)^T X u_l(B) \\ &=& \sum_{\sigma_k(A) \sigma_l(B) \neq 0} {1 \over \sigma_k(A) \sigma_l(B)} u_k(A) v_k(A)^T X u_l(B) v_l(B)^T \\ &=& ( \sum_{\sigma_k(A) \neq 0} {1 \over \sigma_k(A)} u_k(A) v_k(A)^T ) X ( \sum_{\sigma_l(B) \neq 0} {1 \over \sigma_l(B)} u_l(B) v_l(B)^T ) \\ &=& A^{\dagger} X B^{\dagger} \end{eqnarray}

Hence the minimal norm solution to the least squares problem is given by $L^{\dagger}(C) = A^{\dagger} C B^{\dagger}$.
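
As a numerical cross-check of this identity (the vec/Kronecker encoding below is my own; illustrative dimensions):

```python
# L^dagger(C), computed via the pseudoinverse of the matrix of L,
# agrees with pinv(A) @ C @ pinv(B).
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 6))
C = rng.standard_normal((5, 6))

L = np.kron(B.T, A)                              # matrix of X -> A X B (column-major vec)
x = np.linalg.pinv(L) @ C.ravel(order="F")       # vec of L^dagger(C)
X1 = x.reshape(3, 4, order="F")

X2 = np.linalg.pinv(A) @ C @ np.linalg.pinv(B)
assert np.allclose(X1, X2)
```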

Answer 2

Deriving the Least Squares Solution

Since:

$$ \arg \min_{X} {\left\| C - A X B \right\|}_{F} = \arg \min_{X} \frac{1}{2} {\left\| C - A X B \right\|}_{F}^{2} $$

One can work with the second form, which is easier to differentiate.

Defining $ f \left( X \right) = \frac{1}{2} {\left\| C - A X B \right\|}_{F}^{2} $, the derivative is:

$$ \frac{d}{dX} f \left( X \right) = -{A}^{T} \left( C - A X B \right) {B}^{T} $$

Setting the derivative to zero:

$$\begin{align*} \frac{d}{dX} f \left( X \right) = 0 & \iff -{A}^{T} \left( C - A X B \right) {B}^{T} = 0 \\ & \iff {A}^{T} A X B {B}^{T} = {A}^{T} C {B}^{T} \\ & \Rightarrow X = {\left( {A}^{T} A \right)}^{-1} {A}^{T} C {B}^{T} {\left( B {B}^{T} \right)}^{-1} = {A}^{\dagger} C {B}^{\dagger} \end{align*}$$

(The last step assumes $A$ has full column rank and $B$ has full row rank, so that $ {A}^{T} A $ and $ B {B}^{T} $ are invertible. In general, $ X = {A}^{\dagger} C {B}^{\dagger} $ still satisfies the normal equations $ {A}^{T} A X B {B}^{T} = {A}^{T} C {B}^{T} $, since $ {A}^{T} A {A}^{\dagger} = {A}^{T} $ and $ {B}^{\dagger} B {B}^{T} = {B}^{T} $.)
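
A finite-difference check of the derivative formula (an illustrative NumPy sketch with arbitrary dimensions):

```python
# Verify d f / d X = -A^T (C - A X B) B^T for f(X) = 0.5 * ||C - A X B||_F^2.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 6))
C = rng.standard_normal((5, 6))
X = rng.standard_normal((3, 4))

f = lambda X: 0.5 * np.linalg.norm(C - A @ X @ B, "fro") ** 2
G = -A.T @ (C - A @ X @ B) @ B.T                 # analytic gradient

eps = 1e-6
G_num = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        G_num[i, j] = (f(X + E) - f(X - E)) / (2 * eps)  # central difference

assert np.allclose(G, G_num, atol=1e-4)
```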

Deriving the Least Norm Solution

We're after the Least Norm solution of all Least Squares solutions:

$$\begin{align*} \hat{X} = \arg \min_{X} \: & \: {\left\| X \right\|}_{F} \\ \text{subject to} \: & \: X \in \left\{ Z \mid {\left\| A Z B - C \right\|}_{F} = \min_{Z} {\left\| A Z B - C \right\|}_{F} \right\} \end{align*}$$

Write the Singular Value Decompositions (SVDs) as $ A = {U}_{A} {\Sigma}_{A} {V}_{A}^{T} $ and $ B = {U}_{B} {\Sigma}_{B} {V}_{B}^{T} $, and recall that the Frobenius norm is invariant under multiplication by any unitary matrix $ U $: $ {\left\| X \right\|}_{F} = {\left\| U X \right\|}_{F} = {\left\| X U \right\|}_{F} $ (a quick numerical check of this fact follows).
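
A minimal sketch of that invariance (illustrative setup):

```python
# Frobenius norm is invariant under multiplication by an orthogonal matrix.
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4))
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrix

assert np.isclose(np.linalg.norm(X, "fro"), np.linalg.norm(U @ X, "fro"))
assert np.isclose(np.linalg.norm(X, "fro"), np.linalg.norm(X @ U, "fro"))
```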

Then:

$$\begin{align*} \min_{Z} {\left\| A Z B - C \right\|}_{F} & = \min_{Z} {\left\| {U}_{A} {\Sigma}_{A} {V}_{A}^{T} Z {U}_{B} {\Sigma}_{B} {V}_{B}^{T} - C \right\|}_{F} \\ & = \min_{Z} {\left\| {\Sigma}_{A} {V}_{A}^{T} Z {U}_{B} {\Sigma}_{B} - {U}_{A}^{T} C {V}_{B} \right\|}_{F} \\ & = \min_{Y = {V}_{A}^{T} Z {U}_{B} } {\left\| {\Sigma}_{A} Y {\Sigma}_{B} - {U}_{A}^{T} C {V}_{B} \right\|}_{F} \end{align*}$$

Since both $ {\Sigma}_{A} $ and $ {\Sigma}_{B} $ are diagonal, the problem decouples entrywise: the entries of $ Y $ that are multiplied by nonzero singular values are determined uniquely, while the remaining entries do not affect the residual and are free.

Setting the free entries to zero minimizes $ {\left\| Y \right\|}_{F} $ among all minimizers, which gives the least norm solution $ \hat{Y} = {{\Sigma}_{A}}^{\dagger} {U}_{A}^{T} C {V}_{B} {{\Sigma}_{B}}^{\dagger} $.

Since $ {\left\| Z \right\|}_{F} = {\left\| {V}_{A} Y {U}_{B}^{T} \right\|}_{F} = {\left\| Y \right\|}_{F} $, minimizing the norm of $ Y $ also minimizes the norm of $ Z = {V}_{A} Y {U}_{B}^{T} $. This implies $ \hat{Z} = {V}_{A} {{\Sigma}_{A}}^{\dagger} {U}_{A}^{T} C {V}_{B} {{\Sigma}_{B}}^{\dagger} {U}_{B}^{T} = {A}^{\dagger} C {B}^{\dagger} $ is the least norm solution.
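
An end-to-end numerical check of this construction (illustrative NumPy sketch; the naive $1/\sigma$ inversion below assumes the random test matrices are full rank):

```python
# Z_hat = V_A Sigma_A^dagger U_A^T C V_B Sigma_B^dagger U_B^T equals pinv(A) @ C @ pinv(B).
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 6))
C = rng.standard_normal((5, 6))

UA, sA, VAt = np.linalg.svd(A)                 # full SVD of A
UB, sB, VBt = np.linalg.svd(B)                 # full SVD of B

SA_dag = np.zeros((A.shape[1], A.shape[0]))    # Sigma_A^dagger is n x m
SA_dag[: len(sA), : len(sA)] = np.diag(1 / sA)
SB_dag = np.zeros((B.shape[1], B.shape[0]))    # Sigma_B^dagger is l x k
SB_dag[: len(sB), : len(sB)] = np.diag(1 / sB)

Y_hat = SA_dag @ UA.T @ C @ VBt.T @ SB_dag     # least norm Y
Z_hat = VAt.T @ Y_hat @ UB.T                   # Z = V_A Y U_B^T
assert np.allclose(Z_hat, np.linalg.pinv(A) @ C @ np.linalg.pinv(B))
```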

Answer 3

You don't need the singular value decomposition here. Observe that $\pi_1:C\mapsto AA^+CB^+B$ and $\pi_2:X\mapsto A^+AXBB^+$ are orthogonal projections on their respective matrix spaces with respect to the Frobenius inner product. Also, $$ (C-\pi_1(C))=(\operatorname{id}-\pi_1)(C)\perp \pi_1(AXB)=AXB=A\pi_2(X)B. $$ It follows that $$ \|C-AXB\|_F^2 = \|C-\pi_1(C)\|_F^2 + \|\pi_1(C)-A\pi_2(X)B\|_F^2 $$ and the problem boils down to minimising $\|\pi_1(C)-A\pi_2(X)B\|_F$. Since this quantity involves only the projected component $\pi_2(X)$ of $X$ and not its orthogonal component, a least-norm solution must satisfy $X=\pi_2(X)$.

Clearly, when $X=A^+CB^+$, we have $X=\pi_2(X)$ and $\|\pi_1(C)-A\pi_2(X)B\|_F=0$. Therefore the minimum possible value (i.e. zero) of $\|\pi_1(C)-A\pi_2(X)B\|_F$ is attained and $A^+CB^+$ is indeed a least-norm minimiser.

If there is another least-norm minimiser $X$, we must have $X=\pi_2(X)$ and $\|\pi_1(C)-A\pi_2(X)B\|_F=0$. Thus we get \begin{align} A\pi_2(X)B &= \pi_1(C),\\ A^+A\pi_2(X)BB^+ &= A^+\pi_1(C)B^+,\\ \pi_2(\pi_2(X)) &= A^+AA^+CB^+BB^+,\\ X &= A^+CB^+ \end{align} and we conclude that the least-norm solution is unique.
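
The projection identities used above are easy to confirm numerically (the `pi1`/`pi2` names below mirror the answer; illustrative setup):

```python
# Check: pi1, pi2 are idempotent, the Pythagorean split holds, and
# pi2 fixes the candidate pinv(A) @ C @ pinv(B).
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((4, 6))
C = rng.standard_normal((5, 6))
Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)

pi1 = lambda M: A @ Ap @ M @ Bp @ B            # projection on the C side
pi2 = lambda M: Ap @ A @ M @ B @ Bp            # projection on the X side

X = rng.standard_normal((3, 4))
assert np.allclose(pi1(pi1(C)), pi1(C))        # idempotent
assert np.allclose(pi2(pi2(X)), pi2(X))        # idempotent

# Pythagorean decomposition of the residual:
lhs = np.linalg.norm(C - A @ X @ B, "fro") ** 2
rhs = (np.linalg.norm(C - pi1(C), "fro") ** 2
       + np.linalg.norm(pi1(C) - A @ pi2(X) @ B, "fro") ** 2)
assert np.isclose(lhs, rhs)

# Uniqueness chain: pi2 applied to the candidate returns the candidate.
assert np.allclose(pi2(Ap @ C @ Bp), Ap @ C @ Bp)
```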