Calculating the Moore-Penrose pseudoinverse


I have a problem with a project requiring me to calculate the Moore-Penrose pseudoinverse. I've also posted about this on StackOverflow, where you can see my progress.

From what I understand from Planet Math, you can compute the pseudoinverse with just the first formula, which I can understand. But it also says that this only works in special cases, and that in general you have to do an SVD (singular value decomposition), after which the formula becomes much more complicated (the second formula), which I don't understand... I mean,

  1. What is $V$? What is $S$? I'm confused.
  2. How can I calculate the SVD?
  3. Can you please help me build the code/algorithm, or give me some simple advice?
  4. Why are there two pseudoinverse formulas?

Left pseudo inverse formula $$A_\text{left} = (A^TA)^{-1}A^T$$ Right pseudo inverse formula $$A_\text{right}=A^T(AA^T)^{-1}$$
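For reference, the two formulas can be checked numerically. This is a NumPy sketch of my own (variable names are mine), assuming the matrices have full rank so that the inverses exist:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall matrix (m > n) with full column rank: the left pseudoinverse applies.
A = rng.standard_normal((5, 3))
A_left = np.linalg.inv(A.T @ A) @ A.T    # (A^T A)^{-1} A^T
# A_left @ A is the 3x3 identity (left inverse).

# Wide matrix (m < n) with full row rank: the right pseudoinverse applies.
B = rng.standard_normal((3, 5))
B_right = B.T @ np.linalg.inv(B @ B.T)   # A^T (A A^T)^{-1}
# B @ B_right is the 3x3 identity (right inverse).
```

In the full-rank case both formulas agree with the Moore-Penrose pseudoinverse of the respective matrix.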

Thank you very much, Daniel.

There are 2 answers below.

Answer 1 (score 5):
While the SVD yields a "clean" way to construct the pseudoinverse, it is sometimes an "overkill" in terms of efficiency.

The Moore-Penrose pseudoinverse can be seen as follows: let $\ell:\mathbb R^n\rightarrow\mathbb R^m$ be a linear map. Then $\ell$ restricts to an isomorphism $\ell':{\rm Ker}(\ell)^\perp\rightarrow {\rm Im}(\ell)$, and the Moore-Penrose pseudoinverse $\ell^+:\mathbb R^m\rightarrow \mathbb R^n$ is given by

$$\ell^+(x)=\ell'^{-1}(\Pi(x)),$$ where $\Pi$ is the orthogonal projection of $x$ on ${\rm Im}(\ell)$.

In other words, all you need to construct the Moore-Penrose pseudoinverse is orthonormal bases of ${\rm Im}(\ell)$ and of ${\rm Ker}(\ell)^\perp$.
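As a concrete illustration (a NumPy sketch of my own, not part of the answer): the minimum-norm least-squares solution that `np.linalg.lstsq` returns is exactly this $\ell^+(x)$, i.e. the unique preimage in ${\rm Ker}(\ell)^\perp$ of the orthogonal projection of $x$ onto ${\rm Im}(\ell)$:

```python
import numpy as np

rng = np.random.default_rng(1)
# A deliberately rank-deficient 4x3 map (rank 2), so Ker(A) is nontrivial.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
x = rng.standard_normal(4)

# Minimum-norm least-squares solution: y minimizes ||A y - x||, and among
# all minimizers it has the smallest norm, i.e. y lies in Ker(A)^perp.
y, *_ = np.linalg.lstsq(A, x, rcond=None)
# This y equals pinv(A) @ x, and A @ y is the projection of x onto Im(A).
```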

For an algorithm, you may be interested in the iterative method here.

edit: roughly speaking, one way to see why the SVD might be "overkill" is that if $A$ is a matrix with rational entries, then $A^+$ also has rational entries (see e.g. this paper), while the entries of the SVD are in general irrational algebraic numbers.
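The linked method is not reproduced above. As one concrete example of such an iterative scheme (my choice, not necessarily the one linked), the Newton-Schulz / Ben-Israel-Cohen iteration converges to the pseudoinverse without computing an SVD:

```python
import numpy as np

def pinv_newton_schulz(A, iters=60):
    """Ben-Israel & Cohen iteration X_{k+1} = X_k (2I - A X_k).

    Converges quadratically to the Moore-Penrose pseudoinverse when
    started from X_0 = alpha * A^T with 0 < alpha < 2 / sigma_max(A)^2.
    """
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / sigma_max(A)^2
    X = alpha * A.T
    m = A.shape[0]
    for _ in range(iters):
        X = X @ (2.0 * np.eye(m) - A @ X)
    return X
```

The spectral norm used for `alpha` is itself computed via singular values here; in practice any upper bound on $\sigma_{\max}$ (e.g. from cheaper norm estimates) works as well.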

Answer 2 (score 14):

These formulas are for different matrix formats of the rectangular matrix $A$.

The matrix to be (pseudo-)inverted should have full rank. (added:) If $A\in\mathbb R^{m\times n}$ is a tall matrix, $m>n$, this means $\operatorname{rank}(A)=n$, that is, the columns have to be linearly independent, or equivalently $A$ as a linear map has to be injective. If $A$ is a wide matrix, $m<n$, then the rows of the matrix have to be linearly independent to give full rank. (/edit)
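These two full-rank conditions are easy to check numerically; a small illustrative sketch (mine, not from the answer):

```python
import numpy as np

tall = np.array([[1., 0.], [0., 1.], [1., 1.]])  # 3x2, independent columns
wide = tall.T                                     # 2x3, independent rows

# Full column rank: rank equals n, so (A^T A) is invertible.
print(np.linalg.matrix_rank(tall))  # 2
# Full row rank: rank equals m, so (A A^T) is invertible.
print(np.linalg.matrix_rank(wide))  # 2
```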

If full rank is given, then you are better off simplifying these formulas using a QR decomposition of $A$ resp. $A^T$. There the $R$ factor is square and invertible, and $Q$ is a narrow tall matrix with the same format as $A$ resp. $A^T$:

If $A$ is tall, then $A=QR$ and $A^{\oplus}_{left}=R^{-1}Q^T$

If $A$ is wide, then $A^T=QR$, $A=R^TQ^T$, and $A^{\oplus}_{right}=QR^{-T}$.
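Both cases can be sketched with NumPy's reduced QR (my own sketch, assuming $A$ has full rank for its shape):

```python
import numpy as np

def pinv_qr(A):
    """Pseudoinverse via QR, assuming A has full rank for its shape."""
    m, n = A.shape
    if m >= n:                    # tall: A = QR, A^+ = R^{-1} Q^T
        Q, R = np.linalg.qr(A)    # Q is m x n, R is n x n invertible
        return np.linalg.solve(R, Q.T)
    else:                         # wide: A^T = QR, A^+ = Q R^{-T}
        Q, R = np.linalg.qr(A.T)  # Q is n x m, R is m x m invertible
        # Q R^{-T} computed as (R^{-1} Q^T)^T to avoid an explicit inverse.
        return np.linalg.solve(R, Q.T).T
```

Solving against the triangular $R$ instead of forming $(A^TA)^{-1}$ avoids squaring the condition number of $A$.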


You only need an SVD if $A$ is suspected not to have the maximal rank for its format. In that case a reliable rank estimate is only possible by comparing the magnitudes of the singular values of $A$: wherever $A$ has a very small singular value, $A^{\oplus}$ has either a very large singular value or a zero, depending on whether that small value is kept or truncated.


Added, since wikipedia is curiously silent about this: Numerically, you first compute or let a library compute the SVD $A=U\Sigma V^T$, where $\Sigma=\operatorname{diag}(\sigma_1,\sigma_2,\dots,\sigma_r)$ is the diagonal matrix of singular values, ordered in decreasing size $\sigma_1\ge \sigma_2\ge\dots\ge \sigma_r$.

Then you estimate the effective rank by looking for the smallest $k$ with, for instance, $\sigma_{k+1}<10^{-8}\sigma_1$, or as another strategy $\sigma_{k+1}<10^{-2}\sigma_k$, or a combination of both. The factors defining what is "small enough" are a matter of taste and experience.

With this estimated effective rank $k$ you compute $$\Sigma^{\oplus}=\operatorname{diag}(\sigma_1^{-1},\sigma_2^{-1},\dots,\sigma_k^{-1},0,\dots,0)$$ and $$A^{\oplus}=V\Sigma^{\oplus}U^T.$$
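In NumPy this recipe reads as follows (a sketch of mine, using the $10^{-8}$ relative cutoff mentioned above as the default threshold):

```python
import numpy as np

def pinv_svd(A, rel_tol=1e-8):
    """Pseudoinverse via truncated SVD: A = U Sigma V^T, A^+ = V Sigma^+ U^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Effective rank: count singular values above the relative threshold.
    k = int(np.sum(s > rel_tol * s[0]))
    # Invert only the kept singular values; the rest are treated as zero.
    return Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T
```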

Note how the retained singular values in $\Sigma^{\oplus}$, and thus in $A^{\oplus}$, appear in increasing order in this form. Truncating at the effective rank is therefore a very sensitive operation: small differences in the rank estimate can lead to wildly varying results for the pseudoinverse.