Section on Moore-Penrose Pseudoinverse in the Deep Learning book


I am reading the following section in the Deep Learning book by Goodfellow, Bengio and Courville.

[Screenshot: the section on the Moore–Penrose pseudoinverse from the book.]

I have some questions.

If $A\in\mathbb{R}^{m\times n}$ and $Ax=y$, then a left inverse $B$ with $x=By$ can exist only when the nullspace of $A$ is $\{0\}$: if $Ax=0=y$ for some $x\neq 0$, then no $B$ can satisfy $0\neq x=By=B0=0$. In other words, the dimension of the row space, and hence the rank of $A$, must be $n$.

Thus, if $A$ is taller than it is wide, $m>n$, then it is possible for a left inverse to exist. On the other hand, if $A$ is wider than it is tall, $m<n$, then the rank can be at most $m$, which means that $A$ cannot have a left inverse.
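To make this concrete, here is a small NumPy check (my own illustration, not from the book): for a tall full-rank matrix, $B=(A^TA)^{-1}A^T$ is a left inverse, while for a wide matrix no candidate, not even the pseudoinverse, can satisfy $BA=I_n$.

```python
import numpy as np

# Tall full-rank matrix (m=3 > n=2): a left inverse exists.
A_tall = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
B = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T  # B = (A^T A)^{-1} A^T
assert np.allclose(B @ A_tall, np.eye(2))        # B A = I_2, so B is a left inverse

# Wide matrix (m=1 < n=2): rank(A) <= 1 < n, so no left inverse can exist.
A_wide = np.array([[1.0, 2.0]])
P = np.linalg.pinv(A_wide)                       # best candidate: the pseudoinverse
print(P @ A_wide)                                # a rank-1 matrix, not the 2x2 identity
```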

However, in the screen-shot I have attached, it shows the exact opposite: If $A$ is taller than it is wide, then it is possible for this equation to have no solution. If $A$ is wider than it is tall, then there could be multiple possible solutions.

Where have I gone wrong in my reasoning?


Accepted answer:

Let us suppose that $A$ is taller than it is wide. Then, yes, it may have a left inverse, and the equation $Ax=y$ may have no solutions. You are acting as if there is a contradiction here, but where is it? If $A$ is the zero matrix, then $A$ has no left inverse and, if $y\neq0$, the equation $Ax=y$ has no solutions. On the other hand, if $A$ has a single nonzero column, then $A$ has a left inverse (infinitely many, in fact), and yet the equation $Ax=y$ has a solution only when $y$ is a scalar multiple of that column; for every other $y$ it has no solution at all.
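The single-column case can be checked numerically. In this sketch (my own, using an assumed example matrix), every $B=\begin{bmatrix}1 & c\end{bmatrix}$ is a left inverse of $A=\begin{bmatrix}1\\0\end{bmatrix}$, yet $Ax=y$ is unsolvable for $y=(0,1)^T$:

```python
import numpy as np

# Tall single-column matrix (an assumed example for the answer's second case).
A = np.array([[1.0],
              [0.0]])

# Every B = [1, c] is a left inverse: B @ A = [1], for any c.
for c in (0.0, 5.0, -2.0):
    B = np.array([[1.0, c]])
    assert np.allclose(B @ A, np.eye(1))

# Yet A x = y has no solution for y outside the column space of A:
y = np.array([0.0, 1.0])
x_ls, residual, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_ls, residual)  # best least-squares x is 0 with nonzero residual:
                       # y is unreachable, despite the left inverses above
```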

Second answer:

I like simple examples to build intuition:

  • Let $A$ be the fat matrix $A = \begin{bmatrix} a_1 & a_2 \end{bmatrix}$. This gives you an underdetermined system in $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ $$A \mathbf{x} = \mathbf{b} \quad \Leftrightarrow \quad \begin{bmatrix}a_1 x_1 + a_2 x_2 \end{bmatrix} = \begin{bmatrix} b_1 \end{bmatrix}$$ $m = 1$, $n=2$, $\operatorname{rank}(A) \leq 1$, and $\operatorname{rank}(A) < n$. A continuum of solutions exists.

  • Let $A$ be the skinny matrix $A = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$. This gives you an overdetermined system in $\mathbf{x} = \begin{bmatrix} x_1 \end{bmatrix}$ $$A \mathbf{x} = \mathbf{b} \quad \Leftrightarrow \quad \begin{bmatrix}a_1 x_1 \\ a_2 x_1 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$ $m = 2$, $n=1$. This may or may not have a solution, depending on whether $\mathbf{b}$ lies in the image of $A$.

    • Let's assume $A$ is skinny, $m > n$, with rank $n$. Let $A^\dagger = (A^*A)^{-1}A^*$. Then $A^\dagger \mathbf{b} = (A^*A)^{-1}A^*A\mathbf{x} = \mathbf{x}$, so the pseudoinverse $A^\dagger$ gives you back $\mathbf{x}$ whenever $A\mathbf{x}=\mathbf{b}$ has a solution. (Note: if $A\mathbf{x} = \mathbf{b}$ has no exact solution, $A^\dagger\mathbf{b}$ is the solution in the least-squares sense, that is, the $\mathbf{x}$ that minimizes the squared error $(A\mathbf{x} - \mathbf{b})^*(A\mathbf{x} - \mathbf{b})$.)
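Both bullet points can be verified with NumPy. This sketch (my own, with assumed example matrices) checks that $(A^*A)^{-1}A^*$ matches `np.linalg.pinv` in the skinny full-rank case, that $A^\dagger$ recovers $\mathbf{x}$ exactly or in the least-squares sense, and that in the fat case $A^\dagger\mathbf{b}$ picks one solution out of the continuum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Skinny case (m=3 > n=2, full column rank): A^+ = (A^* A)^{-1} A^*.
A = rng.standard_normal((3, 2))
A_dag = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_dag, np.linalg.pinv(A))

# If b = A x exactly, the pseudoinverse recovers x.
x = np.array([1.0, -2.0])
b = A @ x
assert np.allclose(A_dag @ b, x)

# If b is not in the image of A, A^+ b is the least-squares minimizer,
# the same x that np.linalg.lstsq returns.
b_noisy = b + rng.standard_normal(3)
x_ls, *_ = np.linalg.lstsq(A, b_noisy, rcond=None)
assert np.allclose(A_dag @ b_noisy, x_ls)

# Fat case (m=1 < n=2): A^+ b is one solution (the minimum-norm one)
# out of the continuum of solutions.
A_fat = np.array([[1.0, 2.0]])
b_fat = np.array([5.0])
x_min = np.linalg.pinv(A_fat) @ b_fat
assert np.allclose(A_fat @ x_min, b_fat)  # it does solve the system
```

The formula $(A^*A)^{-1}A^*$ only applies in the full-column-rank skinny case; `np.linalg.pinv`, built on the SVD, handles both shapes (and rank-deficient matrices) uniformly.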

As @José Carlos Santos describes, there's no contradiction.