In the book Convex Optimization by Stephen Boyd on page 649, the pseudo-inverse is defined as:
$A^{\dagger}=V\Sigma^{-1}U^T$
where $A=U\Sigma V^T$ is the singular value decomposition of $A$.
It then gives the alternative forms:
$A^{\dagger}=\lim_{\epsilon \to 0}(A^TA+\epsilon I)^{-1}A^T = \lim_{\epsilon \to 0}A^T(A^TA+\epsilon I)^{-1}$
How are these two expressions obtained? Also, why does the pseudo-inverse equal $A^{\dagger} = A^T(A^TA)^{-1}$ when the system is under-determined, i.e. when the rank equals the number of rows and there are more columns than rows?
Let $Ax=b$ with $A\in\mathbb{R}^{m\times n}$. If the system is over-determined and $A$ has full column rank (the number of linearly independent rows equals the number of columns $n$), then the $\textit{normal equations}$ read $A^TAx=A^Tb$. By the rank condition $A^TA$ is invertible, so $x=(A^TA)^{-1}A^Tb$; the matrix $A^\dagger=(A^TA)^{-1}A^T$ maps $b$ to the least-squares solution, which motivates calling $A^\dagger$ a pseudo-inverse of $A$.
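As a quick numerical sanity check (a sketch using NumPy with a randomly generated tall matrix, which has full column rank almost surely), the normal-equations solution agrees with the one produced by `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # tall: over-determined system
b = rng.standard_normal(6)

# Least-squares solution via the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Same solution via the pseudo-inverse
x_pinv = np.linalg.pinv(A) @ b

assert np.allclose(x_normal, x_pinv)
```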
Now, to show that this agrees with the limit formula in Boyd, let $A=U\Sigma V^{T}$ be the (thin) SVD of $A$, with $\Sigma$ square and invertible and $V$ orthogonal. Then $A^T=V\Sigma U^T$ and $A^TA = V\Sigma^2 V^T$, so $A^{T}A+\epsilon I = V(\Sigma^2+\epsilon I)V^T$ and
$$(A^{T}A+\epsilon I)^{-1}A^T = V(\Sigma^2+\epsilon I)^{-1}\Sigma U^T.$$
Entrywise, $\sigma_i/(\sigma_i^2+\epsilon)\to 1/\sigma_i$ as $\epsilon\to 0$, which recovers $A^\dagger = V\Sigma^{-1}U^T$.
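The limit can also be checked numerically (a sketch using NumPy; for a small $\epsilon$ the regularized expression $(A^TA+\epsilon I)^{-1}A^T$ should be very close to the SVD-based pseudo-inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
eps = 1e-10

# (A^T A + eps I)^{-1} A^T  approximates  pinv(A)  for small eps
n = A.shape[1]
approx = np.linalg.solve(A.T @ A + eps * np.eye(n), A.T)

assert np.allclose(approx, np.linalg.pinv(A), atol=1e-6)
```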
The formula in the second question is incorrect (check Boyd: in the under-determined, full-row-rank case it should read $A^{\dagger} = A^T(AA^T)^{-1}$, since $AA^T$, not $A^TA$, is invertible there).
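The corrected formula can be verified the same way (a sketch using NumPy with a random wide matrix, which has full row rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))  # wide: under-determined system

# Full-row-rank case: A^T (A A^T)^{-1} is the right inverse / pseudo-inverse
right_pinv = A.T @ np.linalg.inv(A @ A.T)

assert np.allclose(right_pinv, np.linalg.pinv(A))
assert np.allclose(A @ right_pinv, np.eye(3))  # it is a right inverse of A
```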