Given a matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$ and vectors $\mathbf{b}\in\mathbb{R}^m$ and $\mathbf{y}\in\mathbb{R}^n$, I need to find the vector $\mathbf{x_*}\in\mathbb{R}^n$ such that
$$\mathbf{x_*} = \text{arg}\min_{\mathbf{x}\in\mathbb{R}^n}{||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2}$$
If $\mathbf{y}$ satisfies that $\mathbf{Ay=b}$, then $\mathbf{x}_*=y$. This is the trivial case. So, we assume from here to below that $\mathbf{Ay\neq b}$.
It's easy to check that $||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2 = \left\Vert\begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}\mathbf{x}-\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} \right\Vert^2$. So, by lemmas 2.1 y 2.2 in [1] we have that $\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix}\hspace{1em}$where $^\dagger$ denotes de psuedo-inverse of a matrix.
I found on internet that an alternative solution, is to find the vector $\mathbf{x}$ such that the gradient of $||\mathbf{x-y}||^2+||\mathbf{Ax}-\mathbf{b}||^2$ is null. Then, I yields to the followin system
$$\mathbf{(I+A^TA)x=y+A^Tb}$$
My question is, why
$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)} $$
[1] Piziak, R., and P. L. Odell. "Affine projections." Computers & Mathematics with Applications 48.1-2 (2004): 177-190.
Here is a shorter answer. Wikipedia mentions that the generalized inverse of a matrix $\begin{bmatrix}\mathbf{A} \\\mathbf{I}\end{bmatrix}$ can be expressed as $\left(\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$, if the inverse exists. This expression simplifies to $\left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\\mathbf{I}\end{bmatrix}^T$. Therefore, $$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\\mathbf{I}\end{bmatrix}^T \begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}. $$