Define $$ \begin{align*} X &:= \begin{pmatrix} 5 & 0 & 2 & 0\\ 0 & 5 & 0 & 2\\ -7 & 0 & -3 & 0\\ 0 & 7 & 0 & 3 \end{pmatrix}\\ y &:= \begin{pmatrix} 1\\ 2\\ 3\\ 4 \end{pmatrix} \end{align*} $$
Consider the unique function $f:\mathbb{R}^4\rightarrow\mathbb{R}$ satisfying $f(\beta) = \|y - X\beta\|^2$ for every $\beta \in \mathbb{R}^4$ ($\|\cdot\|$ being the Euclidean norm).
It can be shown that $f$ has a global minimum that is attained at $\hat{\beta} = (X^TX)^{-1}X^Ty$ and only there.
What happens to the minimizing vector and to the minimum value if we replace $f$ by the unique function $p:\mathbb{R}^4\rightarrow\mathbb{R}$ satisfying $p(\beta) = f(\beta)+\|\beta\|^2$ for every $\beta\in\mathbb{R}^4$?
We can easily compute the gradient of $p$: $$ \nabla p(\beta) = -2X^\top (y-X \beta) + 2\beta = -2X^\top y + 2(\mathrm{Id}+X^\top X)\beta. $$ Setting it to zero gives $$ \nabla p(\beta) = 0 \iff \beta = (\mathrm{Id}+ X^\top X)^{-1}X^\top y. $$
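To verify that this is indeed the minimizer, observe that the Hessian $Hp(\beta) = 2(\mathrm{Id} + X^\top X)$ is positive definite for every $\beta$, because $X^\top X$ is positive semidefinite. In particular $\mathrm{Id} + X^\top X$ is invertible, $p$ is strictly convex, and the stationary point is its unique global minimizer.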
So the identity matrix squeezes in to account for the penalty $\lVert \beta \rVert^2$.
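For the concrete $X$ and $y$ from the question, both formulas can be checked numerically. The following is a minimal NumPy sketch rather than a definitive implementation; the names `beta_ols`, `beta_ridge`, `f` and `p` are chosen here for illustration. Since this $X$ is invertible, the unpenalized minimum is $0$, so the penalty must raise the minimum value while shrinking the minimizer.

```python
import numpy as np

# X and y exactly as defined in the question.
X = np.array([[ 5.0, 0.0,  2.0, 0.0],
              [ 0.0, 5.0,  0.0, 2.0],
              [-7.0, 0.0, -3.0, 0.0],
              [ 0.0, 7.0,  0.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
I = np.eye(4)

def f(beta):
    """Unpenalized objective ||y - X beta||^2."""
    r = y - X @ beta
    return r @ r

def p(beta):
    """Penalized objective f(beta) + ||beta||^2."""
    return f(beta) + beta @ beta

# Minimizer of f: solve the normal equations X^T X beta = X^T y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Minimizer of p: solve (Id + X^T X) beta = X^T y.
beta_ridge = np.linalg.solve(I + X.T @ X, X.T @ y)

# Gradient of p at the candidate minimizer; should vanish up to rounding.
grad_ridge = -2 * X.T @ y + 2 * (I + X.T @ X) @ beta_ridge

print("argmin f:", beta_ols, " f =", f(beta_ols))
print("argmin p:", beta_ridge, " p =", p(beta_ridge))
print("gradient of p at argmin p:", grad_ridge)
```

Working the two $2\times 2$ blocks by hand gives $\hat{\beta} = (9, -2, -22, 6)$ with $f(\hat{\beta}) = 0$, and the penalized minimizer $\tfrac{1}{89}(-7, 36, -29, 22)$ with minimum value $\tfrac{635}{89} \approx 7.13$, which the script should reproduce up to floating-point error.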
Maybe you can also use a $QR$ decomposition to prove this...
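One way the $QR$ idea might be carried out (an assumption about what is meant, not something worked out above) is to note that $p(\beta) = \|\tilde{y} - \tilde{X}\beta\|^2$ with the stacked matrix $\tilde{X} = \begin{pmatrix} X \\ \mathrm{Id} \end{pmatrix}$ and $\tilde{y} = \begin{pmatrix} y \\ 0 \end{pmatrix}$, so the penalized problem is an ordinary least-squares problem that a reduced $QR$ factorization solves. The sketch below uses the illustrative names `X_aug`, `y_aug`, `beta_qr` and `beta_closed`, and compares the result with the closed form.

```python
import numpy as np

# X and y exactly as defined in the question.
X = np.array([[ 5.0, 0.0,  2.0, 0.0],
              [ 0.0, 5.0,  0.0, 2.0],
              [-7.0, 0.0, -3.0, 0.0],
              [ 0.0, 7.0,  0.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Stack Id under X and zeros under y, so that
# ||y_aug - X_aug beta||^2 = ||y - X beta||^2 + ||beta||^2.
X_aug = np.vstack([X, np.eye(4)])
y_aug = np.concatenate([y, np.zeros(4)])

# Reduced QR: Q is 8x4 with orthonormal columns, R is 4x4 upper triangular.
Q, R = np.linalg.qr(X_aug)

# The least-squares solution satisfies R beta = Q^T y_aug.
beta_qr = np.linalg.solve(R, Q.T @ y_aug)

# Closed form from the gradient computation, for comparison.
beta_closed = np.linalg.solve(np.eye(4) + X.T @ X, X.T @ y)

print(beta_qr)
print(beta_closed)
```

The two vectors should agree up to rounding, because $\tilde{X}^\top \tilde{X} = \mathrm{Id} + X^\top X$ and $\tilde{X}^\top \tilde{y} = X^\top y$, which turns the normal equations of the stacked problem into exactly the equation solved above.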