I'm studying preconditioning, and I understand the idea behind it to be the following: if I want to solve the linear system
$$ A\textbf{x} = \textbf{b} $$
where $A$ is a symmetric positive definite (SPD) $n \times n$ matrix, I can pre-multiply it by $M^{-1}$ to obtain
$$(M^{-1}A)\textbf{x}=M^{-1}\textbf{b} $$
so that $\kappa_2(M^{-1}A) \ll \kappa_2(A)$. Why does this happen when $M^{-1}A \approx A^{-1}$?
Also, should I require that my preconditioner be SPD too?
You have misunderstood two points. A good preconditioner is a nonsingular matrix $M$ such that $M^{-1}$ is a good approximation of $A^{-1}$ and such that any linear system $M\textbf{x} = \textbf{f}$ can be solved rapidly. In that case $M^{-1}A \approx I$: it is the identity, not $A^{-1}$, that the preconditioned matrix should approximate.

Now, when $A$ is symmetric positive definite, we frequently search for a preconditioner $M$ that is also symmetric positive definite and such that $L^{-1}AL^{-T}$ is a good approximation of the identity, where $M = LL^T$ is the Cholesky factorization of $M$. Note that $L^{-1}AL^{-T}$ is again SPD, and it is similar to $M^{-1}A$, so the two have the same eigenvalues. If $$\kappa_2(L^{-1}AL^{-T}) \ll \kappa_2(A),$$ then the preconditioned conjugate gradient algorithm with preconditioner $M = LL^T$ will converge more rapidly than the conjugate gradient algorithm with no preconditioning. A good first choice for $M$ can sometimes be obtained by computing an incomplete Cholesky factorization of $A$.
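As a small illustration of both claims, here is a hedged numpy sketch (not from the original answer; all names and the test matrix are my own choices). It builds a badly scaled SPD matrix $A$, uses the simple Jacobi preconditioner $M = \operatorname{diag}(A)$ (which is SPD, with a trivially diagonal Cholesky factor $L = M^{1/2}$), and compares both the condition numbers $\kappa_2(A)$ versus $\kappa_2(L^{-1}AL^{-T})$ and the iteration counts of CG with and without preconditioning:

```python
import numpy as np

def cg(A, b, apply_Minv=None, tol=1e-10, maxiter=10_000):
    """(Preconditioned) conjugate gradient; returns (x, iteration count).

    apply_Minv(r) should return M^{-1} r; if None, plain CG is run.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r) if apply_Minv else r.copy()
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r) if apply_Minv else r.copy()
        rz_next = r @ z
        p = z + (rz_next / rz) * p
        rz = rz_next
    return x, maxiter

rng = np.random.default_rng(0)
n = 50
# SPD test matrix: 1D Laplacian plus a diagonal spanning four orders of magnitude
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = lap + np.diag(10.0 ** np.linspace(0.0, 4.0, n))
b = rng.standard_normal(n)

d = np.diag(A)                    # Jacobi preconditioner M = diag(A), SPD
Linv = np.diag(1.0 / np.sqrt(d))  # inverse Cholesky factor of M (diagonal here)
Ahat = Linv @ A @ Linv            # L^{-1} A L^{-T}

kappa_A = np.linalg.cond(A, 2)
kappa_Ahat = np.linalg.cond(Ahat, 2)

x_cg, iters_cg = cg(A, b)
x_pcg, iters_pcg = cg(A, b, apply_Minv=lambda r: r / d)

print(f"kappa2(A)           = {kappa_A:.1e}")
print(f"kappa2(L^-1 A L^-T) = {kappa_Ahat:.1e}")
print(f"CG iterations: {iters_cg},  PCG iterations: {iters_pcg}")
```

On this example the diagonally scaled matrix has a drastically smaller condition number, and PCG correspondingly needs fewer iterations than plain CG. For matrices whose ill-conditioning is not just bad diagonal scaling, a Jacobi preconditioner helps much less, which is where incomplete Cholesky becomes the natural next step.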