If I have an analytic function $f$ of a square matrix $A$ (like $\sin(A)$), then I know that if the matrix is diagonalizable, it is possible to find an invertible matrix $P$ and a diagonal matrix $D$ such that $$D = P^{-1}AP. \tag{1}$$ Then for a function $f(A)$: $$f(A) = Pf(D)P^{-1}. \tag{2}$$
So, for example, if $f(x) = \cos(x)$: $$\cos(A) = P\cos(D)P^{-1}. \tag{3}$$
What is the justification for this, and how does it follow from first principles? A similar question has been asked here ("$\sin(A)$, where $A$ is a matrix"), but I don't see any justification or proof for (2).
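If $f(A)$ is to be defined for every square matrix $A$, then $f$ must be an entire function: $$f(z)=\sum_{n=0}^\infty c_n z^n,$$ with the series converging for every $z$, and $f(A)$ is defined by the same series with $z$ replaced by $A$. Note that $$(PDP^{-1})^n=PD^nP^{-1},$$ because the inner factors $P^{-1}P$ cancel. Since $A = PDP^{-1}$ by (1), applying this term by term gives $$f(A)=\sum_{n=0}^\infty c_n A^n=\sum_{n=0}^\infty c_n PD^nP^{-1}=P\left(\sum_{n=0}^\infty c_n D^n\right)P^{-1}=Pf(D)P^{-1},$$ which is (2). Here $D^n$ is diagonal with entries $d_i^n$, so $f(D)$ is simply the diagonal matrix with entries $f(d_i)$.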
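As a quick numerical sanity check of (2) for $f = \cos$, here is a minimal sketch in plain Python. The matrices `P` and `D` and the helper names `matmul` and `cos_series` are my own illustrative choices, not anything from the question. It sums the power series for $\cos(A)$ directly and compares the result with $P\cos(D)P^{-1}$.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cos_series(A, terms=30):
    """Partial sum of cos(A) = sum_k (-1)^k A^(2k) / (2k)!."""
    n = len(A)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    A2 = matmul(A, A)
    term = identity                      # k = 0 term of the series
    total = [row[:] for row in identity]
    for k in range(1, terms):
        # term_k = term_{k-1} * A^2 * (-1) / ((2k - 1) * (2k))
        scale = -1.0 / ((2 * k - 1) * (2 * k))
        term = [[scale * x for x in row] for row in matmul(term, A2)]
        total = [[t + x for t, x in zip(t_row, x_row)]
                 for t_row, x_row in zip(total, term)]
    return total

# Illustrative example (an assumption): A = P D P^{-1} with eigenvalues 1 and 2.
P = [[1.0, 1.0], [0.0, 1.0]]
P_inv = [[1.0, -1.0], [0.0, 1.0]]
D = [[1.0, 0.0], [0.0, 2.0]]
A = matmul(matmul(P, D), P_inv)

# f(D) is diagonal with entries f(d_i).
cos_D = [[math.cos(1.0), 0.0], [0.0, math.cos(2.0)]]
via_diag = matmul(matmul(P, cos_D), P_inv)
via_series = cos_series(A)

max_diff = max(abs(a - b)
               for row_a, row_b in zip(via_diag, via_series)
               for a, b in zip(row_a, row_b))
print(max_diff)
```

The two results should agree up to floating-point rounding, which is exactly what the term-by-term argument predicts.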