There is a theorem that says that every $m \times n$ matrix $A$ of rank $r$ can be transformed by a finite number of elementary row and column operations into the matrix $$D=\begin{pmatrix} I_r & O_1 \\ O_2 & O_3 \\ \end{pmatrix}$$ where $O_1, O_2, O_3$ are zero matrices and $I_r$ is the $r \times r$ identity matrix.
A corollary of this theorem says that for every such matrix $A$ there exist invertible matrices $B$ and $C$, of sizes $m \times m$ and $n \times n$ respectively, such that $D=BAC$.
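As a sanity check, the corollary can be realized numerically. The construction below is one concrete (hypothetical) way to produce such $B$ and $C$ using the SVD, not the elementary-operations proof from the theorem: if $A = U S V^T$, rescaling the first $r$ rows of $U^T$ gives an invertible $B$ with $BAC = D$ for $C = V$.

```python
import numpy as np

# One way to build invertible B, C with B A C = D = [[I_r, 0], [0, 0]]:
# take the SVD A = U S V^T, set C = V, and let B rescale the first r rows
# of U^T so the nonzero singular values become 1. (SVD-based construction,
# chosen here for convenience; the theorem itself uses elementary operations.)
A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # = 2 * row 1, so A has rank 2
              [1., 1., 1.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

U, s, Vt = np.linalg.svd(A)        # A = U @ diag(s) @ Vt
scale = np.ones(m)
scale[:r] = 1.0 / s[:r]            # kill the nonzero singular values down to 1
B = np.diag(scale) @ U.T           # invertible m x m
C = Vt.T                           # orthogonal, hence invertible n x n

D = B @ A @ C
expected = np.zeros((m, n))
expected[:r, :r] = np.eye(r)
print(np.allclose(D, expected, atol=1e-8))  # True
```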
So every matrix can be transformed into a diagonal matrix $D$, and in this sense every matrix can be "diagonalized". But the definition of a diagonalizable matrix is different: $A$ is diagonalizable if there exists an invertible matrix $P$ such that $P^{-1}AP$ is a diagonal matrix.
This definition looks very similar to the corollary of the theorem but is more restrictive, so I would really appreciate it if you could tell me why we adopt this restrictive definition of a diagonalizable matrix.
Here is a property that motivates the restriction: let $k\in \Bbb N$.
If $A=P^{-1}DP$, then $A^k = (P^{-1}DP)^{k} = P^{-1}D^kP$, because all the inner factors $PP^{-1}$ cancel; moreover $(D^{k})_{i,i}=D_{i,i}^k$, so $D^k$ is computed entrywise.
If instead $A = B^{-1}DC^{-1}$ with $B \neq C^{-1}$, then $C^{-1}B^{-1}\neq I$, and so $$A^k = (B^{-1}DC^{-1})^k = B^{-1}D\,(C^{-1}B^{-1})\,D\,(C^{-1}B^{-1})\cdots D\,C^{-1}:$$ the inner factors no longer cancel, so $A^k$ does not reduce to a simple expression in $D^k$.
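The payoff of the similarity-based definition can be checked numerically. The sketch below builds a diagonalizable $A = P^{-1}DP$ from a made-up invertible $P$ and eigenvalues $2, 3$ (both hypothetical choices for illustration) and verifies that $A^k$ equals $P^{-1}D^kP$, where $D^k$ only needs scalar powers.

```python
import numpy as np

# If A = P^{-1} D P with D diagonal, then A^k = P^{-1} D^k P,
# and D^k is obtained by raising each diagonal entry to the k-th power.
P = np.array([[1., 1.],
              [1., 2.]])          # an invertible matrix (example choice)
Pinv = np.linalg.inv(P)
d = np.array([2., 3.])            # diagonal entries of D (example choice)
A = Pinv @ np.diag(d) @ P

k = 5
direct = np.linalg.matrix_power(A, k)     # multiply A by itself k times
via_diag = Pinv @ np.diag(d**k) @ P       # only scalar powers 2^5, 3^5 needed
print(np.allclose(direct, via_diag))  # True
```

This is exactly what fails for a general factorization $A = B^{-1}DC^{-1}$: there the inner factors $C^{-1}B^{-1}$ do not cancel, so no analogous shortcut for $A^k$ exists.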