This question bothered me for a while and hopefully someone can shed some light on the issue.
A matrix $A$ is said to be diagonalizable if there is an invertible matrix $P$ and a diagonal matrix $D$ such that $A=P^{-1}DP$. That is the definition of a matrix being diagonalizable.
Now, a linear transformation is said to be diagonalizable if there exists a basis $C$ of eigenvectors.
Why is that the same thing? Let's say that $A$ is diagonalizable, and define a linear mapping: \begin{gather*} T\colon V \to V \\ T(v)=Av \end{gather*}
Why does this say that $T$ is diagonalizable? Why does this mean that there is a basis of eigenvectors?
And also the other way around: let's say that the transformation $T$ is diagonalizable. Why does this mean that $A$ is diagonalizable?
The key is understanding what coordinates are and how they change when you change the basis with respect to which you measure them. Diagonalization is about obtaining a basis in which, when you compute coordinates, the matrix of the linear transformation turns out to be diagonal.
Introduction
Let $U:=\{u_1,u_2,\ldots,u_n\}$ be a basis of $\mathbb{R}^n$. Notice that we have enumerated the elements of this basis, fixing this way an ordering. The computations done below depend on this ordering.
For a vector $x\in\mathbb{R}^n$ we can compute scalars $\alpha_1,\alpha_2,\ldots,\alpha_n$ such that
$$x=\alpha_1u_1+\alpha_2u_2+\ldots+\alpha_n u_n.$$
These scalars are unique, since $U$ is a basis. Therefore we can associate to each vector its coordinates in the (ordered) basis $U$. We say that $x$ is represented by the column of coordinates $$\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_n\end{bmatrix}_U$$
This we call the column of coordinates (or simply the coordinates, or the coordinate vector) of the vector $x$ in the basis $U$.
We put the subscript $U$ to remember that these are coordinates in the basis $U$.
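Computing coordinates is just solving a linear system: if we put the basis vectors $u_j$ as the columns of a matrix, the coordinates of $x$ are the solution of that system. A minimal numerical sketch (the basis below is hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical basis of R^3: the columns of U are u_1, u_2, u_3.
# Any invertible matrix works, since its columns then form a basis.
U = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])

x = np.array([2., 3., 1.])

# x = alpha_1 u_1 + alpha_2 u_2 + alpha_3 u_3  is the linear
# system  U @ alpha = x,  so the coordinates are:
alpha = np.linalg.solve(U, x)
print(alpha)  # the column of coordinates of x in the basis U

# Reconstructing x from its coordinates recovers the vector.
assert np.allclose(U @ alpha, x)
```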
Notice that we have been denoting vectors (of $\mathbb{R}^n$) using columns of numbers as well
$$\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}.$$
These columns of numbers happen to be the coordinates of these vectors in the standard basis. If we call the standard basis $E:=\{e_1,e_2,\ldots,e_n\}$, then we have that the column of coordinates of $x$ is
$$\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}_{E}.$$
Although they are the same column of numbers, we shouldn't think of a vector of $\mathbb{R}^n$ and its coordinates in the standard basis as the same thing. Keeping this in mind will be helpful when working later with abstract vector spaces (vector spaces that are not $\mathbb{R}^n$).
It is good to think of columns of coordinates as a code. Fixing a basis with an ordering we associate to each vector this code that completely determines the vector. In the standard basis, the vectors of $\mathbb{R}^n$ just happen to look the same as their coding in the standard basis (in other bases they won't necessarily look the same).
In this note we will see how the coding (the coordinates) change when the basis is changed.
Passing from coordinates in one basis to coordinates in another basis
Assume that we are given a new basis $V:=\{v_1,v_2,\ldots,v_n\}$ of $\mathbb{R}^n$. So, we have the bases $U$, $V$ and the coordinates $$\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_n\end{bmatrix}_U$$ of a vector in the basis $U$.
We can compute the coordinates $$\begin{bmatrix}\beta_1\\\beta_2\\\vdots\\\beta_n\end{bmatrix}_V$$ of this vector in the basis $V$ by solving the linear system of equations:
$$\beta_1v_1+\beta_2v_2+\ldots+\beta_n v_n=\alpha_1u_1+\alpha_2u_2+\ldots+\alpha_nu_n.$$
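In matrix form, with the $u_j$ and $v_j$ as the columns of matrices $U$ and $V$, this equation reads $V\beta=U\alpha$, which we can solve numerically. A sketch with hypothetical bases of $\mathbb{R}^2$:

```python
import numpy as np

# Hypothetical bases of R^2: columns are the basis vectors.
U = np.array([[1., 0.],
              [1., 1.]])   # u_1 = (1,1), u_2 = (0,1)
V = np.array([[1., 1.],
              [1., -1.]])  # v_1 = (1,1), v_2 = (1,-1)

alpha = np.array([2., 3.])  # coordinates of some x in the basis U

# beta_1 v_1 + beta_2 v_2 = alpha_1 u_1 + alpha_2 u_2
# is the linear system  V @ beta = U @ alpha:
beta = np.linalg.solve(V, U @ alpha)

# Both coordinate columns reconstruct the same vector x.
assert np.allclose(V @ beta, U @ alpha)
print(beta)
```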
Computing the matrix of change of coordinates
In principle we could solve the equation above for each vector of coordinates we need to transform. But the way coordinates transform is linear and therefore there is a matrix that allows us to compute the change of coordinates by matrix multiplication, i.e. there is a matrix $P$, depending on the bases $U$ and $V$ such that if $X_U$ is the column of coordinates of a vector $x\in\mathbb{R}^n$ in the basis $U$, and $Y_V$ is the column of coordinates in the basis $V$, then $Y_V=PX_U$.
The matrix $P$, which depends also on the way we order the bases $U$ and $V$, is computed in the following way:
Step 1: Compute, for each $u_j$, its coordinates in the basis $V$. That is, solve, for each $j=1,2,\ldots,n$, the equation
$$p_{1,j}v_1+p_{2,j}v_2+\ldots+p_{n,j}v_n=u_j,$$
where the $p_{i,j}$ are the unknowns.
Step 2: Put the coefficients $p_{i,j}$ obtained as the columns of a matrix, with the solution for $u_j$ placed in column $j$ (the order is very important!). This is the matrix $P$, which we call the matrix of change of coordinates.
For the next section let us denote the matrix constructed above as $P_{VU}$ to emphasize that if you multiply from the right by coordinates in the basis $U$ it will return coordinates in the basis $V$, i.e. $X_V=P_{VU}X_U$, where $X_U$ and $X_V$ are the columns of coordinates of $x$ in the bases $U$ and $V$, respectively.
Observation: If we have the matrix $P_{VU}$ that changes coordinates from the basis $U$ to coordinates in the basis $V$ then the matrix that changes coordinates from the basis $V$ to coordinates in the basis $U$ is the inverse of $P_{VU}$, i.e. $P_{UV}=P_{VU}^{-1}$.
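Both steps, and the observation about the inverse, can be checked numerically. Solving $Vp_j=u_j$ for all $j$ at once amounts to solving one matrix equation (the bases below are hypothetical, for illustration):

```python
import numpy as np

# Hypothetical bases of R^2: columns are the basis vectors.
U = np.array([[1., 0.],
              [1., 1.]])
V = np.array([[1., 1.],
              [1., -1.]])

# Column j of P_VU solves  V @ p_j = u_j;  doing all columns at
# once means solving the matrix equation  V @ P_VU = U:
P_VU = np.linalg.solve(V, U)

# The matrix changing coordinates the other way solves U @ P_UV = V,
# and the observation says it is the inverse of P_VU:
P_UV = np.linalg.solve(U, V)
assert np.allclose(P_UV, np.linalg.inv(P_VU))

# Changing coordinates U -> V and back recovers the original column.
X_U = np.array([2., 3.])
X_V = P_VU @ X_U
assert np.allclose(P_UV @ X_V, X_U)
```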
Computing the matrix of a linear transformation in one basis having the matrix in another basis
Suppose we are given a linear transformation $T:\mathbb{R}^n\rightarrow\mathbb{R}^n$. We have seen that there is a matrix $A$ such that for every vector $x\in\mathbb{R}^n$ we get $T(x)=Ax$.
Since we have seen that the column of numbers representing the vector $x$ is the same as the column of coordinates $X_E$ in the standard basis $E$, we can also write $T(x)_E=AX_E$, where $T(x)_E$ is the column of coordinates of $T(x)$ in the basis $E$ (keep in mind that $T(x)$ and $T(x)_E$ actually look the same since $E$ is the standard basis).
Assume that we have a basis $V$ and we compute the matrix of change of coordinates $P_{EV}$ from coordinates in the basis $V$ to coordinates in the basis $E$. Denote by $X_V$ the column of coordinates of $x$ in the basis $V$. Then $X_E=P_{EV}X_V$. We also have that $T(x)_E=P_{EV}T(x)_{V}$.
Therefore, substituting these into the formula $T(x)_E=AX_E$ we get $P_{EV}T(x)_V=AP_{EV}X_V$, from where
$$T(x)_V=P_{EV}^{-1}AP_{EV}X_V.$$
So, if $A$ was the matrix that computes $T$ in the coordinates in the basis $E$, then $P_{EV}^{-1}AP_{EV}$ is the matrix computing $T$ in coordinates in the basis $V$. Once the matrices $P_{EV}$ and $P_{EV}^{-1}$ are computed as in the previous section, we get the new matrix of $T$ in the basis $V$ using the previous formula.
Observation: The formula above shows why the relation of similarity ($A$ similar to $P^{-1}AP$) is so important. Similar matrices can be used to represent the same linear transformation in different bases.
Example:
Suppose we have, in $\mathbb{R}^2$ the standard basis $E:=\{e_1=\begin{bmatrix}1\\0\end{bmatrix},e_2=\begin{bmatrix}0\\1\end{bmatrix}\}$ (in this order). Consider the basis $V:=\{v_1=\begin{bmatrix}1\\1\end{bmatrix},v_2=\begin{bmatrix}1\\-1\end{bmatrix}\}$ (in this order).
We can compute that $$\frac{1}{2}v_1+\frac{1}{2}v_2=e_1$$ and that $$\frac{1}{2}v_1-\frac{1}{2}v_2=e_2.$$
Putting these coefficients as columns (the coefficients of the first equation in the first column, those of the second equation in the second column; the order is very important), we get the matrix
$$P_{VE}=\begin{bmatrix}\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&-\frac{1}{2}\end{bmatrix}.$$
This is the matrix of change of coordinates from coordinates in the basis $E$ to coordinates in the basis $V$.
Assume now that $T:\mathbb{R}^2\rightarrow\mathbb{R}^2$ is a linear transformation given by $T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)=\begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$
As we have seen, this formula can be thought of as written for coordinates in the standard basis $E$, i.e.
$$T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}_E\right)=\begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}_E$$
because vectors in $\mathbb{R}^2$ and columns of their coordinates in the standard basis look the same.
Let us compute the formula for $T$ if we were using coordinates in the basis $V$.
We know that if $X_V$ is the column of coordinates of $x$ in the basis $V$ then $T(x)_V=P_{EV}^{-1}AP_{EV}X_V$, where $A=\begin{bmatrix}2&1\\1&2\end{bmatrix}$.
We have already computed $P_{VE}$; its inverse is $P_{EV}=P_{VE}^{-1}$, so we get
$$P_{EV}=\begin{bmatrix}1&1\\1&-1\end{bmatrix}.$$
Therefore we get
$$P_{EV}^{-1}AP_{EV}=P_{VE}AP_{EV}=\begin{bmatrix}3&0\\0&1\end{bmatrix}.$$
Notice how the matrix to compute $T$ using coordinates in the basis $V$ is much simpler than the matrix to compute $T$ using coordinates in the standard basis. This is the main reason to change basis, to change coordinates, to change variables, to change system of reference: we can get simpler formulas.
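The whole example can be verified numerically. The sketch below (using numpy) reproduces the computation, and also checks that the columns of $P_{EV}$, i.e. $v_1$ and $v_2$, are eigenvectors of $A$ with eigenvalues $3$ and $1$, which is exactly why the matrix of $T$ in the basis $V$ came out diagonal:

```python
import numpy as np

# Data from the example above.
A = np.array([[2., 1.],
              [1., 2.]])
P_EV = np.array([[1., 1.],
                 [1., -1.]])  # columns are v_1 and v_2

# P_VE is the inverse of P_EV, matching the matrix computed earlier.
P_VE = np.linalg.inv(P_EV)
assert np.allclose(P_VE, [[0.5, 0.5], [0.5, -0.5]])

# Matrix of T in coordinates of the basis V:
D = P_VE @ A @ P_EV
print(D)  # → [[3. 0.] [0. 1.]]

# The basis vectors v_1, v_2 are eigenvectors of A: A v = lambda v.
assert np.allclose(A @ P_EV[:, 0], 3 * P_EV[:, 0])
assert np.allclose(A @ P_EV[:, 1], 1 * P_EV[:, 1])
```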