Why does putting the eigenvectors as columns in a matrix give us the diagonalizing matrix?


For an $n \times n$ matrix $A$, if we have $n$ eigenvectors, we can put them as columns in a matrix and get the diagonalizing matrix. Why does this work?


There are 5 answers below.


That is because, by putting the coordinates of the eigenvectors as the columns of a matrix $P$, you obtain the change-of-basis matrix from the initial basis $\mathcal B$ to the basis of eigenvectors $\mathcal B'$. This means that if a vector has coordinates $X$ in basis $\mathcal B$ and $X'$ in basis $\mathcal B'$, we have the relation $$X=PX'.$$

Now, the linear map associated to the matrix $A$ in the initial basis is described, in terms of coordinates, by $\;Y=AX$, which becomes, in terms of the new coordinates, $$PY'=A(PX')=(AP)X'\iff Y'= (P^{-1}AP)X'.$$ Thus the matrix of this linear map in the basis of eigenvectors is the matrix $\;A'=P^{-1}AP$, and this matrix, by definition of eigenvectors, is a diagonal matrix.
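For concreteness, here is a small NumPy sanity check of this change-of-basis picture; the specific symmetric $3\times 3$ matrix is an arbitrary illustrative choice.

```python
import numpy as np

# An arbitrary diagonalizable matrix, chosen only for illustration.
A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

# The columns of P are the eigenvectors computed by NumPy.
eigenvalues, P = np.linalg.eig(A)

# Change of basis: A' = P^{-1} A P should be (numerically) diagonal,
# with the eigenvalues on the diagonal.
A_prime = np.linalg.inv(P) @ A @ P
assert np.allclose(A_prime, np.diag(eigenvalues))

# Coordinate relation X = P X': pick coordinates X' in the eigenbasis,
# pass to standard coordinates, apply A, and compare with the diagonal action.
X_prime = np.array([1., -2., 0.5])
X = P @ X_prime
Y = A @ X
Y_prime = np.linalg.inv(P) @ Y
assert np.allclose(Y_prime, eigenvalues * X_prime)
```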


If I understand your question correctly, this comes directly from the definition of eigenvectors and eigenvalues, just arranged in a matrix.

Recall that if $Av=\lambda v$ then $v$ is an eigenvector and $\lambda$ is the corresponding eigenvalue. Now put several (linearly independent) eigenvectors as the columns of a matrix $V$ and you get $AV=VD$, where $D$ is a diagonal matrix with the respective eigenvalues on the diagonal. This is not a rigorous proof, just an intuitive explanation of why it makes sense.
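As a concrete check of $AV = VD$, here is a short NumPy sketch; the $2 \times 2$ matrix is just an arbitrary example.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])           # arbitrary symmetric example

eigenvalues, V = np.linalg.eig(A)  # columns of V are eigenvectors of A
D = np.diag(eigenvalues)

# Column by column, A v_i = lambda_i v_i; stacked, that reads A V = V D.
assert np.allclose(A @ V, V @ D)
```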


First of all, whether a matrix has eigenvectors at all depends on the ring its coefficients are taken from.

For example, the matrix $\begin{bmatrix}0&-1\\1&0\end{bmatrix}$ has no eigenvectors if interpreted as a matrix in $\mathbb{R}^{2\times 2}$, but has the eigenvectors $\begin{pmatrix} i \\ 1\end{pmatrix}$ and $\begin{pmatrix} -i \\ 1 \end{pmatrix}$ for the eigenvalues $i$ and $-i$ respectively, when considered as a matrix in $\mathbb{C}^{2\times 2}$.
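One can also see this numerically; a quick NumPy sketch (using its eigenvalue routine as one convenient way to exhibit the complex eigenpairs):

```python
import numpy as np

# The 90-degree rotation matrix: no real eigenvectors, but complex ones.
R = np.array([[0., -1.],
              [1.,  0.]])

eigenvalues, V = np.linalg.eig(R)
print(eigenvalues)                 # approximately [ 1j, -1j ]

# Each column of V is an eigenvector over the complex numbers: R v = lambda v.
for lam, v in zip(eigenvalues, V.T):
    assert np.allclose(R @ v, lam * v)
```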

One says that, given a ring $R$, a matrix $A \in R^{n \times n}$ is diagonalizable if $R^n$ admits a basis consisting of eigenvectors (you may replace $R$ with your favourite field to obtain a finite-dimensional vector space). The intuition for why writing the eigenvectors into a matrix $S$ gives you $A=S \cdot \operatorname{diag}(\lambda_1,...,\lambda_n)\cdot S^{-1}$ comes from the following observations:

  1. By definition an eigenvector is a vector $v$ which satisfies $Av = \lambda v$, which means that the matrix $A$ acts on this vector simply by scaling it.

  2. Suppose we have a basis $\mathcal B=\{b_1,...,b_n\}$ of $R^n$, with all $b_i$ given with respect to the standard basis $\{e_1,...,e_n\}$. This means we can write an arbitrary vector $v$ in $R^n$ as a unique linear combination $v=r_1b_1 + ... + r_nb_n$. The coefficients $r_i$ form a coordinate vector $(r_1,...,r_n)_\mathcal{B}$ with respect to the basis $\mathcal{B}$. Now note that writing the basis vectors into a matrix $B = (b_1 \mid ... \mid b_n)$ and multiplying by the coordinate vector $(r_1,...,r_n)_\mathcal{B}$ gives you back the linear combination resulting in $v$. In other words, the matrix $B$ gives you a change of coordinates from $\mathcal B$-coordinates to standard coordinates. I find it intuitive that the inverse matrix $B^{-1}$ gives you the change of coordinates the other way around.

  3. Putting 1 and 2 together, we can describe the action of $A$ on a vector $v$ by first changing coordinates to the basis consisting of eigenvectors (via $S^{-1}$), where the action is just multiplication by the corresponding eigenvalues (via $\operatorname{diag}(\lambda_1,...,\lambda_n)$), and then changing back to the original coordinates (via $S$); a small numerical sketch of this pipeline follows below.
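Here is that three-step pipeline in NumPy; the matrix and the vector are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[3., 1.],
              [0., 2.]])                  # arbitrary diagonalizable example

eigenvalues, S = np.linalg.eig(A)         # columns of S are eigenvectors
D = np.diag(eigenvalues)
S_inv = np.linalg.inv(S)

v = np.array([5., -1.])

# Step 1: express v in eigenvector coordinates (S^{-1} v),
# step 2: scale each coordinate by its eigenvalue (D),
# step 3: return to standard coordinates (S).
result = S @ (D @ (S_inv @ v))

assert np.allclose(result, A @ v)         # same as applying A directly
```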


Let $v_1,v_2,\dots,v_n$ be any eigenvectors of $A$. Then for each $1 \leq j \leq n$ we can write $$Av_j =\lambda_j v_j.$$ This implies that $$A \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix}=\begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \dots & \lambda_n v_n \end{bmatrix}.$$

Now, remembering how multiplication with diagonal matrices works, it is easy to see that $$\begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \dots & \lambda_n v_n \end{bmatrix}=\begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 &\dots& 0 \\ 0& \lambda_2 & \dots& 0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0& \dots& \lambda_n \end{bmatrix}.$$

In conclusion, you get the following more general phenomenon:

If $A$ is any matrix, $P$ is any matrix whose columns are eigenvectors of $A$ and $D$ is the diagonal matrix consisting of the corresponding eigenvalues, then $$AP=PD$$

Now, we would like to move $P$ to the other side. This is only possible when $P$ is invertible, or equivalently when the columns of $P$ are linearly independent.

Therefore, if $A$ has $n$ linearly independent eigenvectors (which is exactly the diagonalisation condition), we can put them in $P$, and then $P$ is invertible. In that case $$AP=PD \Rightarrow A=PDP^{-1}.$$
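A compact numerical confirmation of this last step (the example matrix is an arbitrary choice with linearly independent eigenvectors):

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 1.]])                       # arbitrary example

eigenvalues, P = np.linalg.eig(A)              # columns of P: eigenvectors
D = np.diag(eigenvalues)

# The columns of P are linearly independent, so P is invertible ...
assert np.linalg.matrix_rank(P) == P.shape[0]

# ... and AP = PD can be rearranged into the factorization A = P D P^{-1}.
assert np.allclose(A, P @ D @ np.linalg.inv(P))
```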


What follows is, I think, a pretty straightforward and simple way to see this:

First, let $B$ be any $n \times m$ matrix with columns $B_1, B_2, \ldots, B_m$, so that we may write $B$ in columnar form as

$B = [B_1 \; B_2 \; \ldots \; B_m]; \tag 1$

then it is easy to see that

$AB = [AB_1 \; AB_2 \; \ldots \; AB_m]; \tag 2$

now if $E_1, E_2, \ldots, E_n$ are linearly independent eigenvectors of $A$, that is, if we have

$AE_i = \mu_i E_i, \tag 3$

for the scalar eigenvalues $\mu_i$, $1 \le i \le n$, then the linear independence of the $E_i$ implies that the $n \times n$ matrix

$E = [E_1 \; E_2 \; \ldots \; E_n] \tag 4$

is invertible; that is, there exists an $n \times n$ matrix $E^{-1}$ such that

$E^{-1}E = E^{-1}[E_1 \; E_2 \; \ldots \; E_n] = I; \tag 5$

since

$E^{-1}[E_1 \; E_2 \; \ldots \; E_n] = [E^{-1}E_1 \; E^{-1}E_2 \; \ldots \; E^{-1}E_n], \tag 6$

we may infer from (5) and (6) that

$E^{-1} E_i = \mathbf e_i = \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \vdots \\ \delta_{ni} \end{pmatrix}, \tag 7$

i.e., $E^{-1}E_i = \mathbf e_i$ is the column vector whose $i$-th row is $1$ with all other rows (entries) equal to $0$. Now in accord with (2) and (3) we also have

$AE = A [E_1 \; E_2 \; \ldots \; E_n]$ $= [AE_1 \; AE_2 \; \ldots \; AE_n] = [\mu_1 E_1 \; \mu_2 E_2 \; \ldots \; \mu_n E_n], \tag 8$

whence

$E^{-1}AE = E^{-1}[\mu_1 E_1 \; \mu_2 E_2 \; \ldots \; \mu_n E_n]$ $= [\mu_1 E^{-1}E_1 \; \mu_2 E^{-1}E_2 \; \ldots \; \mu_n E^{-1}E_n]$ $= [\mu_1 \mathbf e_1 \; \mu_2 \mathbf e_2 \; \ldots \; \mu_n \mathbf e_n] = \text{diag}(\mu_1, \mu_2, \ldots, \mu_n), \tag 9$

the $n \times n$ diagonal matrix which has the eigenvalues $\mu_i$ of $A$ along its main diagonal and zeroes elsewhere. We thus see that $E$ is the diagonalizing matrix for $A$.

Now as our OP Alon asked, "why does it work?" Scrutiny of the above discussion reveals two essential factors which contribute to the success of this program: first, left multiplication of the eigenvector matrix $E$ by $A$ acts on each column separately, so in accord with the eigen-equation (3) each column $E_i$ is merely multiplied by the corresponding scalar $\mu_i$; and second, multiplication of such columns by $E^{-1}$ returns the corresponding column of the identity matrix $I$ multiplied by the corresponding $\mu_i$; in this way the matrix $\text{diag}(\mu_1, \mu_2, \ldots, \mu_n)$ is the net result of these operations acting in concert, and thus $A$ is diagonalized by the eigenvector matrix $E$.
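These two factors can be checked column by column in a short NumPy sketch; the $3 \times 3$ matrix is an arbitrary illustrative choice.

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [0., 5., 0.],
              [1., 0., 2.]])              # arbitrary symmetric example

mu, E = np.linalg.eig(A)                  # eigenvalues mu_i, eigenvector columns E_i
E_inv = np.linalg.inv(E)
n = A.shape[0]

for i in range(n):
    E_i = E[:, i]
    e_i = np.eye(n)[:, i]
    # First factor: A merely scales the i-th column by mu_i ...
    assert np.allclose(A @ E_i, mu[i] * E_i)
    # ... second factor: E^{-1} sends that column to mu_i times e_i.
    assert np.allclose(E_inv @ (mu[i] * E_i), mu[i] * e_i)

# Net effect: E^{-1} A E is the diagonal matrix of the eigenvalues.
assert np.allclose(E_inv @ A @ E, np.diag(mu))
```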

Note Added in Edit; Sunday 19 January 2020 9:56 PM PST: The attentive reader may have observed that, while the assumption that $A$ has $n$ eigenvectors is explicit in the text of the problem itself, no mention is there made of their linear independence; but in fact I have introduced this concept around (2)-(4) in my answer; in this sense I have added an assumption to the question as stated by our OP Alon. In fact, this assumption is essential if $E = [E_1 \; E_2 \; \ldots \; E_n]$ is to be a diagonalizing matrix for $A$, since it is equivalent to the existence of $E^{-1}$. Finally, we note that without the hypothesis of linear independence, it is possible for $A$ to have an infinite number of eigenvectors; indeed, taking $A = aI$, for any vector $V$ we find $AV = aIV = aV$; thus every $V$ is an eigenvector corresponding to $a$; however, the number of linearly independent eigenvectors of $A$ is $n = \text{size}(A)$. End of Note.
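A tiny numerical illustration of the $A = aI$ remark (the chosen scalar and vectors are arbitrary):

```python
import numpy as np

a, n = 2.5, 3
A = a * np.eye(n)                         # A = aI

# Every vector whatsoever is an eigenvector for the eigenvalue a ...
V = np.random.default_rng(0).standard_normal((n, 7))   # seven arbitrary vectors
assert np.allclose(A @ V, a * V)

# ... but at most n of them can be linearly independent.
assert np.linalg.matrix_rank(V) == n
```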