Relations between matrices and linear transformations, diagonalization.


This question bothered me for a while and hopefully someone can shed some light on the issue.

A matrix $A$ is said to be diagonalizable if there is an invertible matrix $P$ and a diagonal matrix $D$ such that $A=P^{-1}DP$. That is the definition of a diagonalizable matrix.

Now, a linear transformation is said to be diagonalizable if there exists a basis $C$ consisting of eigenvectors.

Why is that the same thing? Let's say that $A$ is diagonalizable, and I define a linear mapping: \begin{gather*} T\colon V \to V \\ T(v)=Av \end{gather*}

Why does this say that $T$ is diagonalizable? Why does this mean that there is a basis of eigenvectors?

And also the other way around: let's say that the transformation $T$ is diagonalizable. Why does this mean that $A$ is diagonalizable?

There are 4 best solutions below.

Best answer:

A note on understanding what coordinates are and how they change when you change the basis with respect to which you measure them. Diagonalization is about obtaining a basis in which, when you compute coordinates, the matrix of the linear transformation turns out to be diagonal.

Introduction

Let $U:=\{u_1,u_2,\ldots,u_n\}$ be a basis of $\mathbb{R}^n$. Notice that we have enumerated the elements of this basis, fixing this way an ordering. The computations done below depend on this ordering.

For a vector $x\in\mathbb{R}^n$ we can compute scalars $\alpha_1,\alpha_2,\ldots,\alpha_n$ such that

$$x=\alpha_1u_1+\alpha_2u_2+\ldots+\alpha_n u_n.$$

These scalars are unique, since $U$ is a basis. Therefore we can associate to each vector its coordinates in the (ordered) basis $U$. We say that $x$ is represented by the column of coordinates $$\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_n\end{bmatrix}_U$$

This we call: column of coordinates, coordinates, or simply vectors of coordinates, of the vector $x$ in the basis $U$.

We put the subscript $U$ to remember that these are coordinates in the basis $U$.
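As a concrete illustration, here is a minimal NumPy sketch of computing coordinates in a basis; the basis $U$ and the vector $x$ below are made-up examples, not taken from the text:

```python
import numpy as np

# A made-up basis U of R^3; column j holds the basis vector u_j.
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# A made-up vector x whose coordinates in the basis U we want.
x = np.array([2.0, 3.0, 1.0])

# The coordinates alpha satisfy alpha_1 u_1 + ... + alpha_n u_n = x,
# i.e. the linear system U @ alpha = x.
alpha = np.linalg.solve(U, x)

# Reconstructing x from its coordinates recovers the original vector.
assert np.allclose(U @ alpha, x)
```

Uniqueness of the coordinates corresponds to `U` being invertible, which is exactly the statement that its columns form a basis.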

Notice that we have been denoting vectors (of $\mathbb{R}^n$) using columns of numbers as well:

$$\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}.$$

These columns of numbers happen to be the coordinates of these vectors in the standard basis. If we call the standard basis $E:=\{e_1,e_2,\ldots,e_n\}$, then we have that the column of coordinates of $x$ is

$$\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}_{E}.$$

Although they are the same column of numbers, we shouldn't think of a vector of $\mathbb{R}^n$ and its coordinates in the standard basis as the same thing. Keeping this in mind will be helpful later when working with abstract vector spaces (vector spaces that are not $\mathbb{R}^n$).

It is good to think of columns of coordinates as a code. Fixing a basis with an ordering we associate to each vector this code that completely determines the vector. In the standard basis, the vectors of $\mathbb{R}^n$ just happen to look the same as their coding in the standard basis (in other bases they won't necessarily look the same).

In this note we will see how the coding (the coordinates) change when the basis is changed.

Passing from coordinates in one basis to coordinates in another basis

Assume that we are given a new basis $V:=\{v_1,v_2,\ldots,v_n\}$ of $\mathbb{R}^n$. So, we have the bases $U$, $V$ and the coordinates $$\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_n\end{bmatrix}_U$$ of a vector in the basis $U$.

We can compute the coordinates $$\begin{bmatrix}\beta_1\\\beta_2\\\vdots\\\beta_n\end{bmatrix}_V$$ of this vector in the basis $V$ by solving the linear system of equations:

$$\beta_1v_1+\beta_2v_2+\ldots+\beta_n v_n=\alpha_1u_1+\alpha_2u_2+\ldots+\alpha_nu_n.$$
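In code, this change of coordinates amounts to solving one linear system. The following NumPy sketch uses two made-up bases of $\mathbb{R}^2$:

```python
import numpy as np

# Two made-up bases of R^2, stored with the basis vectors as columns.
U = np.array([[1.0, 0.0],
              [1.0, 1.0]])   # columns: u_1, u_2
V = np.array([[1.0, 1.0],
              [1.0, -1.0]])  # columns: v_1, v_2

alpha = np.array([2.0, 3.0])  # coordinates of some vector x in the basis U
x = U @ alpha                 # the vector itself (standard coordinates)

# beta solves beta_1 v_1 + beta_2 v_2 = alpha_1 u_1 + alpha_2 u_2,
# i.e. the linear system V @ beta = U @ alpha.
beta = np.linalg.solve(V, x)

# Both coordinate columns describe the same vector.
assert np.allclose(V @ beta, U @ alpha)
```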

Computing the matrix of change of coordinates

In principle we could solve the equation above for each vector of coordinates we need to transform. But the way coordinates transform is linear, and therefore there is a matrix that allows us to compute the change of coordinates by matrix multiplication: there is a matrix $P$, depending on the bases $U$ and $V$, such that if $X_U$ is the column of coordinates of a vector $x\in\mathbb{R}^n$ in the basis $U$, and $X_V$ is the column of coordinates in the basis $V$, then $X_V=PX_U$.

The matrix $P$, which depends also on the way we order the bases $U$ and $V$, is computed in the following way:

Step 1: Compute, for each $u_j$, its coordinates in the basis $V$. That is, solve, for each $j=1,2,\ldots,n$, the equation

$$p_{1,j}v_1+p_{2,j}v_2+\ldots+p_{n,j}v_n=u_j,$$

where the $p_{i,j}$ are the unknowns.

Step 2: Put the coefficients $p_{i,j}$ obtained as columns of a matrix (in the order in which they appear, for $j=1,2,\ldots,n$). This is the matrix $P$, which we call the matrix of change of coordinates.

For the next section let us denote the matrix constructed above as $P_{VU}$ to emphasize that if you multiply from the right by coordinates in the basis $U$ it will return coordinates in the basis $V$, i.e. $X_V=P_{VU}X_U$, where $X_U$ and $X_V$ are the columns of coordinates of $x$ in the bases $U$ and $V$, respectively.

Observation: If we have the matrix $P_{VU}$ that changes coordinates from the basis $U$ to coordinates in the basis $V$ then the matrix that changes coordinates from the basis $V$ to coordinates in the basis $U$ is the inverse of $P_{VU}$, i.e. $P_{UV}=P_{VU}^{-1}$.
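The two steps above, together with the observation on inverses, can be sketched in NumPy; the bases here are made-up examples:

```python
import numpy as np

# Made-up bases of R^2; columns of U and V are the basis vectors.
U = np.array([[1.0, 0.0],
              [1.0, 1.0]])
V = np.array([[1.0, 1.0],
              [1.0, -1.0]])

# Steps 1 and 2 at once: column j of P_VU solves V @ p_j = u_j,
# so solving with the whole matrix U on the right gives P_VU = V^{-1} U.
P_VU = np.linalg.solve(V, U)

# Check on one vector: coordinates transform by X_V = P_VU @ X_U.
X_U = np.array([2.0, 3.0])
X_V = P_VU @ X_U
assert np.allclose(V @ X_V, U @ X_U)  # both columns describe the same vector

# The reverse change of coordinates is the inverse matrix: P_UV = P_VU^{-1}.
P_UV = np.linalg.solve(U, V)
assert np.allclose(P_UV @ P_VU, np.eye(2))
```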

Computing the matrix of a linear transformation in one basis having the matrix in another basis

Suppose we are given a linear transformation $T:\mathbb{R}^n\rightarrow\mathbb{R}^n$. We have seen that there is a matrix $A$ such that for every vector $x\in\mathbb{R}^n$ we get $T(x)=Ax$.

Since we have seen that the column of numbers representing the vector $x$ is the same as the column of coordinates $X_E$ in the standard basis $E$, we can also write $T(x)_E=AX_E$, where $T(x)_E$ is the column of coordinates of $T(x)$ in the basis $E$ (keep in mind that $T(x)$ and $T(x)_E$ actually look the same since $E$ is the standard basis).

Assume that we have a basis $V$ and we compute the matrix of change of coordinates $P_{EV}$ from coordinates in the basis $V$ to coordinates in the basis $E$. Denote by $X_V$ the column of coordinates of $x$ in the basis $V$. Then $X_E=P_{EV}X_V$. We also have that $T(x)_E=P_{EV}T(x)_{V}$.

Therefore, substituting these into the formula $T(x)_E=AX_E$ we get $P_{EV}T(x)_V=AP_{EV}X_V$, from which

$$T(x)_V=P_{EV}^{-1}AP_{EV}X_V.$$

So, if $A$ was the matrix that computes $T$ in the coordinates in the basis $E$, then $P_{EV}^{-1}AP_{EV}$ is the matrix computing $T$ in coordinates in the basis $V$. Once the matrices $P_{EV}$ and $P_{EV}^{-1}$ are computed as in the previous section, we get the new matrix of $T$ in the basis $V$ using the previous formula.

Observation: The formula above shows why the relation of similarity ($A$ similar to $P^{-1}AP$) is so important. Similar matrices can be used to represent the same linear transformation in different bases.

Example:

Suppose we have, in $\mathbb{R}^2$ the standard basis $E:=\{e_1=\begin{bmatrix}1\\0\end{bmatrix},e_2=\begin{bmatrix}0\\1\end{bmatrix}\}$ (in this order). Consider the basis $V:=\{v_1=\begin{bmatrix}1\\1\end{bmatrix},v_2=\begin{bmatrix}1\\-1\end{bmatrix}\}$ (in this order).

We can compute that $$\frac{1}{2}v_1+\frac{1}{2}v_2=e_1$$ and that $$\frac{1}{2}v_1-\frac{1}{2}v_2=e_2.$$

Putting these coefficients as columns (the coefficients of the first equation in the first column, the coefficients of the second equation in the second column; the order is very important), we get the matrix

$$P_{VE}=\begin{bmatrix}\frac{1}{2}&\frac{1}{2}\\\frac{1}{2}&-\frac{1}{2}\end{bmatrix}.$$

This is the matrix of change of coordinates from coordinates in the basis $E$ to coordinates in the basis $V$.

Assume now that $T:\mathbb{R}^2\rightarrow\mathbb{R}^2$ is a linear transformation given by $T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)=\begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}$

As we have seen, this formula can be thought of as written for coordinates in the standard basis $E$, i.e.

$$T\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}_E\right)=\begin{bmatrix}2&1\\1&2\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix}_E$$

because vectors in $\mathbb{R}^2$ and columns of their coordinates in the standard basis look the same.

Let us compute the formula for $T$ if we were using coordinates in the basis $V$.

We know that if $X_V$ is the column of coordinates of $x$ in the basis $V$, then $T(x)_V=P_{EV}^{-1}AP_{EV}X_V$, where $A=\begin{bmatrix}2&1\\1&2\end{bmatrix}$.

We have already computed $P_{VE}$, and since $P_{EV}=P_{VE}^{-1}$ is its inverse, we get

$$P_{EV}=\begin{bmatrix}1&1\\1&-1\end{bmatrix}.$$

Therefore we get

$$P_{EV}^{-1}AP_{EV}=P_{VE}AP_{EV}=\begin{bmatrix}3&0\\0&1\end{bmatrix}.$$

Notice how the matrix computing $T$ in coordinates in the basis $V$ is much simpler than the matrix computing $T$ in the standard basis. This is the main reason to change basis (to change coordinates, variables, or system of reference): we can get simpler formulas.
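The computation in this example can be checked numerically; here is a short NumPy sketch using the same $A$ and $P_{EV}$ as above:

```python
import numpy as np

# Data from the example: A is the matrix of T in the standard basis,
# and the columns of P_EV are the basis vectors v_1, v_2.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
P_EV = np.array([[1.0, 1.0],
                 [1.0, -1.0]])

# Matrix of T in the basis V: P_EV^{-1} A P_EV.
D = np.linalg.inv(P_EV) @ A @ P_EV
assert np.allclose(D, np.diag([3.0, 1.0]))
```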

Answer:

If we denote by $\lambda_1,\ldots,\lambda_p$ the eigenvalues of the linear transformation $T$, then $T$ is diagonalizable if $$V=\bigoplus_{i=1}^p\ker(T-\lambda_i\,\mathrm{id}_V)=\bigoplus_{i=1}^pE_{\lambda_i}(T).$$ Now if $B_i$ is a basis of the eigenspace $E_{\lambda_i}(T)$, then $B=\bigcup_i B_i$ is a basis of eigenvectors of $V$, and the matrix of $T$ in this basis is diagonal.

Answer:

The thing to remember is that matrices are concrete realizations of transformations, transformations being the abstract thing.

A transformation has a life of its own, and no matter how you represent it, the sum of dimensions of its eigenspaces is always going to be the same. If $A$ represents the transformation in a particular basis, then its representations in other bases are exactly $X^{-1}AX$ for every possible invertible matrix $X$.

Strictly speaking, at first it doesn't make much sense to say a transformation is diagonalizable, since that's a property of a matrix. But since $A$ is diagonalizable iff $X^{-1}AX$ is, we see that all of the matrices representing the transformation have to be diagonalizable at the same time, so it makes sense to transfer this to the transformation as well, even though it is abstract. Alternatively, you could just think of a transformation on $F^n$ being diagonalizable if the sum of dimensions of eigenspaces is $n$.

The proof that diagonalizability of a matrix is equivalent to having a basis of eigenvectors is pretty straightforward. Suppose first you have a basis of eigenvectors $v_1,\dots,v_n$ of a matrix $A$. Now put them as columns in a matrix $V$. Computing $AV$ column by column, you get $AV=VD$, where $D$ is the diagonal matrix of the corresponding eigenvalues. Since the columns of $V$ are linearly independent, $V$ is nonsingular, so you can multiply by $V^{-1}$ on the left to get $V^{-1}AV=D$.

Conversely, if $X^{-1}AX=D$, then $AX=XD$ makes it clear that the columns of $X$ are eigenvectors of $A$, and $X$ being invertible says they are linearly independent.
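Both directions of this argument can be checked numerically. A short NumPy sketch using a made-up symmetric matrix (`np.linalg.eig` returns the eigenvalues together with a matrix whose columns are eigenvectors):

```python
import numpy as np

# A made-up symmetric matrix with a full basis of eigenvectors.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# w holds the eigenvalues; the columns of V are corresponding eigenvectors.
w, V = np.linalg.eig(A)
D = np.diag(w)

# A V = V D column by column, and since the columns of V are linearly
# independent, V is invertible and V^{-1} A V = D.
assert np.allclose(A @ V, V @ D)
assert np.allclose(np.linalg.inv(V) @ A @ V, D)
```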

Answer:

This is easier if one starts out talking about linear transformations, and only later about matrices. A linear operator $T:V\to V$ is diagonalisable if and only if $V$ admits a basis of eigenvectors for $T$. Expressing $T$ on a basis $B$ results in a square matrix that is diagonal if and only if the vectors of $B$ are all eigenvectors (this is immediate from the definitions). So $T$ is diagonalisable if and only if its matrix with respect to some basis is diagonal (and this happens precisely for those bases that consist entirely of eigenvectors for $T$).

Now if the matrix of $T$ on some arbitrary basis is $A$, then saying that $T$ is diagonalisable means that some base change applied to $A$ (namely one to a basis of eigenvectors) must give a diagonal matrix. The formula for base change is that $A$ becomes $P^{-1}AP$, where $P$ is the matrix whose columns express the coordinates of the new basis vectors with respect to the old basis. Thus a matrix is diagonalisable if and only if $P^{-1}AP$ is diagonal for some invertible matrix $P$.

Note that the adjective "diagonalisable" refers to what happens to matrices under base change, but the property it expresses is one that applies fundamentally to linear transformations (namely having a basis of eigenvectors), and only in a secondary manner to their matrices.