Proof of Cayley-Hamilton Theorem for Diagonalisable Matrices [Lay P326 Ch 5 Sup Q7]


Proof for Diagonal Matrices, from page 2 of 7: Let $A \in M_{n}(\mathbb{C})$ be diagonal, with $A_{ii}=\lambda_{i}$.
Then $ p_{A}(t) = \det(tI-A)= \det \begin{bmatrix} t - \lambda_1 & ~ & ~ \\ ~ & \ddots & ~ \\ ~ & ~ & t - \lambda_n \\ \end{bmatrix} =\prod_{i=1}^{n}(t-\lambda_{i}) \quad (♦)$
and $p_A(A)= \prod_{i=1}^{n}(A- \color{forestgreen}{ \lambda_iI } ) $ , a product of diagonal matrices.

$1.$ How does $p_A(A)= \prod_{i=1}^{n}(A-\lambda_{i}I)$ follow? $(♦)$ contains $\lambda_i$ and NOT $\color{forestgreen}{ \lambda_iI }$.
What legitimates this substitution? $t$ is a scalar variable but $A$ is a matrix, so they can't simply be set equal?

Does the proof repeat this technique for the last line of this proof, denoted with $\color{ orangered }{ ( \yen ) }$?

As in the previous examples (on the linked PDF in the first sentence), since $A$ is diagonal, $$ p_{A}(A) \mathop{=}^{\color{ red }{\clubsuit} } \begin{bmatrix} p_A(\lambda_1) & ~ & ~ \\ ~ & \ddots & ~ \\ ~ & ~ & p_A(\lambda_n) \\ \end{bmatrix} = \begin{bmatrix} \prod_{i=1}^{n}(\lambda_1 -\lambda_{i}) & ~ & ~ \\ ~ & \ddots & ~ \\ ~ & ~ & \prod_{i=1}^{n}(\lambda_n -\lambda_{i}) \\ \end{bmatrix} = \text{ 0 matrix },$$ where $\prod_{i=1}^{n}(\lambda_n -\lambda_{i}) = \cdots(\lambda_n -\lambda_{n})= 0 $, and the same holds for all the other diagonal entries.
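A quick numerical sketch of this computation (my own check with numpy, not part of Lay's proof; the eigenvalues below are just sample values): for a diagonal $A$, the product $\prod_{i=1}^{n}(A-\lambda_iI)$ does come out as the zero matrix, because each diagonal entry picks up a zero factor.

```python
import numpy as np

lam = np.array([2.0, -1.0, 5.0])   # assumed sample eigenvalues
A = np.diag(lam)                    # diagonal A with A_ii = lambda_i
n = len(lam)

# p_A(A) = (A - lambda_1 I)(A - lambda_2 I)...(A - lambda_n I)
pA = np.eye(n)
for eig in lam:
    pA = pA @ (A - eig * np.eye(n))

print(np.allclose(pA, np.zeros((n, n))))   # True: entry j hits the zero factor (lambda_j - lambda_j)
```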

$2.$ How does $p_{A}(A)$ equal that diagonal matrix, as denoted with $\color{ red }{ ( \clubsuit )} $ ?

Proof for Diagonalisable Matrices: Similar matrices have the same eigenvalues (and thus the same characteristic polynomials), so suppose for similar matrices $A$ and $B$ (now $A$ may NOT be diagonal): $ p_{A}(z)=p_{B}(z)=\displaystyle \sum_{i=0}^{n}c_{i}z^{i} \implies p_{A}(A)=\sum_{i=0}^{n}c_{i}A^{i} \quad \color{ orangered }{ ( \yen ) } $ (I omit the rest of the proof.)
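The coefficient form in $(\yen)$ can also be checked numerically. A hedged sketch (my own, not from the proof; the matrix below is an assumed example) evaluating $p_A(A)=\sum_i c_i A^i$ for a diagonalisable but non-diagonal $A$:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # diagonalisable, not diagonal; eigenvalues 1 and 3

# np.poly on a square matrix returns characteristic-polynomial
# coefficients, highest degree first: here z^2 - 4z + 3 -> [1, -4, 3]
c = np.poly(A)
n = A.shape[0]

# p_A(A) = sum_i c_i A^i, reading coefficients from highest power down (A^0 = I)
pA = sum(coef * np.linalg.matrix_power(A, n - k) for k, coef in enumerate(c))

print(np.allclose(pA, 0))   # True, as Cayley-Hamilton predicts
```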


Best Answer

I believe what is happening is that the two "functions" $p_A(t)$ and $p_A(A)$ are each defined by a product: $\prod_{i=1}^n(t-\lambda_i)$ and $\prod_{i=1}^n(A-\lambda_iI)$, respectively.

In dimension $1$, $p_A(t)=\det(tI-A)$ is a single factor, i.e. in this case the product and the determinant are the same.

Clearly this is not the same in higher dimensions.

Now $p_A(A)$ is itself a diagonal matrix (since it is the product of diagonal matrices).

This is because $p_A(A)=\prod_{i=1}^n(A-\lambda_iI)$, and for each $i$, $(A-\lambda_iI)$ is a diagonal matrix (because both $A$ and $I$ are).

Thus $p_A(A)=\prod_{i=1}^n(A-\lambda_iI)=(A-\lambda_1I)\cdots(A-\lambda_nI)$ is a product of diagonal matrices.

We see that the $j$-th diagonal entry is: $\prod_{i=1}^n(A_{jj}-\lambda_i)=\prod_{i=1}^n(\lambda_j-\lambda_i)=p_A(\lambda_j)$.

This is because:

$p_A(A)=\prod_{i=1}^n(A-\lambda_iI)=(A-\lambda_1I)...(A-\lambda_nI)$

$=\left[\begin{array}{cccc} a_{11}-\lambda_1 & 0 & \cdots & 0\\ 0 & a_{22}-\lambda_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{nn}-\lambda_1 \end{array} \right]\cdots\left[\begin{array}{cccc} a_{11}-\lambda_n & 0 & \cdots & 0\\ 0 & a_{22}-\lambda_n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & a_{nn}-\lambda_n \end{array} \right]$

$=\left[\begin{array}{cccc} (a_{11}-\lambda_1)\cdots(a_{11}-\lambda_n) & 0 & \cdots & 0\\ 0 & (a_{22}-\lambda_1)\cdots(a_{22}-\lambda_n) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & (a_{nn}-\lambda_1)\cdots(a_{nn}-\lambda_n) \end{array} \right]$

$=\left[\begin{array}{cccc} \prod_{j=1}^n(a_{11}-\lambda_j) & 0 & \cdots & 0\\ 0 & \prod_{j=1}^n(a_{22}-\lambda_j) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \prod_{j=1}^n(a_{nn}-\lambda_j) \end{array} \right]$

$=\left[\begin{array}{cccc} \prod_{j=1}^n(\lambda_1-\lambda_j) & 0 & \cdots & 0\\ 0 & \prod_{j=1}^n(\lambda_2-\lambda_j) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \prod_{j=1}^n(\lambda_n-\lambda_j) \end{array} \right]$

$=\left[\begin{array}{cccc} p_A(\lambda_1) & 0 & \cdots & 0\\ 0 & p_A(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & p_A(\lambda_n) \end{array} \right]$

From this we see that each diagonal entry of the matrix corresponds to what you have posted.
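The entrywise pattern above — a product of diagonal matrices is diagonal, and each diagonal entry is just the product of the corresponding entries — can be sketched numerically (my own check, with assumed sample values):

```python
import numpy as np

D1 = np.diag([1.0, 2.0, 3.0])
D2 = np.diag([4.0, 5.0, 6.0])
D3 = np.diag([7.0, 8.0, 9.0])

# matrix product of the diagonal matrices
product = D1 @ D2 @ D3

# entrywise products along the diagonal: [1*4*7, 2*5*8, 3*6*9]
entrywise = np.diag([28.0, 80.0, 162.0])

print(np.allclose(product, entrywise))   # True
```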

Please ask if anything is unclear.

Edit:

They are related in the sense that substituting $A$ for $t$ in $p_A(t)$ will give you $p_A(A)$, but the latter is an $n\times n$ matrix, whereas $p_A(t)$ is a scalar, so they only truly coincide if $A$ is a $1\times 1$ matrix.

The reason that substituting $A$ for $t$ in $p_A(t)$ gives us $p_A(A)$ is as follows.

Let $t=A$ (and in the formula change $\lambda_i$ to $\lambda_iI$ so that we can add, subtract and multiply with the correct dimensions), and we get $p_A(A)=\prod_{i=1}^n(A-\lambda_iI)$.

Edit:

You are correct here: because the dimensions don't agree, we need to let $t$ be the matrix $A$ and let $\lambda_i$ be $\lambda_i I$, so really we should have:

$t\to A$, $\lambda_i\to \lambda_i I$

$p_A(t)=\prod_{i=1}^n(t-\lambda_i)\to\prod_{i=1}^n(A-\lambda_i I)=p_A(A)$.
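A small numerical check of this substitution (my own sketch with numpy, using assumed eigenvalues): the factored form $\prod_i(A-\lambda_iI)$ agrees with the coefficient form $\sum_i c_i A^i$ obtained by expanding $\prod_i(t-\lambda_i)$ first and substituting afterwards.

```python
import numpy as np

A = np.diag([1.0, 4.0, 6.0])   # assumed diagonal example
lam = np.diag(A)                # eigenvalues of a diagonal matrix are its entries
n = A.shape[0]

# factored form: prod_i (A - lambda_i I)
factored = np.eye(n)
for eig in lam:
    factored = factored @ (A - eig * np.eye(n))

# coefficient form: expand prod_i (t - lambda_i) into coefficients, then evaluate at A
c = np.poly(lam)                # coefficients, highest degree first
coeff_form = sum(ck * np.linalg.matrix_power(A, n - k) for k, ck in enumerate(c))

print(np.allclose(factored, coeff_form))   # True: both are the zero matrix here
```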

Another answer

In answer to 1): you can substitute $A$ for $t$ because we can evaluate $p_A(x)$ in an algebra over $\mathbb{C}$ (in this case, the matrix algebra $M_n(\mathbb{C})$). And yes, the proof does repeat this technique in the last bit, at $\color{orangered}{(\yen)}$.