Intuition on spectral theorem


In the last month I studied the spectral theorems and I formally understood them. But I would like some intuition about them. If you didn’t know spectral theorems, how would you come up with the idea that symmetric/normal endomorphisms are the only orthogonally diagonalizable endomorphisms in the real/complex case. How would you even come up with the idea of studying the adjoint?


Accepted answer:

Regarding the adjoint, suppose you have vector spaces $X$ and $Y$ (over the same field), and a linear map $$ T:X\to Y $$ Write $X^*$ and $Y^*$ for the dual spaces. Then $T$ naturally induces a map $$ T^*:Y^* \to X^* $$ defined by $$ T^*(\phi):=\phi\circ T $$ This makes sense, because if $\phi$ is a linear functional on $Y$, then $\phi\circ T$ is a linear functional on $X$. Moreover, the function $T^*$ is itself a linear transformation. This $T^*$ is called the adjoint of $T$ (there is a slight abuse of notation/terminology here; I'll elaborate on this in a moment). This is an example of what is called functorial behaviour. Taking adjoints is an example of what is called a contravariant functor.
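The definition $T^*(\phi):=\phi\circ T$ is just precomposition, which can be sketched in a few lines of Python. The particular map $T$ and functional $\phi$ below are illustrative choices, not anything from the text:

```python
# Sketch: the dual-space adjoint T*(phi) = phi ∘ T, with X = Y = R^2.
# Vectors are tuples and linear functionals are plain Python functions.

def T(x):
    # a linear map T: R^2 -> R^2, with matrix [[1, 2], [3, 4]]
    return (x[0] + 2 * x[1], 3 * x[0] + 4 * x[1])

def T_star(phi):
    # the induced map T*: Y* -> X*, defined by precomposition with T
    return lambda x: phi(T(x))

phi = lambda y: y[0] - y[1]   # a linear functional on Y
psi = T_star(phi)             # the resulting linear functional on X

# T*(phi) evaluated at x agrees with phi(T(x)) by construction
x = (5.0, 7.0)
assert psi(x) == phi(T(x))
```

Note that no inner product is needed for this version of the adjoint; that only enters when identifying $X^*$ with $X$ below.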

Now, suppose that $X$ and $Y$ are finite-dimensional inner product spaces. Then you know that $X$ and $X^*$ can be canonically identified with each other. On the one hand, any $x\in X$ gives rise to a linear functional $\phi_x\in X^*$ defined by $$ \phi_x(v):=\langle v,x\rangle $$ Write $S_X:X\to X^*$ for the map that sends $x$ to $\phi_x$. It is easy to verify that $S_X$ is conjugate linear, i.e. $S_X(x+x')=S_X(x)+S_X(x')$ and $S_X(\alpha x)=\bar \alpha S_X(x)$.

On the other hand, given any $\phi\in X^*$, one can show that there exists (a unique) vector $x_\phi\in X$ such that, for every $v\in X$, $$ \phi(v)=\langle v, x_\phi\rangle $$ This shows that the function $S_X$ above is invertible, so it is "almost" an isomorphism, except for the fact that it is not strictly linear, but conjugate linear.

Now, the same thing can be done with $Y$, and we obtain a conjugate isomorphism $S_Y:Y\to Y^*$.

Consider now the composition $$ Y\overset{S_Y}{\longrightarrow} Y^*\overset{T^*}{\longrightarrow} X^* \overset{S^{-1}_X}{\longrightarrow} X $$ Call this composition $\hat T$, i.e. $\hat T(y)=(S^{-1}_X\circ T^*\circ S_Y)(y)$. You can check that $\hat T$ is linear.

Fix $x\in X$ and $y\in Y$. Put $\phi=(T^*\circ S_Y) y\in X^*$. Now, $S_X^{-1}\phi$ is, by definition, the unique vector $z\in X$ such that $\langle v,z\rangle =\phi (v)$ for every $v\in X$. Therefore, $$ \langle x,\hat Ty\rangle =\langle x,S^{-1}_X\phi\rangle=\phi(x) $$ Now, $\phi=T^*(S_Yy)=(S_Yy)\circ T$. So, $$ \phi(x)=(S_Yy)(Tx) $$ Now, $S_Yy\in Y^*$ is the linear functional that pairs a vector in $Y$ with $y$ in the inner product. This means that $$ (S_Yy)(Tx)=\langle Tx,y\rangle $$ Putting everything together, we get that $$ \langle x,\hat Ty\rangle =\langle Tx,y\rangle $$ So $\hat T$ has the defining property that "the adjoint" has in every linear algebra text. In practice, we use $T^*$ to refer to the above $\hat T$, and the original dual-space $T^*$ is left behind. I will follow this convention from now on, i.e. every $T^*$ in what follows really means $\hat T$. I should mention that having an inner product is key for all of this: for a general vector space there is no canonical identification of $X$ with $X^*$.
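Numerically, with the standard inner product on $\mathbb{C}^n$ and the standard basis, $\hat T$ is just the conjugate-transpose matrix, and the identity $\langle x,\hat Ty\rangle =\langle Tx,y\rangle$ can be spot-checked. A minimal sketch (the random $A$, $x$, $y$ are arbitrary choices; the convention $\langle u,v\rangle=\sum_i u_i\overline{v_i}$, linear in the first slot, matches the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

A_star = A.conj().T  # matrix of the adjoint in an orthonormal basis

# np.vdot(a, b) conjugates its FIRST argument, i.e. it computes
# sum conj(a_i) b_i, so our <u, v> = sum u_i conj(v_i) is np.vdot(v, u).
def inner(u, v):
    return np.vdot(v, u)

lhs = inner(x, A_star @ y)   # <x, T^ y>
rhs = inner(A @ x, y)        # <T x, y>
assert np.isclose(lhs, rhs)
```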


Regarding your question about looking at normality, recall that, given a linear operator $T:X\to X$, a subspace $W\subset X$ is said to be $T$-invariant if $$ x\in W\implies Tx\in W $$ Define the orthogonal complement $$ W^\perp:=\{x\in X: \langle x,w\rangle =0 \text{ for all } w\in W\} $$ Note that, if $W$ is $T$-invariant, then $W^\perp$ is $T^*$-invariant. Indeed, fix $x\in W^\perp$. We need to see that $T^*x\in W^\perp$. Let $w\in W$; then $$ \langle T^*x,w\rangle=\langle x,Tw\rangle=0 $$ because $x\in W^\perp$ and $Tw\in W$ (because $W$ is $T$-invariant). Since $w\in W$ was arbitrary, $T^*x\in W^\perp$.

If $T$ is, for example, self-adjoint, then it follows at once that $W^\perp$ is $T$-invariant whenever $W$ is. This leads to the following question: can we find an easy property of an operator $T$ guaranteeing that every $T$-invariant subspace has a $T$-invariant orthogonal complement? The answer is yes, and the property is normality.


How does this relate to being diagonalizable? Well, since the matrix of $T^*$ in an orthonormal basis $B$ is the conjugate transpose of the matrix of $T$ in $B$, and diagonal matrices commute with their conjugate transposes, it follows that any operator that is diagonalizable in an orthonormal basis is necessarily normal.

Suppose now that $T$ is normal. Pick an eigenvalue $\lambda$ of $T$ and let $E$ be the associated eigenspace. Clearly, $E$ is $T$-invariant. Write $$ X=E\oplus E^\perp $$ By normality, $E^\perp$ is also $T$-invariant. This means that we can consider the restricted operator $T|_{E^\perp}:E^\perp \to E^\perp$. This new operator is also normal. But $\dim (E^\perp)<\dim X$, so we can carry out an inductive argument.
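The conclusion of this induction can be spot-checked numerically: a normal matrix with distinct eigenvalues has orthogonal eigenvectors, so `np.linalg.eig` (which normalizes each eigenvector) should return a unitary matrix of eigenvectors. A sketch using the cyclic-shift matrix, an illustrative choice of normal matrix:

```python
import numpy as np

# The cyclic shift is normal (it is unitary, hence A A* = A* A) and has
# three distinct eigenvalues, the cube roots of unity.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
assert np.allclose(A @ A.conj().T, A.conj().T @ A)   # normality

vals, V = np.linalg.eig(A)

# Eigenvectors for distinct eigenvalues of a normal matrix are orthogonal,
# and eig normalizes columns, so V should be (numerically) unitary.
assert np.allclose(V.conj().T @ V, np.eye(3), atol=1e-10)

# Unitary diagonalization: A = V diag(vals) V*
assert np.allclose(V @ np.diag(vals) @ V.conj().T, A)
```

(For a normal matrix with repeated eigenvalues, `eig` need not return orthogonal vectors within an eigenspace; one would then orthonormalize each eigenspace separately, mirroring the induction above.)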

Another answer:

To give a bit of a shorter answer: in the Hermitian case, observe that if $x$ and $y$ are both eigenvectors of $A$, corresponding to the eigenvalues $\lambda$ and $\mu$ respectively, then:

$$\begin{aligned} &\langle Ax, y \rangle = \langle \lambda x, y \rangle = \lambda \langle x, y \rangle \\ &\quad= \\ &\langle x, A^*y \rangle = \langle x, A y \rangle =\langle x, \mu y \rangle = \overline\mu \langle x, y \rangle \end{aligned}$$

Hence, $(\lambda -\overline\mu) \langle x, y \rangle =0$, implying either $\lambda=\overline\mu$ or $x\perp y$. If we pick the same eigenvector twice $(x=y)$, it follows that $\lambda=\overline\lambda$, so all eigenvalues must be real. Consequently, the eigenspaces corresponding to different eigenvalues are orthogonal to each other.

From this observation alone, lots of consequences follow quite naturally. One can easily prove that in this case a full orthogonal basis exists (see e.g. this write-up or try for yourself); likewise, if an orthonormal eigenbasis corresponding to real eigenvalues exists, one can easily prove that $A$ must be hermitian.
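These two facts (real eigenvalues, orthonormal eigenbasis) are exactly what `np.linalg.eigh`, which assumes and exploits Hermitian structure, delivers. A minimal sketch; the random Hermitian matrix is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2           # force A to be Hermitian

vals, V = np.linalg.eigh(A)        # eigh assumes a Hermitian input

# Eigenvalues are real (eigh returns them as a real array)
assert np.allclose(vals.imag, 0)

# The eigenvectors form an orthonormal basis: V is unitary
assert np.allclose(V.conj().T @ V, np.eye(5))

# Unitary diagonalization: A = V diag(vals) V*
assert np.allclose(V @ np.diag(vals) @ V.conj().T, A)
```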

The normal case is a bit more tricky, but one can play a similar game (may expand later).

Another answer:

Almost everything about this subject was derived in the opposite order of what you have been taught. That's why it is difficult to answer your question.

  • The infinite-dimensional case was studied for functions before the finite-dimensional case, and well before the notion of a vector space.

  • Orthogonality was noticed and defined using integral conditions about 150 years before an inner product was defined, and before finite-dimensional Linear Algebra. These observations led to the notion of a general inner product space.

  • Linearity came out of the physical condition of superposition of solutions for the Heat Equation and vibrating string problem, not the other way around.

  • Self-adjoint was defined before there was an inner-product, through Lagrange's adjoint equation, which gave, among other things, a reduction of order tool for ODEs, and a notion of "integral orthogonality."

It's all upside down from the point of view of abstraction. Asking how you might start at the lowest level of abstraction and naturally move in the more abstract direction is asking how to motivate the reverse of the historical path that brought us to this point. It wasn't derived that way, and might never have been.

Another answer:

Let ${ A \in \mathbb{C} ^{n \times n} }.$ It would be easier to visualise ${ A }$ if it admitted an orthonormal basis of eigenvectors. So we can ask ourselves: when does ${ A }$ admit an orthonormal basis of eigenvectors?

Say ${ A }$ admits an orthonormal basis of eigenvectors ${ P = [P _1, \ldots, P _n] }.$ Now $${ A [P _1, \ldots , P _n] = [\lambda _1 P _1, \ldots, \lambda _n P _n ] ,}$$ that is ${ AP = P D }$ for a diagonal matrix ${ D }.$ Equivalently ${ A = P D P ^{*} }$ with ${ D }$ diagonal and ${ P }$ unitary.
Now its adjoint is ${ A ^{*} = P D ^{*} P ^* }.$ One basic relation between ${ A }$ and ${ A ^{*} }$ is that they commute, i.e. ${ A A ^{*} = A ^{*} A }$ (this need not hold for general matrices).

This suggests: Say ${ A \in \mathbb{C} ^{n \times n} }$ with ${ A A ^{*} = A ^{*} A }.$ Does ${ A }$ admit an orthonormal basis of eigenvectors ?

It turns out yes.
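The easy direction of this equivalence can be spot-checked numerically: build ${ A = P D P^{*} }$ with ${ P }$ unitary and ${ D }$ diagonal, and verify that ${ A }$ commutes with its adjoint. A sketch (the random unitary, obtained via QR, and the diagonal ${ D }$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# A random unitary P (QR factor of a random complex matrix) and a
# random diagonal D.
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
P, _ = np.linalg.qr(M)
D = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))

A = P @ D @ P.conj().T
A_star = P @ D.conj().T @ P.conj().T   # A* = P D* P*

# The two expressions for the adjoint agree...
assert np.allclose(A.conj().T, A_star)
# ...and A commutes with A*, i.e. A is normal.
assert np.allclose(A @ A_star, A_star @ A)
```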


Thm: Let ${ A \in \mathbb{C} ^{n \times n} }$ with ${ A A ^{*} = A ^{*} A }.$ Then ${ A }$ admits an orthonormal basis of eigenvectors.

Pf: Pick an eigenvector ${ v _1 }$ of ${ A }$ (say its eigenvalue is ${ \lambda _1 }$). Construct an orthonormal basis ${ \mathscr{B} = \left( \frac{v _1}{\lVert v _1 \rVert}; u _1, \ldots, u _{n-1} \right) }.$ Writing the operator ${ A }$ w.r.t. the orthonormal basis ${ \mathscr{B} ,}$ we see $${ A \mathscr{B} = \mathscr{B} \begin{pmatrix} \lambda _1 &\alpha _1 &\ldots &\alpha _{n-1} \\ 0 & &M & \end{pmatrix} }$$ for some matrix ${ M }.$

Since ${ A A ^{*} = A ^{*} A }$ we have $${ \mathscr{B} \begin{pmatrix} \lambda _1 &\alpha _1 &\ldots &\alpha _{n-1} \\ 0 & &M & \end{pmatrix} \mathscr{B} ^{*} \mathscr{B} \begin{pmatrix} \overline{\lambda _1} &0 \\ \overline{\alpha _1} & \\ \vdots &M ^{*} \\ \overline{\alpha _{n-1}} & \end{pmatrix} \mathscr{B} ^{*} }$$ $${ = \mathscr{B} \begin{pmatrix} \overline{\lambda _1} &0 \\ \overline{\alpha _1} & \\ \vdots &M ^{*} \\ \overline{\alpha _{n-1}} & \end{pmatrix} \mathscr{B} ^{*} \mathscr{B} \begin{pmatrix} \lambda _1 &\alpha _1 &\ldots &\alpha _{n-1} \\ 0 & &M & \end{pmatrix} \mathscr{B} ^{*} }$$ that is $${ \begin{pmatrix} \lambda _1 &\alpha _1 &\ldots &\alpha _{n-1} \\ 0 & &M & \end{pmatrix} \begin{pmatrix} \overline{\lambda _1} &0 \\ \overline{\alpha _1} & \\ \vdots &M ^{*} \\ \overline{\alpha _{n-1}} & \end{pmatrix} }$$ $${ = \begin{pmatrix} \overline{\lambda _1} &0 \\ \overline{\alpha _1} & \\ \vdots &M ^{*} \\ \overline{\alpha _{n-1}} & \end{pmatrix} \begin{pmatrix} \lambda _1 &\alpha _1 &\ldots &\alpha _{n-1} \\ 0 & &M & \end{pmatrix} }.$$ Focusing on the top left entry, ${ \vert \lambda _1 \vert ^2 + \vert \alpha _1 \vert ^2 + \ldots + \vert \alpha _{n-1} \vert ^2 = \vert \lambda _1 \vert ^2 ,}$ hence ${ \alpha _1 = \ldots = \alpha _{n-1} = 0 }.$ Now focusing on the bottom right block, ${ M M ^{*} = M ^{*} M }.$

So $${ A = \mathscr{B} \begin{pmatrix} \lambda _1 & \\ &M \end{pmatrix} \mathscr{B} ^{*} }$$ with ${ M M ^{*} = M ^{*} M }.$

By induction hypothesis, since ${ M M ^{*} = M ^{*} M }$ we see ${ M }$ admits an orthonormal basis of eigenvectors. So there is an orthonormal basis ${ \mathcal{Q} = [Q _1, \ldots, Q _{n-1}] }$ of ${ \mathbb{C} ^{n-1} }$ such that $${ M \mathcal{Q} = \mathcal{Q} \begin{pmatrix} \lambda _2 & & \\ &\ddots & \\ & &\lambda _{n} \end{pmatrix} = \mathcal{Q} D }.$$

Now $${ A = \mathscr{B} \begin{pmatrix} \lambda _1 & \\ &M \end{pmatrix} \mathscr{B} ^{*} }$$ with ${ M = \mathcal{Q} D \mathcal{Q} ^{*}.}$

So $${ \begin{align*} A &= \mathscr{B} \begin{pmatrix} \lambda _1 & \\ &\mathcal{Q} D \mathcal{Q} ^{*} \end{pmatrix} \mathscr{B} ^{*} \\ &= \mathscr{B} \begin{pmatrix} 1 & \\ &\mathcal{Q} \end{pmatrix} \begin{pmatrix} \lambda _1 & \\ &D \end{pmatrix} \begin{pmatrix} 1 & \\ &\mathcal{Q} \end{pmatrix} ^{*} \mathscr{B} ^{*} \end{align*} }$$ that is $${ A \mathscr{B} \begin{pmatrix} 1 & \\ &\mathcal{Q} \end{pmatrix} = \mathscr{B} \begin{pmatrix} 1 & \\ &\mathcal{Q} \end{pmatrix} \begin{pmatrix} \lambda _1 & \\ &D \end{pmatrix}. }$$ The new basis ${ \mathscr{B} \begin{pmatrix} 1 & \\ &\mathcal{Q} \end{pmatrix} }$ is orthonormal since ${ \mathcal{Q} }$ is unitary.

Hence ${ \mathscr{B} \begin{pmatrix} 1 & \\ &\mathcal{Q} \end{pmatrix} }$ is an orthonormal basis of eigenvectors of ${ A },$ as needed.