Question on The Theorem of Existence and Uniqueness of The Solutions to Sylvester Equations


Background

I am self-studying linear algebra, and I got stuck on some steps of the proof of the following theorem:

Theorem$\quad$ Let $A$ be an $n \times n$ complex matrix and $B$ be an $m \times m$ complex matrix. For any $n \times m$ complex matrix $C$, the Sylvester equation $AX+XB=C$ has a unique solution $X$, which is an $n \times m$ complex matrix, if and only if $A$ and $-B$ do not share any eigenvalue.
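(A numerical aside, not part of the question: the theorem can be exercised with SciPy's `scipy.linalg.solve_sylvester`, which solves $AX+XB=C$. For randomly drawn real matrices, $A$ and $-B$ almost surely share no eigenvalue, so the theorem guarantees a unique solution. A minimal sketch, assuming NumPy and SciPy; the sizes $n=4$, $m=3$ are arbitrary choices of mine.)

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
n, m = 4, 3
# Random real matrices: A and -B almost surely share no eigenvalue,
# so the theorem guarantees a unique n x m solution X.
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

X = solve_sylvester(A, B, C)  # solves A X + X B = C
residual = np.linalg.norm(A @ X + X @ B - C)
```

If $A$ and $-B$ did share an eigenvalue, the system would not be uniquely solvable for every $C$, and no solver could return a reliable $X$.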

Here is the proof:

Proof$\quad$ The equation $AX+XB=C$ is a linear system with $mn$ unknowns and $mn$ equations. Hence, it is uniquely solvable for any given $C$ if and only if the homogeneous equation $AX+XB=0$ admits only the trivial solution.
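(An aside on the first sentence of the proof, not part of the quoted text: the reduction to an $mn \times mn$ linear system can be made explicit with the standard Kronecker-product identity $\operatorname{vec}(AX+XB) = (I_m \otimes A + B^{\mathsf T} \otimes I_n)\operatorname{vec}(X)$, where $\operatorname{vec}$ stacks columns. A quick numerical check, assuming NumPy:)

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
X = rng.standard_normal((n, m))

def vec(M):
    # column-stacking (column-major) vectorization
    return M.flatten(order="F")

# vec(AX + XB) = (I_m kron A + B^T kron I_n) vec(X),
# an (mn) x (mn) linear system in the mn entries of X
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
lhs = K @ vec(X)
rhs = vec(A @ X + X @ B)
```

The homogeneous equation $AX+XB=0$ admits only the trivial solution exactly when the matrix $K$ is invertible.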

First suppose that $A$ and $-B$ do not share any eigenvalue. Let $X$ be a solution to the homogeneous system $AX+XB=0$. Then $AX=X(-B)$. Since \begin{align*} &A(AX) = A(X(-B))\\ \implies\ &(AA)X = (AX)(-B) = (X(-B))(-B) = X((-B)(-B))\\ \implies\ &A^2X = X(-B)^2, \end{align*} by mathematical induction, we have $A^kX=X(-B)^k$ for each $k\in\mathbb{N}$. Then, $p(A)X = Xp(-B)$ for any polynomial $p$. In particular, let $p$ be the characteristic polynomial of $A$; that is, let $p(r) = \det(A-rI)$. Then, $p(A) = 0$. If $E$ is any square matrix, let $\sigma(E)$ denote the set of eigenvalues of $E$. Then, $\sigma(p(-B)) = p(\sigma(-B))$. Since $A$ and $-B$ do not share any eigenvalue, $p(\sigma(-B))$ does not contain zero, and hence, $p(-B)$ is nonsingular. Thus, $X$ is the zero matrix.

Conversely, suppose that $A$ and $-B$ share an eigenvalue $\lambda$. Let $\mathbf{u}$ be a corresponding right eigenvector for $A$ and $\mathbf{v}^*$ be a corresponding left eigenvector for $-B$; that is, $A\mathbf{u}=\lambda\mathbf{u}$ and $\mathbf{v}^*(-B) = \lambda\mathbf{v}^*$. Let $X=\mathbf{u}\mathbf{v}^*$. Then, $X$ is not the zero matrix, and $AX+XB = A(\mathbf{u}\mathbf{v}^*) - (\mathbf{u}\mathbf{v}^*)(-B) = \lambda\mathbf{u}\mathbf{v}^* - \lambda\mathbf{u}\mathbf{v}^* = 0$. Therefore, $X$ is a nontrivial solution to the homogeneous system $AX+XB=0$.
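(The converse construction is easy to try out numerically. Below, $A$ and $B$ are small hand-picked matrices of my own, chosen so that $A$ and $-B$ share the eigenvalue $\lambda = 2$, and $X$ is the outer product of a right eigenvector of $A$ with a left eigenvector of $-B$; assuming NumPy.)

```python
import numpy as np

lam = 2.0
# Hand-picked example: A and -B both have lam = 2 as an eigenvalue.
A = np.array([[lam, 1.0, 0.0],
              [0.0, 5.0, 0.0],
              [0.0, 0.0, 7.0]])
B = np.array([[-lam, 0.0],
              [3.0, 4.0]])

u = np.array([1.0, 0.0, 0.0])  # right eigenvector: A u = lam u
w = np.array([1.0, 0.0])       # left eigenvector:  w (-B) = lam w

X = np.outer(u, w)             # nonzero 3 x 2 matrix
homog = A @ X + X @ B          # should be the zero matrix
```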

My Questions

I could not understand certain steps in the above proof.

For the "if" direction (the second paragraph in the proof):

  1. After it proved that $A^kX=X(-B)^k$, it claimed that $p(A)X=Xp(-B)$ for any polynomial. Why is that?
  2. After it defined $p$ to be the characteristic polynomial, it said that $p(A)=0$. Is this because one might just plug in $A$ to $p(r)$ to get $p(A) = \det(A-AI) = \det(0) = 0$?
  3. I have difficulties understanding the notation of $p(\sigma(-B))$. While the $\sigma(-B)$ is the set of eigenvalues of $-B$, what is $p(\sigma(-B))$ then? Is it the set of the image of the eigenvalues of $-B$ under the polynomial $p$? If so, does $\sigma(p(-B)) = p(\sigma(-B))$ mean that the two sets are equal?
  4. Why does the fact that $A$ and $-B$ do not share any eigenvalue imply that $p(\sigma(-B))$ does not contain zero, and how does it imply that $p(-B)$ is nonsingular?
  5. Why does the nonsingularity of $p(-B)$ imply that $X$ is the zero matrix?

For the "only if" part (the third paragraph in the above proof):

  1. I am not familiar with the notation $\mathbf{v}^*$. Is it the conjugate transpose of $\mathbf{v}$? If so, why cannot $\mathbf{u}\mathbf{v}^*$ be the zero matrix?

I apologize for my lousy beginner linear algebra background. I would really appreciate it if someone could help me with these questions!


There are 2 answers below.

Best Answer
  1. Short version: $X$ distributes over the additions in $p(A)X$; in each term you swap $A^k X$ for $X(-B)^k$, then factor the $X$ back out.
    In detail: Let $N \in \Bbb{N}$ and $p(A) = \sum_{i=0}^N c_i A^i$. Then \begin{align*} p(A)X &= \sum_{i=0}^N c_i A^i X \\ &= \sum_{i=0}^N c_i X (-B)^i \\ &= X \sum_{i=0}^N c_i (-B)^i \\ &= X p(-B) \text{.} \end{align*}

  2. Note that $p(A)$ is a square ($n \times n$) matrix, even though $X$ and $p(A)X$ are not square in general. Then $p(A)=0$ is exactly the Cayley-Hamilton theorem.

  3. $p(\sigma(-B))$ is the set of values $p(e_1)$, $p(e_2)$, ..., $p(e_N)$, where $-B$ has $N$ eigenvalues (with or without repetition; the set is the same either way) named $e_1$, $e_2$, ..., $e_N$. This uses fairly common notation: for $f$ a function and $S$ a set, $f(S) = \{f(s) \mid s \in S\}$. And yes, $\sigma(p(-B)) = p(\sigma(-B))$ means the two sets are equal: "the set of eigenvalues of the matrix $p(-B)$ is the set of values obtained by evaluating $p$ at the eigenvalues of $-B$".

  4. The roots of the characteristic polynomial of $A$ are exactly $A$'s eigenvalues. So, having taken $p$ to be the characteristic polynomial of $A$, $p$ is zero on each of $A$'s eigenvalues. (Additional theory you should know: (1) the characteristic polynomial need not be the minimal (degree) polynomial; (2) if it is not the minimal polynomial, the only "additional" roots are repetitions of the minimal polynomial's roots. Consequently, $p$ is zero exactly on the eigenvalues of $A$ and is nonzero elsewhere.) Since $A$ and $-B$ do not share eigenvalues, $p$ cannot be zero on any of $-B$'s eigenvalues. That is, $0 \not\in p(\sigma(-B))$, so $0 \not\in \sigma(p(-B))$ (by the spectral mapping identity stated in the proof), so $p(-B)$ is nonsingular. (A matrix is singular exactly when zero is among its eigenvalues.)

  5. When we took $p$ to be the characteristic polynomial of $A$, the identity $p(A)X = Xp(-B)$ became $0 = Xp(-B)$, since $p(A) = 0$ by Cayley-Hamilton. Each row of $Xp(-B)$ is the corresponding row of $X$ multiplied by $p(-B)$. Since $p(-B)$ has only nonzero eigenvalues, it is nonsingular, so the only row vector it sends to the zero vector is the zero vector itself. Since $p(-B)$ sends every row of $X$ to the zero vector, every row of $X$ is the zero vector, so $X$ is the zero matrix.

  6. (Numbered "6" here to continue your question numbering.) Just reading through how it is used, $\mathbf{v}^*$ only needs to be a transpose of $\mathbf{v}$ for the argument to go through, and it must at least be a transpose for $X=\mathbf{u}\mathbf{v}^*$ to have the correct $n \times m$ dimensions.
    The zero vector is never an eigenvector, so neither $\mathbf{u}$ nor $\mathbf{v}$ is the zero vector, so their (outer) product is not the $0$ matrix.
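(The claims in items 2-4 above are easy to sanity-check numerically. The sketch below, assuming NumPy, uses `np.poly`, which returns the monic characteristic polynomial $\det(xI-A)$; this differs from the proof's $\det(A-xI)$ only by a sign, which affects neither $p(A)=0$ nor the singularity of $p(-B)$. The random matrix sizes are arbitrary choices of mine.)

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))

p = np.poly(A)  # coefficients of det(xI - A), highest degree first

def polyval_matrix(coeffs, M):
    # Horner's scheme with matrix products (np.polyval is elementwise only)
    R = np.zeros_like(M)
    I = np.eye(M.shape[0])
    for c in coeffs:
        R = R @ M + c * I
    return R

# Item 2: Cayley-Hamilton, p(A) is the zero matrix.
cayley_residual = np.linalg.norm(polyval_matrix(p, A))

# Item 3: spectral mapping, sigma(p(-B)) = p(sigma(-B)) as multisets.
lhs = np.sort_complex(np.linalg.eigvals(polyval_matrix(p, -B)))
rhs = np.sort_complex(np.polyval(p, np.linalg.eigvals(-B)))

# Item 4: with disjoint spectra, p(-B) is nonsingular (nonzero determinant).
det_pB = np.linalg.det(polyval_matrix(p, -B))
```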

Second Answer
  1. Let $p(x)=\sum_{k=0}^mc_kx^k$. Then $$ p(A)X=\sum_{k=0}^mc_kA^kX=\sum_{k=0}^mc_kX(-B)^k=X\sum_{k=0}^mc_k(-B)^k=Xp(-B). $$
  2. No. This is the assertion of the Cayley-Hamilton theorem. Your reasoning is a common but incorrect one: note that $p(A)$ is an $n\times n$ matrix, but $\det(A-AI)$ is a complex number. They are two different kinds of objects, so the substitution is not valid.
  3. Yes. Your interpretations are correct.
  4. Factorise $p(x)$ as $\prod_{k=1}^n(x-\lambda_k)$ (up to the sign $(-1)^n$, which affects neither $p(A)=0$ nor singularity). Then $\sigma(A)=\{\lambda_1,\ldots,\lambda_n\}$ and $p(-B)=\prod_{k=1}^n(-B-\lambda_kI)$. For any number $\mu$, note that $p(\mu)=0$ if and only if $\mu=\lambda_k$ for some $k$. Now the following statements are equivalent:
    • $p(-B)$ is singular;
    • $-B-\lambda_kI$ is singular for some $k$;
    • $\lambda_k\in\sigma(-B)$ for some $k$, i.e., $A$ and $-B$ share an eigenvalue $\lambda_k$;
    • $p(\mu)=0$ for some $\mu\in\sigma(-B)$;
    • $0\in p(\sigma(-B))$.
  5. Here $p$ is the characteristic polynomial of $A$. We have $p(A)=0$ (Cayley-Hamilton theorem) and $p(A)X=Xp(-B)$. Hence $Xp(-B)=0$. If $p(-B)$ is nonsingular, then $X=Xp(-B)p(-B)^{-1}=0p(-B)^{-1}=0$.

Finally, yes, $v^\ast$ denotes the conjugate transpose of $v$. Since $u$ and $v$ are nonzero (because they are eigenvectors), $uv^\ast$ cannot possibly be the zero matrix. (Otherwise, for instance, we would arrive at the contradiction that $0=u^\ast(0)v=u^\ast(uv^\ast)v=(u^\ast u)(v^\ast v)=\|u\|_2^2\|v\|_2^2>0$. Alternatively, since $u$ and $v$ are nonzero vectors, $u_i$ and $v_j$ are nonzero for some $i$ and $j$. Therefore $(uv^\ast)_{ij}=u_i\overline{v}_j\ne0$. Hence $uv^\ast\ne0$.)
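(A final numerical aside on item 4 of the second answer: the factorised form $p(-B)=\prod_k(-B-\lambda_kI)$ makes the equivalence with shared eigenvalues easy to test. The matrices below are my own illustrative choices, assuming NumPy.)

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
lams = np.linalg.eigvals(A)  # the lambda_k in the factorisation

def p_of(M, roots):
    # p(M) = product over k of (M - lambda_k I), the factorised form
    R = np.eye(M.shape[0], dtype=complex)
    for lam in roots:
        R = R @ (M - lam * np.eye(M.shape[0]))
    return R

# Disjoint spectra: for random B, -B almost surely shares no eigenvalue
# with A, so p(-B) is nonsingular.
B = rng.standard_normal((3, 3))
det_disjoint = np.linalg.det(p_of(-B, lams))

# Shared eigenvalue: force -B_shared to have lams[0] as an eigenvalue;
# then one factor of p(-B_shared) is singular, so p(-B_shared) is singular.
mu = lams[0]
B_shared = -np.diag([mu, mu + 1.0, mu + 2.0])
det_shared = np.linalg.det(p_of(-B_shared, lams))
```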