The matrix logarithm is well-defined - but how can we *algebraically* see that it is inverse to the exponential, as a finite polynomial?


This question is inspired by this one, which I saw earlier today. I started writing an answer to share the insight that the matrix logarithm can be defined, via an alternative technique, even for matrices outside the unit ball.

Now, Sangchul has posted a great answer explaining how we know that the map $X\mapsto\sum_{n=1}^\infty\frac{(-1)^{n-1}}{n}X^n$ defines a logarithm of $1+X$ whenever the sum converges, which happens whenever $\|X\|\lt1$. By scaling, we can use this map to obtain more general logarithms, since there is an easy definition $\log(\lambda I)=(\log\lambda)I$ which satisfies $\exp(\log(\lambda I))=\lambda I$ trivially, and this matrix also commutes with all other matrices.

I prefer an approach that sidesteps the functional calculus entirely, that I first learnt on Wikipedia over a year ago. The argument proceeds like this, paraphrased by me:

For any invertible $n\times n$ complex matrix $X$, there is a basis of the space $\Bbb C^n$ in which $X=\bigoplus_{m=1}^kJ_m$, a decomposition into Jordan blocks with associated eigenvalues $\lambda_m$. If we can find a matrix $Y=\bigoplus_{m=1}^kT_m$, where $\exp(T_m)=J_m$ for all $m$, then a simple inspection of the exponential series shows $\exp(Y)=\bigoplus_{m=1}^k\exp(T_m)=\bigoplus_{m=1}^kJ_m=X$, so $Y$ is a logarithm of $X$. Many such logarithms will exist, due to branching.
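This blockwise behaviour of the exponential is easy to sanity-check numerically. A small sketch of my own (the two upper-triangular blocks are arbitrary illustrative choices, and `mexp` is a plain truncated-series exponential rather than a library routine):

```python
import numpy as np

def mexp(A, terms=60):
    """Matrix exponential via a truncated power series (fine for small norms)."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k  # term is now A^k / k!
        E = E + term
    return E

def block_diag(A, B):
    """Place A and B on the diagonal of a larger zero matrix."""
    n, m = A.shape[0], B.shape[0]
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[n:, n:] = B
    return M

# Two hypothetical Jordan-like blocks
J1 = np.array([[2.0, 1.0], [0.0, 2.0]])
J2 = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [0.0, 0.0, -1.0]])

# Every power of a block-diagonal matrix is block diagonal, so exp acts blockwise
assert np.allclose(mexp(block_diag(J1, J2)), block_diag(mexp(J1), mexp(J2)))
```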

It remains to find a logarithm of an arbitrary Jordan block. For a block $J$ with eigenvalue $\lambda$, we can write $J=\lambda(I+K)$ where $K$ is the matrix with all entries zero except for entries $\lambda^{-1}$ on the first superdiagonal ($\lambda\neq0$ by invertibility). If we suppose the formal power series argument is valid, we can say: $$\begin{align}\log(\lambda(I+K))&=\log(\lambda)I+\log(I+K)\\&=\log(\lambda)I+K-\frac{1}{2}K^2+\frac{1}{3}K^3-\cdots+\frac{(-1)^{j}}{j-1}K^{j-1}\end{align}$$Since $K$ is nilpotent of order $j$, where $j$ is the dimension of the Jordan block, the tail terms of the Mercator series vanish.

Any branch of the complex logarithm is appropriate for $(\log(\lambda))I$; by commutativity, re-exponentiation gives $\lambda\exp(K-\frac{1}{2}K^2+\cdots)\overset{?}{=}\lambda(I+K)$.

Then to claim that this process produces a logarithm for all invertible $X$, it suffices to demonstrate the following:

For all $\lambda\in\Bbb C$ and all $n\times n$ square matrices $K$ of the form: $$K=\begin{pmatrix}0&\lambda&0&0&\cdots\\0&0&\lambda&0&\cdots\\0&0&0&\lambda&\cdots\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&0\end{pmatrix}$$we have the identity (by commutativity, the two expressions are equivalent): $$\exp\left(\sum_{m=1}^{n-1}\frac{(-1)^{m-1}}{m}K^m\right)=\prod_{m=1}^{n-1}\exp\left(\frac{(-1)^{m-1}}{m}K^m\right)=I_n+K$$
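Since everything here is nilpotent, the claimed identity can be checked in exact rational arithmetic for any particular $n$. A small sketch of my own (the choices $\lambda=3$ and $n=6$ are arbitrary; any nonzero rational works):

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    n = len(A)
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]

def mat_scale(c, A):
    return [[c * x for x in row] for row in A]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

n = 6
lam = Fraction(3)  # hypothetical eigenvalue parameter
# K: lam on the first superdiagonal, zero elsewhere
K = [[lam if j == i + 1 else Fraction(0) for j in range(n)] for i in range(n)]

# T = sum_{m=1}^{n-1} (-1)^{m-1}/m * K^m, the truncated Mercator series
T = [[Fraction(0)] * n for _ in range(n)]
Km = K
for m in range(1, n):
    T = mat_add(T, mat_scale(Fraction((-1) ** (m - 1), m), Km))
    Km = mat_mul(Km, K)

# exp(T) terminates because T is nilpotent of order n
E = identity(n)
term = identity(n)
for k in range(1, n):
    term = mat_mul(term, mat_scale(Fraction(1, k), T))  # term = T^k / k!
    E = mat_add(E, term)

assert E == mat_add(identity(n), K)  # exp(T) equals I + K exactly
```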

I am looking for an algebraic (or similar) proof of this result. Don't get me wrong - I appreciate indirect proofs, and find them rather magical whenever they arise, but I have never studied any rigorous development of the functional calculus. Analytic functions can be extended to matrix arguments through a variety of methods - matrix Taylor series, Cayley-Hamilton, bizarre Cauchy integral representations, power series convergent in a Banach space... and I accept all of these as extensions, with perhaps some convenient properties such as derivative relations carrying over from the complex/real-analytic theory.

However, it seems suspicious that "higher-order" properties should also be preserved in this extension process. Although we can give well-defined and well-motivated analogues of $\exp$ and $\log$ for matrices, I don't see any immediate reason, a priori, to suppose that the extension reflects relations between them such as $\exp\log\equiv\mathrm{Id}$. The main reason for my suspicion is the following observation: analysis tends to work through limiting arguments, and finite sums just won't do - they yield only polynomials. It is then rather odd that an analytic series maintains its "special" properties despite collapsing into a finite polynomial - $\Bbb C$ has no nonzero nilpotent elements, but the space of matrices certainly does, and nilpotents are a key ingredient in the above construction of the logarithm.

So what am I looking for? I'm looking for a strong explanation of why what I'm calling "higher-order" properties should carry over in this extension process, especially since there are many different ways to extend analytic functions to matrix arguments. Of course, any properties that can be deduced from the power series will carry over, e.g. $\exp(A+B)=\exp(A)\exp(B)$ if $A,B$ commute. However, the power series of $\exp\circ\log$ is unclear to me here, since in this context the $\log$ is not actually a power series but a finite polynomial. To reiterate, it is the use of nilpotent elements that concerns me: since nilpotents depart from the algebra of $\Bbb C$, I feel they should also challenge the analytic series which hail from $\Bbb C$; at the very least, some further justification is needed.

An algebraic proof (direct matrix-arithmetic proof, or maybe some clever linear algebra argument) for why this particular nilpotent logarithm should hold, would be greatly appreciated. My thoughts on this matter so far:

$$T:=\sum_{m=1}^{n-1}\frac{(-1)^{m-1}}{m}K^m=\begin{pmatrix}0&\lambda&-\frac{1}{2}\lambda^2&\cdots&(-1)^n\frac{1}{n-1}\lambda^{n-1}\\0&0&\lambda&\cdots&(-1)^{n-1}\frac{1}{n-2}\lambda^{n-2}\\0&0&0&\cdots&(-1)^n\frac{1}{n-3}\lambda^{n-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&0\end{pmatrix}$$Observing how this matrix's entries "shift" up one diagonal each time the matrix is squared, cubed, etc., we see that the first superdiagonal receives a nonzero contribution exactly once in $I+T+\frac{1}{2}T^2+\cdots$, so we can easily partially compute the exponential: $$\exp(T)=\begin{pmatrix}1&\lambda&?&?&\cdots&?\\0&1&\lambda&?&\cdots&?\\0&0&1&\lambda&\cdots&?\\0&0&0&1&\cdots&?\\\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&0&\cdots&1\end{pmatrix}$$So it remains to show that the sums along all the remaining superdiagonals vanish. Another pertinent point is that $T$ is nilpotent of order $n$, so the exponential series terminates at $\frac{1}{(n-1)!}T^{n-1}$. This restates my suspicion, since we are now claiming that these two polynomials are mutually inverse, which is false in the complex world we began in.

How can we fill in the algebraic gaps here? The matrix powers seem quite intractable to compute symbolically.

N.B. Sangchul's answer in the linked post never once needs the invertibility of $X$, as far as I can see, whereas this Wikipedia-based construction does. How do we reconcile the two?


There are 3 answers below.

Best answer:

Here is an answer based on the principle of permanence. Define $c_{n,m}$ by

$$ c_{n,m} = \sum_{\substack{j_1 \geq 1, \ldots, j_n \geq 1 \\ j_1 + \cdots + j_n = m}} \frac{(-1)^{j_1-1}}{j_1} \cdots \frac{(-1)^{j_n-1}}{j_n}. $$

Then, for $|z| < 1$, we have $\sum_{n=0}^{\infty} \frac{1}{n!} \left(\sum_{j=1}^{\infty} \frac{1}{j} |z|^j \right)^n < \infty$. This allows us to rearrange the sum

$$ \sum_{n=0}^{\infty} \frac{1}{n!} \left(\sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} z^j \right)^n = 1 + \sum_{n=1}^{\infty} \frac{1}{n!} \left( \sum_{j_1 = 1}^{\infty} \cdots \sum_{j_n = 1}^{\infty} \frac{(-1)^{j_1-1}}{j_1} \cdots \frac{(-1)^{j_n-1}}{j_n} z^{j_1 + \cdots + j_n} \right) $$

as we like. So,

\begin{align*} \exp(\log(1+z)) &= \sum_{n=0}^{\infty} \frac{1}{n!} \left(\sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} z^j \right)^n \\ &= 1 + \sum_{n=1}^{\infty} \frac{1}{n!} \left(\sum_{m=1}^{\infty} c_{n,m} z^m \right) \\ &= 1 + \sum_{m=1}^{\infty} \left(\sum_{n=1}^{\infty} \frac{1}{n!} c_{n,m} \right) z^m. \end{align*}

Comparing this with $\exp(\log(1+z)) = 1+z$, we get

$$ \sum_{n=1}^{\infty} \frac{1}{n!} c_{n,m} = \begin{cases} 1, & m = 1; \\ 0, & m \geq 2. \end{cases} $$
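Note that $c_{n,m}=0$ for $n>m$ (each $j_i\geq 1$ forces $n\leq j_1+\cdots+j_n=m$), so the identity above is a finite sum for each $m$ and can be verified exactly. A brute-force sketch of my own, iterating over compositions:

```python
from fractions import Fraction
from itertools import product
from math import factorial

def c(n, m):
    """Sum over compositions (j_1, ..., j_n) of m into n positive parts."""
    total = Fraction(0)
    for js in product(range(1, m + 1), repeat=n):
        if sum(js) == m:
            term = Fraction(1)
            for j in js:
                term *= Fraction((-1) ** (j - 1), j)
            total += term
    return total

# c_{n,m} = 0 for n > m, so the series over n is really a finite sum
for m in range(1, 8):
    s = sum(Fraction(1, factorial(n)) * c(n, m) for n in range(1, m + 1))
    assert s == (1 if m == 1 else 0)
```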

Now, let $K$ be a nilpotent matrix, so that $K^{d+1} = 0$ for some $d \geq 1$. Then

\begin{align*} \exp\left(\sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} K^j \right) &= \exp\left(\sum_{j=1}^{d} \frac{(-1)^{j-1}}{j} K^j \right) \\ &= \sum_{n=0}^{\infty} \frac{1}{n!} \left(\sum_{j=1}^{d} \frac{(-1)^{j-1}}{j} K^j \right)^n \\ &= 1 + \sum_{n=1}^{\infty} \frac{1}{n!} \left(\sum_{m=1}^{d} c_{n,m} K^m \right) \\ &= 1 + \sum_{m=1}^{d} \left(\sum_{n=1}^{\infty} \frac{1}{n!} c_{n,m}\right) K^m \\ &= 1 + K. \end{align*}

Another answer:

You wrote:

However, it seems suspicious that "higher-order" properties should also be preserved in this extension process.

This is not as suspicious as it seems. It is basically an extension of the principle of permanence of identities from polynomials to power series, which relies only on the uniqueness of power series expansions: if $\sum_{n=0}^{\infty} a_n z^n = \sum_{n=0}^{\infty} b_n z^n$ for all complex numbers $z$, or even all $z \in U$ where $U$ is any non-empty open set, then $a_n = b_n$ for all $n$.

The idea is that if you know that $\exp(\log z) = z$ for complex numbers $z \in U$, then you know, by the above property, that the power series expansion for $\exp(\log z)$ must collapse, with all higher-order terms cancelling, leaving only $z$ on the right side. But the calculation of the power series expansion is valid for matrices as well, since the $\exp$ and $\log$ functions for matrices are defined by the same power series, and there is only one variable involved, so everything commutes for the matrices just as for $\mathbb{C}$. There are no convergence issues either, since for any fixed $n$, the coefficient of $z^n$ (or $A^n$ for matrices) is determined by a finite calculation.
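This coefficient-by-coefficient cancellation can be exhibited directly by composing the truncated series in exact arithmetic. A sketch of my own (the truncation degree $N=10$ is arbitrary; each coefficient below degree $N$ is computed exactly):

```python
from fractions import Fraction
from math import factorial

N = 10  # work with power series truncated at degree N

def mul(a, b):
    """Multiply truncated series given as coefficient lists (index = degree)."""
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if i + j < N:
                    c[i + j] += ai * bj
    return c

# log(1+z) = z - z^2/2 + z^3/3 - ...
L = [Fraction(0)] + [Fraction((-1) ** (j - 1), j) for j in range(1, N)]

# exp(L) = sum_n L^n / n!; since L has no constant term, L^n starts at
# degree n, so only n < N contributes below the truncation degree
E = [Fraction(0)] * N
E[0] = Fraction(1)
power = [Fraction(1)] + [Fraction(0)] * (N - 1)  # L^0
for n in range(1, N):
    power = mul(power, L)  # power = L^n
    for d in range(N):
        E[d] += power[d] * Fraction(1, factorial(n))

# every coefficient above degree 1 cancels: exp(log(1+z)) = 1 + z
assert E == [Fraction(1), Fraction(1)] + [Fraction(0)] * (N - 2)
```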

This is explained in more detail with an example (for polynomials) in my linked answer on the principle of permanence of identities.

Another answer:

People should learn about formal power series before they are exposed to numerical power series; the formal ones are so much nicer.

I think your question (which is too long for me to grok entirely) comes down to asking for a direct proof that $\exp(\log(1+X))=1+X$ holds in the formal power series ring $R=\Bbb Q[[X]]$, where one can take $\log(1+X)=\sum_{n>0}(-1)^{n-1}\frac{X^n}n\in\Bbb Q[[X]]$ as the definition of the logarithm (more can be said, but for the current purpose this should suffice). As for the exponential, $\exp(S)$ is only defined for $S\in R$ (and with value in $R$) if $S$ has no constant term, in other words if $S[X:=0]=0$. In that case one can define $\exp(S)$ as the unique solution $E$ of the differential equation $\def\d{\mathrm d}\def\D{\frac\d{\d X}}\D E=E\D S$ with $E[X:=0]=1$ (this is analogous to defining, for some differentiable function $f$ with $f(0)=0$, the exponential $\exp(f(x))$ as $g(x)$ where $g'=gf'$ and $g(0)=1$).

To show this explicitly, it is a bit easier (though avoidable, which I leave as exercise) to substitute $X:=-X$ on both sides of the equation, so that we need to prove for $$ L=\log(1-X)=-\sum_{n>0}\frac{X^n}n\in\Bbb Q[[X]] $$ that $$ \D(1-X)=(1-X)\D L \qquad\text{and}\qquad (1-X)[X:=0]=1. $$ The second part is immediate, and $\D(1-X)=-1$ is also clear. It remains to compute $$\D L=-\sum_{n>0}\frac{nX^{n-1}}n=-\sum_{m\in\Bbb N}X^m,$$ which indeed when multiplied by $1-X$ in $R$ gives the constant series $-1$, and the proof is complete.
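These are finite coefficient checks in $\Bbb Q[[X]]$ which one can replay mechanically. A sketch of my own, truncating at an arbitrary degree $N=12$:

```python
from fractions import Fraction

N = 12  # truncation degree; series stored as coefficient lists

# L = log(1-X) = -sum_{n>0} X^n/n
L = [Fraction(0)] + [Fraction(-1, n) for n in range(1, N)]

# formal derivative: D(sum a_n X^n) = sum n*a_n X^{n-1}
DL = [(n + 1) * L[n + 1] for n in range(N - 1)]

# DL = -(1 + X + X^2 + ...) up to the truncation
assert DL == [Fraction(-1)] * (N - 1)

# hence (1 - X) * DL = -1 = D(1-X): the defining ODE holds
prod = [Fraction(0)] * (N - 1)
for i, a in enumerate([Fraction(1), Fraction(-1)]):  # coefficients of 1 - X
    for j, b in enumerate(DL):
        if i + j < N - 1:
            prod[i + j] += a * b
assert prod == [Fraction(-1)] + [Fraction(0)] * (N - 2)
```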


A more computational approach to exponentiation gives a slightly different perspective on this equality. Because of how the operator $\D$ acts on formal power series, it is convenient to write elements of $\Bbb Q[[X]]$ in the form $\sum_{n\in\Bbb N}a_n\frac{X^n}{n!}$ (the "exponential generating series" for the sequence $(a_n)_{n\in\Bbb N}$, so called because that of $(1)_{n\in\Bbb N}$ is $\exp(X)$); then $\D$ just drops the constant coefficient and shifts all other coefficients down one degree: $$ \D\left(\sum_{n\in\Bbb N}a_n\frac{X^n}{n!}\right)= \sum_{n\in\Bbb N}a_{n+1}\frac{X^n}{n!}. $$ On the other hand, multiplication is somewhat more complicated; one has $$ \Bigl(\sum_{n\in\Bbb N}a_n\frac{X^n}{n!}\Bigr) \Bigl(\sum_{n\in\Bbb N}b_n\frac{X^n}{n!}\Bigr)= \sum_{n\in\Bbb N}c_n\frac{X^n}{n!} \quad\text{where}\quad c_n=\sum_{i=0}^n\binom nia_ib_{n-i}. $$ Now, with a slight variation of the above, I'd like to show that the exponential of $$ -L=-\log(1-X)=\sum_{n>0}(n-1)!\frac{X^n}{n!} \qquad\text{is}\qquad \frac1{1-X}=\sum_{n\in\Bbb N}n!\frac{X^n}{n!}, $$ as the expression suggests. One easily computes $\D(-L)=\sum_{n\in\Bbb N}n!\frac{X^n}{n!}$ (this somewhat confusingly coincides with what we want to show is $\exp(-L)$; here it serves just to set up the differential equation). We then seek $E=\exp(-L)=\sum_{n\in\Bbb N}e_n\frac{X^n}{n!}$ as the series with $e_0=1$ which satisfies $\D(E)=E\D(-L)=(\sum_{n\in\Bbb N}e_n\frac{X^n}{n!})(\sum_{n\in\Bbb N}n!\frac{X^n}{n!})$. The left hand side is $\D(E)=\sum_{n\in\Bbb N}e_{n+1}\frac{X^n}{n!}$, so we find the recurrence relation $$e_{n+1}=\sum_{i=0}^n\binom ni e_i(n-i)!.$$ With the initial condition $e_0=1$, this gives successive terms $e_1=1$, $e_2=2$, $e_3=6$, .... It is now easy to guess that $e_n=n!$, and substituting this into the recurrence relation confirms that it holds.
That gives $E=\sum_{n\in\Bbb N}n!\frac{X^n}{n!}=\frac1{1-X}$, which is indeed what we set out to show $\exp(-L)$ equals.
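The recurrence and its closed form are easy to replay mechanically; a small sketch of my own:

```python
from math import comb, factorial

# e_0 = 1, e_{n+1} = sum_i C(n,i) * e_i * (n-i)!  -- the recurrence derived above
e = [1]
for n in range(10):
    e.append(sum(comb(n, i) * e[i] * factorial(n - i) for i in range(n + 1)))

# the solution is e_n = n!, i.e. exp(-L) = sum_n n! X^n/n! = 1/(1-X)
assert e == [factorial(n) for n in range(11)]
```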