matrix with fractional exponent, not getting expected output in Matlab/Octave


I have a matrix exponential function that is called many times in an integration routine for a heat conduction model I'm implementing. It works, and my results match the samples in the paper, but I want to speed things up by splitting the exponential so that I can precalculate the constant part and just pass that to the calling function. Unfortunately, the separation only works for integer values of the exponent, and I'm not sure why the manipulation doesn't hold for fractional values.

The matrix exponential is equation (20) in the paper, defined as:

$e^{[F(s)]x}=\pmatrix{ \cosh{(\sqrt{s/\alpha_j}\,x)} & \frac{1}{\lambda_j\sqrt{s/\alpha_j}}\sinh{(\sqrt{s/\alpha_j}\,x)} \\ \lambda_j\sqrt{s/\alpha_j}\sinh{(\sqrt{s/\alpha_j}\,x)} & \cosh{(\sqrt{s/\alpha_j}\,x)}}$

this is most often called in the model as $e^{[F(-\beta^2)](x_j-x)}$ where $\beta$ is a real eigenvalue, and $x_j$ is a nodal location.

Because $\beta$ and $x_j$ are known prior to the call for $e^{[F(-\beta^2)](x_j-x)}$, I wanted to do the manipulation below to separate out $x$:

$$ e^{[F(-\beta^2)](x_j-x)} = e^{[F(-\beta^2)]x_j}\cdot e^{[F(-\beta^2)](-x)} = e^{[F(-\beta^2)]x_j}\cdot[e^{[F(-\beta^2)](-1)}]^x $$

In that form, I can precalculate both exponential functions and just do a scalar power operation with $x$ inside the integration and other calling loops. I implemented it in Octave (and tried it in Matlab); it gave a great speedup, but non-integer values of $x$ produce "wrong" answers. I realize a fractional matrix power involves solving an eigenvalue problem, with $A^m=P(D^m)P^{-1}$ as per This Answer, and what I've done seems consistent with the rules for matrix exponentials. But here's an example to belabor the point ($\alpha_j=1, \lambda_j=2$):

$e^{[F(-\beta^2)]x}, \beta=1,x=1 = \pmatrix{\mathrm{cos}{(1)}&\mathrm{sin}{(1)}/2\\-2\cdot \mathrm{sin}{(1)}&\mathrm{cos}{(1)}}=\pmatrix{0.54030& 0.42074\\-1.68294& 0.54030}$

$e^{[F(-\beta^2)]x}, \beta=1,x=3 = \pmatrix{\cos\left( 3\right) & \frac{\sin\left( 3\right) }{2}\\ -2\cdot \sin\left( 3\right) & \cos\left( 3\right) }=\pmatrix{-0.98999 & 0.07056\\-0.28224 & -0.98999}$

$e^{[F(-\beta^2)]x}, \beta=1,x=1.1 =\pmatrix{0.45360 & 0.44560\\-1.78241 & 0.45360}$

These work fine in Octave and Matlab (and the integer cases checked out in Maxima), whether $x$ is input as a multiplier inside the matrix function or as a scalar exponent applied to the matrix.

Now, the problem seems to creep in when $\beta > \pi$. Integer values of $x$ are still ok:

$e^{[F(-\beta^2)]x}, \beta=1.1\cdot\pi,x=3 = \pmatrix{-0.58779 & -0.11705\\ 5.59152 &-0.58779}$

$[e^{[F(-\beta^2)](1)}]^{x}, \beta=1.1\cdot\pi,x=3 = \pmatrix{-0.58779 & -0.11705\\ 5.59152 &-0.58779}$

But fractional values are not:

$e^{[F(-\beta^2)]x}, \beta=1.1\cdot\pi,x=1.1 = \pmatrix{ -0.790155 & -0.088679\\ 4.236109 & -0.790155}$

$[e^{[F(-\beta^2)](1)}]^{x}, \beta=1.1\cdot\pi,x=1.1 = \pmatrix{-0.9995066 & -0.0045447\\ 0.2170956 & -0.9995066}$

As I'm summing over a large number of $\beta$ values of increasing size, this is a problem, and I can't identify the cause. I only noticed while typing this up that the issue starts at $\beta > \pi$, so maybe there's something obvious there that I'm missing.
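The mismatch is easy to reproduce outside Octave as well. Here is a sketch in Python/SciPy; the generator matrix `F(s)` (chosen so that `expm(F(s)*x)` reproduces equation (20)) is my reconstruction, not something stated in the paper:

```python
import numpy as np
from scipy.linalg import expm, fractional_matrix_power

alpha, lam = 1.0, 2.0

def F(s):
    # Generator matrix: expm(F(s)*x) matches equation (20),
    # since F(s)^2 = (s/alpha) * I
    return np.array([[0.0, 1.0/lam],
                     [lam*s/alpha, 0.0]])

beta, x = 1.1*np.pi, 1.1
direct = expm(F(-beta**2) * x)                         # e^{[F]x} evaluated directly
split = fractional_matrix_power(expm(F(-beta**2)), x)  # [e^{[F]*1}]^x
print(direct[0, 0])                                    # about -0.7902
print(np.abs(direct - split.real).max() > 0.1)         # True: they disagree for beta > pi
```

For $\beta < \pi$ the two expressions agree to machine precision, mirroring what I see in Octave.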

For reference, here's my implementation of the calling function:

function answer = expfxs(x, s, alphaj, lamj)
   % e^{[F(s)]x} from equation (20); q = sqrt(s/alpha_j)
   q = sqrt(s/alphaj);
   answer = [cosh(q*x),        sinh(q*x)/(lamj*q); ...
             sinh(q*x)*lamj*q, cosh(q*x)];
endfunction
BEST ANSWER

Ok, after some fun roundabouts and a nice long learning experience, I have the following. It's long, but it answers things.

The algebra I was attempting was giving erroneous results because $e^{Ax}=(e^A)^x$ is only guaranteed for a matrix $A$ when $x$ is an integer. Otherwise, computing $(e^A)^x$ involves an eigenvalue/eigenvector decomposition, which may have complex components, and arbitrary powers of complex numbers are multi-valued.

As most computational software uses a 'principal value' definition for multi-valued functions, it is entirely possible, often likely, that the numerical result will not select the value that makes the naive algebraic relationship hold. That is exactly what was happening above. Working through the matrix solution, and then retaining a better definition of the complex eigenvalues, I was able to find a valid way to separate the function into a pre-calculable part, although it's more involved than the initial algebraic attempt.

The numerical procedure, according to the Octave source, starts from the solution process for the matrix power based on diagonalizing the matrix:

$A=PDP^{-1}$, and $A^{\gamma}=PD^{\gamma}P^{-1}$

Where $D$ is the diagonal matrix of eigenvalues of $A$, which for the function $e^{[F(s)]x}$ above is:

$D = \pmatrix{e^{-\sqrt{s/ \alpha_j}x} & 0\\0 & e^{+\sqrt{s/ \alpha_j}x}}$

and $P$ is the corresponding eigenvector matrix:

$P= \pmatrix{-1 & 1\\ \lambda_j\sqrt{s/\alpha_j} & \lambda_j\sqrt{s/\alpha_j}}$ or $P_{norm}= \sqrt{\frac{\alpha_j}{\lambda_j^2|s|+\alpha_j}} \cdot P$

Note that $P$ and $P_{norm}$ are independent of $x$.

In the original problem, the goal was to pre-calculate $A$ by setting $x=-1$, then raise to the power of $x$, such that:

$e^{[F(-\beta^2)](-x)}= [e^{[F(-\beta^2)](-1)}]^x$

So, substituting $x=-1$ and $s=-\beta^2$ into $D$ ($\beta$ is always positive & real), it becomes:

$D = \pmatrix{e^{+i\frac{\beta}{\sqrt{\alpha_j}}} & 0\\0 & e^{-i\frac{\beta}{\sqrt{\alpha_j}}}} = \pmatrix{\mu_1&0\\0&\mu_2}$

with accompanying matrices:

$P= \pmatrix{-1 & 1\\ i\beta\lambda_j/ \sqrt{\alpha_j} & i\beta\lambda_j/ \sqrt{\alpha_j} }$ and $P_{norm}= \sqrt{\frac{\alpha_j}{\alpha_j+(\beta\lambda_j)^2}}\cdot P$ (using the principal root $\sqrt{-\beta^2}=+i\beta$)

Taking a diagonal matrix, $D$, to the $x$-power is accomplished by simply taking $\mu_i^x$. In this case, however, we know that $\mu_i$ are complex numbers. This is where the discrepancy in the original question came in.

From De Moivre's theorem, we know that for a unit-magnitude complex number $z = e^{i\theta}$,

$z^{1/n} \in \left\{ e^{i(\theta+2\pi k)/n} : k = 0,1,\dots,n-1 \right\}$

In general, numerical codes select the $k=0$ output as the principal value, and this would have coincided with my desired output above. However, this was complicated by the fact that $\beta/\sqrt{\alpha_j}$ in this equation took on values larger than $\pi$. For values less than $\pi$, both Matlab and Octave selected the $k=0$ value for $z^{1/n}$, and there was no error relative to my expected output. Upon exceeding $\pi$, however, the output would choose a seemingly random value of $k$. (It seemed to be selecting the one producing the smallest $|\mathrm{arg}(z^{1/n})|$, but this may just be a consequence of the complex logarithm it likely employs.)
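A quick check of the multi-valued roots, sketched in Python/NumPy for illustration (the values of `theta` and `n` are arbitrary):

```python
import numpy as np

theta, n = 3.0, 4
z = np.exp(1j * theta)
# all n roots from De Moivre's theorem; k = 0 is the principal value
roots = [np.exp(1j * (theta + 2*np.pi*k) / n) for k in range(n)]
print(all(abs(r**n - z) < 1e-12 for r in roots))   # True: each root recovers z
```

Every entry of `roots` is a legitimate $n$-th root of $z$; the software just has to pick one.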

Instead of De Moivre's theorem, many numerical codes implement complex exponentiation as $z^x=e^{x \cdot \mathrm{Log}\,z}$. The $\mathrm{Log}$ with a capital L indicates the principal value of the logarithm for complex arguments, so for $z=re^{i\theta}$: $z^x=e^{x(\ln r+i\cdot \mathrm{Arg}(z))}$
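This is easy to confirm numerically; a sketch in Python/NumPy (the eigenvalue angle $q=1.1\pi$ is taken from the example in the question):

```python
import numpy as np

q, x = 1.1*np.pi, 1.1           # the true angle exceeds pi, so Arg is projected
z = np.exp(1j * q)
lhs = z**x                      # what ** computes: exp(x * Log z), principal branch
rhs = np.exp(x * (np.log(np.abs(z)) + 1j*np.angle(z)))
print(np.isclose(lhs, rhs))               # True: ** matches the principal-Log definition
print(np.isclose(lhs, np.exp(1j*q*x)))    # False: not the value the model needs
```

The second comparison is exactly the failure mode above: $e^{ixq}$ is the intended result, but the principal branch yields something else once $q > \pi$.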

Complex numbers in Matlab and Octave are stored in rectangular form, so the stored number itself carries no information to indicate any value other than the principal value of the argument.

Thus, when the true angle $\theta$ satisfies $|\theta|>\pi$, the argument function, typically $\mathrm{atan2}(\mathrm{imag}(z),\mathrm{real}(z))$, will return a projection of the angle into the range $(-\pi,\pi]$.

E.g., in the example at the end of my question, with $\alpha_j=1$ and $\beta=1.1\cdot \pi$, the eigenvalue angle is $\pm\beta/\sqrt{\alpha_j} = \pm3.4558$ radians, but $\mathrm{arg}(e^{\pm i\beta/\sqrt{\alpha_j}})=\mp2.8274$ radians. This is the direct cause of the 'error' that inspired this question.
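The projection is one line to verify; in Python/NumPy for illustration (`np.angle` is the atan2-based argument function):

```python
import numpy as np

theta = 1.1 * np.pi             # beta/sqrt(alpha_j) = 3.45575...
z = np.exp(1j * theta)          # unit-magnitude eigenvalue
print(np.angle(z))              # about -2.8274: projected into (-pi, pi]
```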

Finally, we can instead substitute our own function for the log to preserve the actual argument of the complex number, using the model of a Riemann surface for $\log(z)$, where $\log z = \ln r + i(\mathrm{Arg}\,z+2N\pi)$. Here $N$ is the number of full revolutions made by the actual angle $\theta$, which can be visualized as a rising spiral orthogonal to the $z$-plane. $N$ can be calculated as round(theta/(2*pi)) in Matlab/Octave. (My first attempt, fix(theta/pi), happens to agree for $|\theta|<2\pi$ but returns the wrong $N$ beyond that.) This makes the exponential function for a unit-magnitude number $(r=1)$:
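A sanity check that $\mathrm{Arg}\,z + 2N\pi$ with $N=\mathrm{round}(\theta/2\pi)$ reconstructs the retained angle, sketched in Python/NumPy (test angles are arbitrary):

```python
import numpy as np

def unwound_angle(theta):
    # Recover the true angle from its principal projection:
    # Arg(e^{i*theta}) + 2*N*pi, with N = number of full revolutions
    N = np.round(theta / (2*np.pi))
    return np.angle(np.exp(1j * theta)) + 2*N*np.pi

for t in [0.5, 1.1*np.pi, 6.5, 13.0]:
    print(abs(unwound_angle(t) - t) < 1e-9)   # True each time
```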

$z^x = e^{ix(\mathrm{arg}z+2N\pi)}$

and for our case where $\mu_i = e^{\pm i\beta/ \sqrt{\alpha_j}}$: $\mu_i^x = e^{ix(\mathrm{arg}(e^{\pm i\beta/ \sqrt{\alpha_j}})+2N\pi)}, \quad N = \mathrm{round}\left(\frac{\pm\beta/ \sqrt{\alpha_j}}{2\pi}\right)$

It becomes apparent that $(\mathrm{arg}z+2N\pi)$ is simply $\beta/\sqrt{\alpha_j}$, as the $arg$ function will remove any revolutions from the actual argument, and the $2N\pi$ term will simply add them back in. Assuming we are able to retain $\beta/\sqrt{\alpha_j}$, we've come full circle and the complex exponential for the diagonalized matrix $D$ comes back to:

$D^x = \pmatrix{e^{+i\frac{\beta}{\sqrt{\alpha_j}}\cdot x} & 0\\0 & e^{-i\frac{\beta}{\sqrt{\alpha_j}}\cdot x}}$

Regarding the initial intent of the problem, separating portions of the $e^{[F(s)](x_j-x)}$ function to permit mostly pre-calculation and a simple algebraic operation when applying $x$: this is still possible. Obviously, the $x_j$ portion can be split out. A custom power function must be implemented in which $D^x$ is created explicitly. The $P$ matrices and the $\beta/\sqrt{\alpha_j}$ values can be pre-calculated, and $PD^xP^{-1}$ returns the result. Since the formulation for the eigenvalues is so simple, applying $\beta/\sqrt{\alpha_j}$ directly avoids any principal-value truncation.
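A minimal sketch of that scheme, in Python/NumPy for illustration (an Octave version is a direct translation; the generator matrix `F` is my reconstruction of the matrix whose exponential gives equation (20)):

```python
import numpy as np
from scipy.linalg import expm

alpha, lam = 1.0, 2.0
beta = 1.1 * np.pi
q = beta / np.sqrt(alpha)          # beta/sqrt(alpha_j), retained as a real number

# Pre-calculated, x-independent pieces (columns are eigenvectors for -i*q, +i*q)
P = np.array([[-1.0, 1.0],
              [1j*lam*q, 1j*lam*q]])
Pinv = np.linalg.inv(P)

def expFx(x):
    # e^{[F(-beta^2)] x} = P D^x P^{-1}, with D^x built directly from q
    Dx = np.diag([np.exp(-1j*q*x), np.exp(+1j*q*x)])
    return (P @ Dx @ Pinv).real

# Check against the direct matrix exponential for a fractional x and beta > pi
F = np.array([[0.0, 1.0/lam],
              [lam*(-beta**2)/alpha, 0.0]])
x = 1.1
print(np.allclose(expFx(x), expm(F*x)))   # True
```

Because `q` is kept as a plain real number, no principal-value projection ever occurs, and fractional `x` values agree with the direct exponential even for $\beta > \pi$.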

After learning how the Octave code solves the $A^x$ problem, the approach above doesn't seem any more computationally intensive, although it won't be a hardcoded function. A few fast pathways for integer $x$ and $|\beta|<\pi$ could speed things up too. Whether all this undoes the benefit of precalculating the subfactors, and whether it saves any time over just calling $e^{[F(s)](x_j-x)}$, we'll have to see what the profiler says.