Interpreting "geometric mean" $\exp(\frac{1}{2} \log(P) + \frac{1}{2} \log(Q))$ for singular symmetric matrices $P, Q$


Let $\mathcal S_+^d$ be the set of real $d \times d$ symmetric positive semidefinite matrices and $\mathcal S_{++}^d$ the set of real $d \times d$ symmetric positive definite matrices.

Following Section 1.3 of the paper Quantum Optimal Transport for Tensor Field Processing (arXiv link; published, non-open-access version here), for $P \in \mathcal S_+^d$ and $(Q_i)_{i \in I} \subset \mathcal S_+^d$ define $$ \exp\left(P + \sum_{i \in I} \log(Q_i) \right) $$ to be the matrix in $\mathcal S_+^d$ whose kernel is $\sum_{i \in I} \ker(Q_i)$ and which is unambiguously defined on the orthogonal complement of that space.
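To make the convention concrete, here is one way I would realize it numerically (a sketch, not from the paper; the pseudo-log, the compression to the subspace $V$, and all names are my own). Since the columns of $V$ are orthogonal to every $\ker(Q_i)$, whatever value the pseudo-log places on the kernel block drops out under the compression, which matches the "unambiguously defined" clause:

```python
import numpy as np
from scipy.linalg import expm, null_space

def plog(Q, tol=1e-10):
    """Pseudo-log of a PSD matrix: log on the positive eigenvalues,
    zero on the kernel (the kernel block is immaterial below)."""
    w, U = np.linalg.eigh(Q)
    lw = np.where(w > tol, np.log(np.maximum(w, tol)), 0.0)
    return (U * lw) @ U.T

def restricted_exp(P, Qs, tol=1e-10):
    """One possible reading of the paper's exp(P + sum_i log Q_i):
    compress to V = (sum_i ker Q_i)^perp, exponentiate there,
    extend by zero on sum_i ker Q_i."""
    d = P.shape[0]
    kernels = [null_space(Q, rcond=tol) for Q in Qs]
    K = np.hstack(kernels) if kernels else np.zeros((d, 0))
    V = null_space(K.T) if K.shape[1] else np.eye(d)
    S = V.T @ (P + sum(plog(Q, tol) for Q in Qs)) @ V
    return V @ expm(S) @ V.T
```

With $P := 0$ and a single $Q$ this reproduces $\exp(\log(Q)) = Q$ even for singular $Q$, in line with the computation below.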


Just using this definition, we get the following: for $x \in \ker(P)^{\perp} = \text{ran}(P^{\mathsf t}) = \text{ran}(P)$ there exists a $y \in \mathbb R^d$ with $x = P y$. As $P$ is symmetric, we can orthogonally diagonalize $P = U^{\mathsf t} D U$, where $D := \text{diag}(\lambda_1, \ldots, \lambda_d)$; by the kernel convention above, all computations take place on $\ker(P)^{\perp}$, so only the positive eigenvalues $\lambda_j$ enter and $\log(\lambda_j)$ is well defined. Hence \begin{align*} \exp(\log(P)) x & = \exp\big(0_{d \times d} + \log(P)\big) P y = \sum_{k = 0}^{\infty} \frac{1}{k!} \log(P)^k \cdot P y \\ & = \sum_{k = 0}^{\infty} \frac{1}{k!} U^{\mathsf t} \log(D)^k U \cdot U^{\mathsf t} D U y \\ & = U^{\mathsf t} \left(\sum_{k = 0}^{\infty} \frac{1}{k!} \log(D)^k D \right) U y = U^{\mathsf t} \text{diag}\left(\left(\sum_{k = 0}^{\infty} \frac{1}{k!} \log(\lambda_j)^k \lambda_j \right)_{j = 1}^{d} \right) U y \\ & = U^{\mathsf t} \text{diag}\left(\left(\lambda_j^2 \right)_{j = 1}^{d} \right) U y = U^{\mathsf t} D^2 U y = P^2 y = P x. \end{align*} Hence $\exp(\log(P)) = P$.

This implies moreover that $\exp\left(\frac{1}{2} \log(P)\right) = \sqrt{P}$ for $P \in \mathcal S_+^d$, since $$\exp\left(\frac{1}{2} \log(P)\right) \cdot \exp\left(\frac{1}{2} \log(P)\right) = \exp(\log(P)) = P.$$
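As a quick numerical sanity check (not from the paper), the identity $\exp\left(\frac{1}{2}\log(P)\right) = \sqrt{P}$ can be verified in the eigenbasis, interpreting the left-hand side with the kernel convention, i.e. applying $\exp(\frac12\log(\cdot))$ only to the positive eigenvalues and extending by zero:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
P = A @ A.T                          # singular PSD, rank 3

w, U = np.linalg.eigh(P)
pos = w > 1e-10
# exp(0.5 * log P) on ran(P), extended by zero on ker(P):
M = (U[:, pos] * np.exp(0.5 * np.log(w[pos]))) @ U[:, pos].T
# PSD square root of P, built the same way:
sqrtP = (U[:, pos] * np.sqrt(w[pos])) @ U[:, pos].T
# M coincides with sqrtP, and M @ M recovers P
```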


My question. In Example 2, for $P, Q \in \mathcal S_+^d$ the "geometric mean" $$ M(P, Q) := \exp\left(\frac{1}{2} \log(P) + \frac{1}{2} \log(Q)\right) $$ is mentioned. (Side question: Is $M(P, Q) = P^{\frac{1}{2}} (P^{-\frac{1}{2}} Q P^{-\frac{1}{2}})^{\frac{1}{2}} P^{\frac{1}{2}}$, that is, is $M$ the "real" geometric mean?).
Now, as the matrix logarithm is not defined for singular matrices $P \in \mathcal S_{+}^d \setminus \mathcal S_{++}^d$, I don't know how to interpret the notation $M$ in that case.

Remark. We have $M(P, Q) = f(\sqrt{P}, \sqrt{Q})$ for the function $f(A, B) := \exp\big(\log(A) + \log(B)\big)$, which is discussed here. In particular $M(P, Q) = \sqrt{P Q}$ if $P Q = Q P$. By the Golden-Thompson inequality we have $\text{tr}\big(M(P, Q)\big) \le \text{tr}\big(\sqrt{P} \sqrt{Q}\big)$ for $P, Q \in \mathcal S_{++}^d$ and equality if and only if $P$ and $Q$ commute.
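Both facts in the remark are easy to check numerically for positive definite matrices, where `logm`/`expm` are unproblematic. A SciPy sketch (the commuting pair is built as polynomials in one SPD matrix, a construction of my own choosing):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))

# A commuting SPD pair: both are polynomials in the same SPD matrix S.
S = B @ B.T + np.eye(4)
P, Q = S, S @ S + S                      # P Q = Q P
M = expm(0.5 * logm(P) + 0.5 * logm(Q))  # mathematically equals sqrt(P Q)

# Golden-Thompson: tr M(P, Q) <= tr(sqrt(P) sqrt(Q)), non-commuting pair.
C = rng.standard_normal((4, 4))
P2, Q2 = B @ B.T + np.eye(4), C @ C.T + np.eye(4)
M2 = expm(0.5 * logm(P2) + 0.5 * logm(Q2))
gt_gap = (np.trace(sqrtm(P2) @ sqrtm(Q2)) - np.trace(M2)).real  # >= 0
```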

Another perspective: in the scalar case ($d = 1$), $M$ is, up to an additive constant of $-\ln(2)$, equal to the LogSumExp function.

Ideas. If $P, Q \in \mathcal S_{++}^d$ were positive definite, then (by the uniqueness of a PSD square root) we would have $\frac{1}{2} \log(P) = \log(\sqrt{P})$ and then $$ \exp\left(\frac{1}{2} \log(P) + \frac{1}{2} \log(Q)\right) = \exp\big( 0_{d \times d} + \log(\sqrt{P}) + \log(\sqrt{Q})\big) $$ could be interpreted in the above sense as the matrix in $\mathcal S_+^d$ with kernel $\ker(\sqrt{P}) + \ker(\sqrt{Q})$ and which, on $\big( \ker(\sqrt{P}) + \ker(\sqrt{Q}) \big)^{\perp} = \ker(\sqrt{P})^{\perp} \cap \ker(\sqrt{Q})^{\perp}$, is equal to $\exp(0_{d \times d}) = \text{id}_{d \times d}$ and thus would somehow be an orthogonal projector onto $\big( \ker(\sqrt{P}) + \ker(\sqrt{Q}) \big)^{\perp}$. For singular matrices $P, Q \in \mathcal S_+^d \setminus \mathcal S_{++}^d$ we could still define $M$ to be the orthogonal projector onto $\big( \ker(\sqrt{P}) + \ker(\sqrt{Q}) \big)^{\perp}$, but I don't know if this is a sensible definition.
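For what it's worth, the projector candidate from the last sentence is straightforward to compute: $\ker(\sqrt{P}) = \ker(P)$, so $\big(\ker(\sqrt P) + \ker(\sqrt Q)\big)^{\perp} = \operatorname{ran}(P) \cap \operatorname{ran}(Q)$. A sketch (the function name is mine; this only computes the candidate, it does not argue it is the right definition):

```python
import numpy as np
from scipy.linalg import null_space

def range_intersection_projector(P, Q, tol=1e-10):
    """Orthogonal projector onto (ker P + ker Q)^perp = ran(P) ∩ ran(Q),
    for symmetric PSD P, Q (note ker sqrt(P) = ker P)."""
    d = P.shape[0]
    # Stack bases of the two kernels; their span is ker P + ker Q.
    K = np.hstack([null_space(P, rcond=tol), null_space(Q, rcond=tol)])
    # Orthonormal basis of the orthogonal complement of that span.
    V = null_space(K.T) if K.shape[1] else np.eye(d)
    return V @ V.T
```

When both matrices are invertible the projector is simply the identity, consistent with the positive definite case above.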

Response to the answer by Igor Rivin. Is the following limit what you had in mind and is there a way to evaluate it? $$ \lim_{\varepsilon \searrow 0} \exp\left( \frac{1}{2} \log(P + \varepsilon I) + \frac{1}{2} \log(Q + \varepsilon I)\right) = \exp\left( \lim_{\varepsilon \searrow 0} \frac{1}{2} \log(P + \varepsilon I) + \frac{1}{2} \log(Q + \varepsilon I)\right), $$ where the equality would follow from the continuity of the matrix exponential, provided the inner limit exists (for singular $P$ or $Q$ some eigenvalues of the inner sum diverge to $-\infty$, so this step needs care).
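A quick numerical experiment for the regularized expression (a sketch of my own; tracking successive differences as $\varepsilon \searrow 0$ is only evidence of convergence, not a proof):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((4, 3))
P, Q = A @ A.T, B @ B.T            # singular PSD, ranks 2 and 3

def M_eps(P, Q, eps):
    """Regularized mean exp(0.5 log(P + eps I) + 0.5 log(Q + eps I))."""
    I = np.eye(P.shape[0])
    return expm(0.5 * logm(P + eps * I) + 0.5 * logm(Q + eps * I))

# Shrink eps and watch the iterates stabilize.
vals = [M_eps(P, Q, 10.0 ** (-k)) for k in range(2, 8)]
diffs = [np.linalg.norm(vals[i + 1] - vals[i]) for i in range(len(vals) - 1)]
```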

There are 2 answers below.

Answer (by Igor Rivin)

I believe that the main principle is that $\exp(\frac12 \log x)= \sqrt{x}$ even for $x = 0$. This is true by analytic continuation, and exactly the same idea works here: perturb your matrices slightly, and then check that the limit does not depend on the perturbation.

Another answer

The following is pure speculation on my part as to how one might define the geometric mean for matrices which are not positive definite...

First, the log-based $M(P,Q)\,$ is not the same as the conventional matrix geometric mean $$ G(P, Q) := P^{\frac{1}{2}} \left(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\right)^{\frac{1}{2}} P^{\frac{1}{2}} $$ for non-commuting matrices.
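For reference, a direct SciPy sketch of $G$ (it requires $P$ invertible; the Riccati identity $X P^{-1} X = Q$ satisfied by $X = G(P,Q)$ is a convenient correctness check):

```python
import numpy as np
from scipy.linalg import inv, sqrtm

def G(P, Q):
    """Conventional matrix geometric mean G(P, Q); requires P invertible."""
    Ph = sqrtm(P)                    # PSD square root of P
    Phi = inv(Ph)
    return Ph @ sqrtm(Phi @ Q @ Phi) @ Ph
```

For SPD inputs the result is symmetric and, although the formula looks asymmetric, $G(P,Q) = G(Q,P)$.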

The conventional definition requires that at least one of the matrices be invertible. The log-based definition seems to require that both matrices be invertible.

Note that the Arithmetic-Harmonic Mean (AHM) equals the geometric mean for positive scalars. This suggests the following algorithm for the matrix case $$\eqalign{ A_0 &= P \quad &H_0 = Q \\ A_{k+1} &= \tfrac 12(A_k+H_k) \qquad &H_{k+1} = 2\big(A_k^++H_k^+\big)^+ \\ }$$ The iteration converges $\big(A_\infty=H_\infty\big)$ and requires no matrix inverses, logarithms, or square roots; instead it uses pseudoinverses, which are well defined for semidefinite matrices. The algorithm is also unaffected if $P$ and $Q$ are swapped.
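A direct NumPy transcription of the iteration (the stopping rule and the iteration cap are my own choices, not part of the description above):

```python
import numpy as np

def ahm(P, Q, iters=60, tol=1e-12):
    """Arithmetic-harmonic mean iteration with pseudoinverses:
    A_{k+1} = (A_k + H_k) / 2,  H_{k+1} = 2 (A_k^+ + H_k^+)^+."""
    A, H = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    for _ in range(iters):
        A_next = 0.5 * (A + H)
        H_next = 2.0 * np.linalg.pinv(np.linalg.pinv(A) + np.linalg.pinv(H))
        A, H = A_next, H_next
        if np.linalg.norm(A - H) <= tol * max(1.0, np.linalg.norm(A)):
            break
    return 0.5 * (A + H)
```

For commuting diagonal inputs this reduces to the scalar AHM, which converges to the scalar geometric mean; swapping $P$ and $Q$ leaves the result unchanged, since both updates are symmetric in $(A_k, H_k)$.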

Update

Generating small random matrices ($P$ is positive definite, $Q$ is positive semi-definite, and $P Q \ne Q P$) $$\eqalign{ P &= \left[ \begin{array}{ccc} 70 & 64 & 73 \\ 64 & 93 & 71 \\ 73 & 71 & 98 \\\end{array} \right],\qquad\qquad Q = \left[ \begin{array}{ccc} 113 & 63 & 111 \\ 63 & 50 & 51 \\ 111 & 51 & 117 \\ \end{array} \right] \\ \\ G(P,Q) &= \left[ \begin{array}{ccc} 81.4967 & 50.6175 & 76.2631 \\ 50.6175 & 47.0073 & 35.975 \\ 76.2631 & 35.975 & 79.7012 \\ \end{array} \right] \\ \\ M(P,Q) &= \left[ \begin{array}{ccc} 97.7269-0.1884\mathit{i} & 65.5237-0.0035\mathit{i} & 91.3427-0.0008\mathit{i} \\ 65.5237-0.0035\mathit{i} & 59.8803+0.0988\mathit{i} & 49.7925+0.1196\mathit{i} \\ 91.3427-0.0008\mathit{i} & 49.7925+0.1196\mathit{i} & 93.5964+0.1455\mathit{i} \\ \end{array} \right] \\ \\ A(P,Q) &= \left[ \begin{array}{ccc} 84.0641 & 58.0756 & 84.0195 \\ 58.0756 & 71.2391 & 61.8203 \\ 84.0195 & 61.8203 & 107.4125 \\ \end{array} \right] \quad \Big\{6{\rm\;iterations:\;}\varepsilon\le 10^{-14}\Big\} \\ \\ }$$ So $\;M(P,Q) \;\ne\; G(P,Q) \;\ne\; A(P,Q)$.

When $P$ and $Q$ are both full-rank, the results become
$\quad M(P,Q) \;\ne\; G(P,Q) \;\equiv\; A(P,Q)$.

Update 2

When $PQ=QP$, numerical experiments indicate that if both matrices are singular then $$G(P,Q) \;\ne\; M(P,Q) \;=\; A(P,Q),$$ and otherwise $$G(P,Q) \;=\; M(P,Q) \;=\; A(P,Q).$$

To summarize, numerically there appear to be six cases for semi-definite matrices (columns: both inverses exist, exactly one exists, neither exists; blank cells mean no equalities were observed) $$\eqalign{ \def\m#1{\left[\begin{array}{r|r|r|l}#1\end{array}\right]} \m{ &P^{-1}\,\&\&\,Q^{-1} &P^{-1}\,\|\,Q^{-1} &{\rm neither\:exists} \\ \hline PQ=QP & M=A=G & M=A=G & M=A \\ PQ\ne QP & A=G & & \\ } }$$