Why does the space of SPD matrices form a differentiable manifold?


Disclaimer: I am not a mathematician, just a young neuroscientist trying to understand a paper. So, please forgive me if I have horribly misunderstood something.


In order to understand this paper$^\color{magenta}{\star}$, which uses a Riemannian kernel for a support vector machine, I am trying to understand some of the maths behind the methods. I am still not entirely sure what a differentiable manifold is, despite having watched a number of video lectures on the topic (it's a topological space which is locally homeomorphic to Euclidean space, with smooth transition maps?). But more importantly: the paper states that the space of symmetric positive definite (SPD) matrices forms a differentiable manifold. This seems important, as we need to calculate the distances between covariance matrices (which by definition live in the SPD space) in order for the SVM to perform well.

So, to me, it seems that the authors of the paper achieved impressive SVM classification results because their similarity (or distance) measure was better suited to the space in which they were working, i.e. a Riemannian kernel on a differentiable manifold. Is it just a happy coincidence that the space of SPD matrices forms this mathematical object within which we can perform better similarity measures?

I suppose my entire confusion can be summed up as: Why does a collection of a specific type of matrix (SPD) form this other topological space with specific properties that make it more efficient for measuring distances between points (matrices) in the space?


$\color{magenta}{\star}$ Alexandre Barachant, Stéphane Bonnet, Marco Congedo, Christian Jutten, Classification of covariance matrices using a Riemannian-based kernel for BCI applications [hal-00820475], Neurocomputing, Volume 112, July 2013.


There are 2 answers below.


The set of symmetric matrices is a finite dimensional vector space. Any finite dimensional vector space is a manifold (just choose a basis to obtain a global chart). Let us denote the set of symmetric $n\times n$ matrices by $S$.

Now, for a symmetric matrix to be positive definite, all of its eigenvalues must be positive. But this is an open condition: if a symmetric matrix $M$ has strictly positive eigenvalues (which are real, since $M$ is symmetric), then so does every symmetric matrix sufficiently close to $M$. This means that SPD matrices form an open subset of the space of symmetric matrices, and open subsets of manifolds are manifolds.
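A quick numerical illustration of this openness (a sketch using NumPy, with an arbitrary random matrix and perturbation size): perturbing an SPD matrix by a small symmetric matrix leaves all eigenvalues positive, so the perturbed matrix is still SPD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random SPD matrix: B B^T is positive semidefinite,
# so A = B B^T + I has all eigenvalues >= 1.
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)

# A small random symmetric perturbation E.
E = rng.standard_normal((4, 4))
E = 0.01 * (E + E.T) / 2

# eigvalsh returns the (real) eigenvalues of a symmetric matrix.
print(np.linalg.eigvalsh(A).min() > 0)      # True: A is SPD
print(np.linalg.eigvalsh(A + E).min() > 0)  # True: the nearby matrix is still SPD
```

By Weyl's inequality the eigenvalues move by at most the norm of the perturbation, which is why a small enough $E$ can never push an eigenvalue below zero.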


Although the existing answer addresses the question asked in the title, I think the author's intended query went unaddressed, namely: "Is it just a happy coincidence that the space of SPD matrices forms this mathematical object within which we can perform better similarity measures?"

So, I'm going to answer the conceptual question of why using the SPD manifold for this classification task led to an improvement in performance.

The answer is that the choice of the SPD manifold was not capricious; it was determined by the structure of the data. As described in the paper, a standard way to process EEG signals for analysis is to take the signals corresponding to a time interval and compute the covariance matrix of the different channels within that interval. This way, each time interval ends up with an associated covariance matrix of EEG signals. Classification of the time intervals into classes is then done using their associated covariance matrices.
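A minimal sketch of that preprocessing step (with made-up dimensions and simulated data, since the paper's actual recordings are not available here): each trial of multichannel EEG becomes one covariance matrix, and those matrices are the points being classified.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 10 trials, 8 channels, 250 samples per trial.
n_trials, n_channels, n_samples = 10, 8, 250
trials = rng.standard_normal((n_trials, n_channels, n_samples))

# np.cov treats rows as variables (channels) and columns as observations,
# so each trial yields one 8x8 channel-covariance matrix.
covs = np.array([np.cov(trial) for trial in trials])

print(covs.shape)  # (10, 8, 8)

# Each covariance matrix is symmetric and (generically) positive definite,
# i.e. a point on the SPD manifold.
print(np.allclose(covs[0], covs[0].T))        # True
print(np.linalg.eigvalsh(covs[0]).min() > 0)  # True
```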

Why people use covariance matrices for classification rather than the raw signals is probably explained by the idiosyncrasies of the problem. Either way, the classification is done on a very particular type of matrix: SPD matrices (every covariance matrix is SPD, and any SPD matrix can be thought of as a covariance matrix).

These matrices, to which the data in the problem belong, form a manifold with a rich geometrical structure. Taking this structure into account when doing classification can give a more faithful picture of how your data are distributed. That is what the authors do, pointing out that previous approaches proceeded without taking into account this rich geometry, which exists in the data whether or not you choose to use it.

In sum, the SPD manifold approach works better because this is the manifold where the data in the problem (i.e. covariance matrices) live. It is not a happy coincidence that the authors chose this manifold; it is well known that covariance matrices belong to it. Likewise, to analyse the distribution of points on the Earth's surface we would use the sphere $S^2$, and it would give a better account of the data than ignoring that geometry; that wouldn't be a happy coincidence either.
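To make the contrast concrete, here is a small sketch (using NumPy/SciPy, on two hand-picked $2\times 2$ matrices) of the affine-invariant Riemannian distance on the SPD manifold, the metric used in the paper, next to the plain Euclidean (Frobenius) distance. It can be computed from the generalized eigenvalues of the pair:

```python
import numpy as np
from scipy.linalg import eigvalsh

def airm_distance(A, B):
    """Affine-invariant Riemannian distance between SPD matrices:
    d(A, B) = sqrt(sum_i log^2(lambda_i)), where the lambda_i are the
    generalized eigenvalues solving A v = lambda B v (all positive
    when A and B are SPD)."""
    lam = eigvalsh(A, B)
    return np.sqrt(np.sum(np.log(lam) ** 2))

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
B = np.eye(2)

print(np.linalg.norm(A - B))  # Euclidean distance: 1.0
print(airm_distance(A, B))    # Riemannian distance: log(2) ~ 0.693
```

Unlike the Euclidean distance, this one respects the curved geometry of the SPD cone (e.g. it is invariant under congruence transformations $A \mapsto W A W^T$), which is what makes it a more natural similarity measure for covariance matrices.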

This is all very nicely explained in the geomstats package website https://geomstats.github.io/notebooks/00_foundations__introduction_to_geomstats.html.

(P.S. I'm also a neuroscientist learning about manifolds.)