Why do we "vectorize" persistence diagrams?


Recently I've been going through some papers and tutorials on using persistent homology in machine learning, and pretty quickly, once all the algebraic topology ended, I found that in order to use topological information in machine learning models (e.g. SVC), one needs a vectorial representation of the persistence diagram. Why can't we just use the pairs ($t_{birth}$, $t_{death}$) for every topological feature? Why does one have to use techniques like persistence images or heat kernels in order to apply SVC or some other machine learning model?

I've seen explanations along the lines of: in order to use most ML models, one needs features in some "good" space, e.g. a Euclidean or Hilbert space, but the set of persistence diagrams is only a metric space, so we go for vectorial representations, entropies, and amplitudes. But I can't really see where we actually rely on the properties of such spaces in machine learning (the only examples I can think of that rely on such notions are kernel tricks and hyperplane calculations for Support Vector Machines).

So, I can't fully figure out the reason behind these vectorial representations. I'd appreciate some tips on it. Thanks.

Best Answer

The set of persistence diagrams is a metric space, equipped with (say) the bottleneck distance or some p-Wasserstein distance. However, many algorithms in statistics or machine learning require more than just the structure of a metric space. Let's start with averages --- how do you take the average of a collection of persistence diagrams? As discussed in https://doi.org/10.1088/0266-5611/27/12/124007, the average of a collection of persistence diagrams need not be unique. For example, if one persistence diagram has two points (10,20) and (12,22), and another persistence diagram has two points (10,22) and (12,20), then the "average" of these two persistence diagrams could be either the diagram containing the two points (11,20), (11,22) or instead it could be the diagram containing the two points (10,21), (12,21) --- and hence averages are not unique. However, once you vectorize persistence diagrams, then you can take averages of any number of persistence diagrams. Similarly, let's say you wanted to compute variances (deviations from averages); this is greatly aided if you try to do so after taking vectorizations. What if you want a central limit theorem? I think the introduction to the persistence landscapes paper https://www.jmlr.org/papers/volume16/bubenik15a/bubenik15a.pdf explains well the need for vectorizations of persistence diagrams in order to enable more statistical tests and tools.
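The non-uniqueness in that example can be checked directly. The sketch below (my own illustration, not from the papers cited) brute-forces the optimal matching between two small diagrams over all permutations, using squared Euclidean cost and ignoring the diagonal since both diagrams here have the same number of points; it then shows that both candidate "averages" minimize the same Fréchet-style sum of squared distances:

```python
from itertools import permutations

def matching_cost(d1, d2):
    # Optimal sum of squared Euclidean distances over all bijections
    # between two equal-size diagrams (diagonal matchings omitted).
    return min(
        sum((p[0] - q[0])**2 + (p[1] - q[1])**2 for p, q in zip(d1, perm))
        for perm in permutations(d2)
    )

A = [(10, 20), (12, 22)]   # first persistence diagram
B = [(10, 22), (12, 20)]   # second persistence diagram

M1 = [(11, 20), (11, 22)]  # first candidate mean
M2 = [(10, 21), (12, 21)]  # second candidate mean

# Both candidates achieve the same total cost, so neither is "the" average.
cost1 = matching_cost(M1, A) + matching_cost(M1, B)
cost2 = matching_cost(M2, A) + matching_cost(M2, B)
print(cost1, cost2)  # 4 4
```

Both candidate diagrams sit at the same total squared distance from the input pair, which is exactly the failure of uniqueness described above.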

As you point out, some machine learning techniques (such as Support Vector Machines) require more than the structure of a metric space --- they often require inner products. Other machine learning techniques that sometimes require more than a metric space structure are as follows. Decision trees often separate the data into classes via a hierarchical collection of small decisions, and each small decision is often given by a coordinate-aligned (hence the need for a vectorization) hyperplane. Neural networks are easiest to run when the inputs are all vectors of the same length, as in JHF's answer above. Feature selection algorithms (which persistence diagram points are the most discriminating?) often ask for vectorizations, as the selected feature is given as a coordinate of the input vector. Some (but certainly not all) dimensionality reduction algorithms ask for the input to be a collection of vectors, instead of a metric space. I'd recommend the introduction to the Persistence Images paper https://www.jmlr.org/papers/volume18/16-337/16-337.pdf for more perspectives or thoughts along these lines.
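To make this concrete, here is a minimal sketch of one standard vectorization, the (first-layer) persistence landscape, sampled on a fixed grid. Once every diagram becomes a vector of the same length, averaging, variances, and SVM-style inputs all become routine; the function below is my own simplified illustration, not the reference implementation from the landscapes paper:

```python
import numpy as np

def landscape(diagram, ts, k=1):
    # For each diagram point (b, d), build the "tent" function
    # max(0, min(t - b, d - t)); the k-th landscape is the k-th
    # largest tent value at each grid point t.
    vals = np.array([np.minimum(ts - b, d - ts).clip(min=0)
                     for b, d in diagram])
    return -np.sort(-vals, axis=0)[k - 1]

ts = np.linspace(0, 25, 100)          # fixed sampling grid
v1 = landscape([(10, 20), (12, 22)], ts)
v2 = landscape([(10, 22), (12, 20)], ts)

# Both diagrams are now vectors of the same length, so the average
# is just the usual pointwise mean --- no matching ambiguity at all.
mean_vec = (v1 + v2) / 2
```

Note that the two diagrams here are the same pair for which the diagram-level average was non-unique; after vectorizing, the mean is a single well-defined vector.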

A minor comment is that vectorizing persistence diagrams as a list of birth and death coordinates --- perhaps augmented with one more entry giving the number of persistence diagram points --- does not naively give a stable invariant. Its instability, roughly speaking, arises as follows. Say one persistence diagram has the points (4,5), (6,11) and another persistence diagram has the points (4,5.1), (5.9,11), (1000,1000.1). These two persistence diagrams are super close in the bottleneck distance or in the p-Wasserstein metric. But if you vectorize these persistence diagrams as the vectors (4,5,6,11) and (4,5.1,5.9,11,1000,1000.1), then these two vectors are not close in most standard senses --- even senses that might allow you to compare vectors of different lengths.