Calculation of Shortest Paths in a Directed Graph takes much longer than calculating Betweenness Centrality

26 Views Asked by At

First of all, I am trying to calculate the Betweenness Centrality of a fully connected directional graph with edge weights with N=3015 nodes. Matlab can do this in about 30 seconds and Python igraph can do it in a minute while Networkx takes hours as it is written in python and so I have abandoned it.

Second, I need to get a list of all of the shortest paths (those that minimize weights along path) between all pairs of nodes. In matlab there is a function to do this called shortest path. There is also a similar function in igraph called get_shortest_paths. I can only get one path back from each call.

So I loop over all $N^2$ pairs of nodes and extract the shortest path, store it in an array and then continue.

The problem is that while I can get betweenness centrality in under a minute, extracting all of the shortest paths takes hours in both matlab and igraph.

This seems strange given that betweenness centrality is based on the calculation of all of the shortest paths. So why does the extraction of the shortest paths take so much longer ?

1

There are 1 best solutions below

2
On

This is because you’re computing everything from scratch for each pair, whereas an efficient computation of the shortest paths between all pairs reuses a lot of information. In MATLAB you can use shortestpathtree to get all shortest paths from a given vertex in one go. In igraph, the method get_shortest_paths that you’re using accepts the value None for the parameter to to indicate that the shortest paths to all vertices should be calculated. Since you write that you need to get a list of all the shortest paths, you should also take a look at get_all_shortest_paths (which also accepts None for to). The documentation isn’t a model of clarity; if I’m interpreting it correctly, the difference is that get_shortest_paths calculates a single shortest path per target vertex and get_all_shortest_paths calculates all of them (so the plural ‘s’ in get_shortest_paths seems to refer to the potentially plural target nodes, in contrast to get_shortest_path, which only accepts a single target node).

I would expect that computing all shortest paths for all vertex pairs in one go would be even more efficient, but this should already significantly reduce the total computation time.