Can we estimate the distinct paths given a group of nodes and connections

121 Views Asked by At

If we have a system of 30,000 nodes and 126,000 connections, where there is not necessarily a path from any given node to another, are we able to get an estimate for the number of distinct paths (or indirect connections) in the system?

To give a small example, say we have the following nodes and connections:

a <-> b

b <-> c

x <-> y

This gives us the following paths (total = 8):

a-b, a-c, b-a, b-c, c-b, c-a, x-y, y-x

Given a system of 30k nodes and 126k connections, can we estimate the total "paths". I assume that the actual number will depend highly on how interconnected all the nodes are but is there way to calculate based on an even distribution what the total number might be?

I am not a mathematician, so I'm sorry if my terminology is wildly off, but hopefully I have put across what I am looking for.

1

There are 1 best solutions below

0
On

I'm basing this on Jacob's rough explanation on the background of the question in the comments.

I'll describe what I'd to if this was a one-time effort (or only needed to be done infrequently like once a year).

First you need to create a list of all your nodes. Then you need to apply a graph traversal algorithm (https://en.wikipedia.org/wiki/Graph_traversal). I don't think you need anything more fancy than breath-first-search oder depths-first-search and I don't think it matters in this case which one you take. The only thing that needs to be done is to use data structures that make it easy to check if a given node is contained inside a possibly large 'list' of nodes (so use hash sets or something similar).

Starting with any node, this finds all nodes that are (directly or indirectly) connected to it. The list of nodes thus found (including the starting node) form your first connected component $C_1$.

If there are nodes not in $C_1$, take one of them as starting node and do it all again, producing $C_2$. Continue until your components contain all nodes.

In your database, add a field "Component" and make it 1 for all nodes in $C_1$, 2 for all nodes in $C_2$, etc. Then when you later want to check if 2 nodes are connected, just get their Component values and compare them: If they are the same, they are connected, otherwise they are not connected.