Creating a network based on arrays of numbers

Intro:
I am not a mathematician, but I am somewhat familiar with graph theory. I am also comfortable with Python and NetworkX.

Example data:
Array A: [10, 20, 30, 40, 50]
Array B: [20, 30, 40, 50, 90]
Array C: [100, 200, 300, 400, 500]

I would like to create a network based on the similarity of the elements in each array. Intuitively, the values in array A are more closely related to those in array B than to those in array C. In some cases, I also want to weight certain elements more heavily; for example, the last element could be more important than the earlier ones.

Why do I think the network approach is suitable?
The number of pairwise comparisons needed is the factorial of the number of arrays. For a large number of arrays, this will simply become computationally infeasible.

Question:
Am I correct in my thinking that I can use a network approach to find pairs that are closely related to each other? If yes, how do I go about creating the network using the example data above?

Thank you.

There is 1 best solution below

I guess both the amplitude of your values and their positions matter, and perhaps the sequences all have the same length, as in your example. In that case, it may make sense to treat them as time series and to use one of the many measures of similarity between time series to build a graph.
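As a minimal sketch of this idea, the snippet below uses a weighted Euclidean distance as the similarity measure, which is only one of many possible choices for time series. The helper names `weighted_distance` and `build_graph`, the `distance` and `similarity` edge attributes, and the `1 / (1 + d)` similarity transform are assumptions made for illustration; they do not come from the question. The optional `weights` argument is one way to make later elements count more.

```python
import itertools
import math

import networkx as nx

arrays = {
    "A": [10, 20, 30, 40, 50],
    "B": [20, 30, 40, 50, 90],
    "C": [100, 200, 300, 400, 500],
}


def weighted_distance(x, y, weights=None):
    """Weighted Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if weights is None:
        weights = [1.0] * len(x)  # unweighted by default
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def build_graph(arrays, weights=None):
    """Complete graph with one node per array and one weighted edge per pair."""
    graph = nx.Graph()
    graph.add_nodes_from(arrays)
    for u, v in itertools.combinations(arrays, 2):
        d = weighted_distance(arrays[u], arrays[v], weights)
        # Illustrative transform: small distance -> similarity close to 1.
        graph.add_edge(u, v, distance=d, similarity=1.0 / (1.0 + d))
    return graph


graph = build_graph(arrays)

# Edge with the smallest distance, i.e. the most closely related pair.
closest_pair = min(graph.edges(data=True), key=lambda edge: edge[2]["distance"])
print(closest_pair[:2])

# Emphasise the last element by giving it three times the weight.
emphasised = build_graph(arrays, weights=[1, 1, 1, 1, 3])
```

For the example data, A and B come out as the closest pair. Note that this still computes every pair once; the graph is a way to store and explore the results, not a way to avoid the comparisons.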

If order does not matter, you can measure the similarity between sequences through their value distributions, either with something like a Kolmogorov-Smirnov (KS) test or by comparing summary parameters such as mean and variance. If amplitude does not matter, you can normalize the sequences first.
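To make the distribution-based comparison concrete, here is a rough sketch that computes the two-sample KS statistic by hand (the largest gap between the two empirical cumulative distribution functions) and normalizes sequences with z-scores. The function names `ks_statistic` and `zscore` are hypothetical; in practice a library routine such as `scipy.stats.ks_2samp` would also give a p-value. With only five values per array, the numbers are illustrative rather than statistically meaningful.

```python
from bisect import bisect_right
from statistics import mean, pstdev

A = [10, 20, 30, 40, 50]
B = [20, 30, 40, 50, 90]
C = [100, 200, 300, 400, 500]


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    gaps = (
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys  # the gap can only change at observed values
    )
    return max(gaps)


def zscore(x):
    """Rescale a sequence to zero mean and unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


print(ks_statistic(A, B))  # small: the value ranges overlap heavily
print(ks_statistic(A, C))  # maximal: the value ranges do not overlap

# After normalization only the shape matters, so A and C look alike
# (floating-point rounding can leave a small residual gap).
print(ks_statistic(zscore(A), zscore(C)))
```

Either the KS statistic or the difference in means and variances could then serve as the edge weight between two arrays.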

You may also consider edit distances: how many modifications (such as removing or adding an entry, rescaling one, swapping two entries, or shifting or rotating the array) are required to transform one array into another. The cost of each type of modification may be weighted.
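A simple instance of this is the classic Levenshtein-style edit distance, sketched below with a separate cost for each operation. This sketch only supports inserting, deleting, and substituting entries; rescaling, swapping, shifting, and rotating would need additional rules. The function name `edit_distance` and the cost parameters are illustrative assumptions.

```python
def edit_distance(x, y, insert_cost=1.0, delete_cost=1.0, substitute_cost=1.0):
    """Minimum total cost to turn sequence x into sequence y."""
    rows, cols = len(x) + 1, len(y) + 1
    # dist[i][j] = cost of turning the first i entries of x into the first j of y
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost
    for j in range(1, cols):
        dist[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = x[i - 1] == y[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,      # drop x[i-1]
                dist[i][j - 1] + insert_cost,      # add y[j-1]
                dist[i - 1][j - 1] + (0.0 if same else substitute_cost),
            )
    return dist[-1][-1]


A = [10, 20, 30, 40, 50]
B = [20, 30, 40, 50, 90]
C = [100, 200, 300, 400, 500]

print(edit_distance(A, B))  # drop the leading 10, append 90
print(edit_distance(A, C))  # no shared values, so every entry changes
```

With unit costs, A is two edits away from B and five edits away from C, which matches the intuition that A and B are the closer pair.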

Clusters in such graphs, and other properties such as their degree distributions, would certainly say something about your set of sequences, although they may be difficult to interpret.
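One simple way to look for clusters, sketched here with NetworkX, is to keep only the edges whose distance falls below a cutoff and then read off the connected components. The distances below are the rounded Euclidean distances between the three example arrays, and the `threshold` value is an arbitrary assumption that would need tuning on real data; connected components are also only one of several possible notions of a cluster.

```python
import networkx as nx

# Rounded Euclidean distances between the example arrays A, B and C.
distances = {
    ("A", "B"): 44.7,
    ("A", "C"): 667.5,
    ("B", "C"): 627.3,
}
threshold = 100.0  # assumed cutoff: pairs farther apart than this get no edge

graph = nx.Graph()
graph.add_nodes_from(["A", "B", "C"])
for (u, v), d in distances.items():
    if d <= threshold:
        graph.add_edge(u, v, distance=d)

# Each connected component is a group of mutually reachable arrays.
clusters = sorted(sorted(component) for component in nx.connected_components(graph))
degrees = dict(graph.degree())

print(clusters)
print(degrees)
```

Here A and B end up in one component while C is isolated with degree zero. How to read such components on a large set of sequences depends heavily on the chosen distance and cutoff.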

The approaches above exist in the literature, and I suggest searching the web for the keywords mentioned above (time series similarity, KS test, edit distance).

Hope this helps!