What does near-cut (threshold) graph say about original complete weighted graph?

121 Views Asked by At

ORIGINAL post, preamble

Start from a complete graph with weighted edges (e.g. in $[0,1]$ interval). Continuously increasing threshold $t$ from $0$ to $1$ drop edges with weights less than $t$. Finally stop right before the graph would become disconnected. See animation. Consider remaining connected graph - call it "threshold graph". What does its properties/structure say about the original complete graph? It is probably related to graph theory edge cut and max-flow min-cut theorem. But those are based on the minimal set of edges to disconnect a graph. What I really would like to understand in my case if any properties of the "threshold graph" say something mathematically interesting about the original complete graph. I especially interested in communities and clusters, but anything else interesting too.

Such question often appears in problems with a complete set of relationships in a finite set - like a group of people, stocks in a portfolio, etc. Any thoughts and help are highly appriciated!

enter image description here

This basically similar to "percolation" or "watershed" problems. Here is very simplistic formulation of it.

UPDATE: simple clean problem

  1. N interacting agents - anyone with everyone - complete graph
  2. Interactions varies in strength - reflected in edge weights
  3. Drop all edges weaker than a particular weight - "threshold" - and replace the edges you kept with same uniform weight
  4. This is a simple skeleton graph in which graph communities cam be studied
  5. The BIG question is - "how to choose cut-off threshold" ?
  6. Threshold type 1:
    • agents represented by time series of data (say stock price)
    • edge weights are Pearson correlation coefficient between time series
    • cut-off threshold is typical "strong" Pearson value - say 0.75 or similar
    • such cut-off is typical but pretty vague and subjective - it is not dictated by data
  7. Threshold type 2:
    • agents represented by time series of data (say stock price)
    • edge weights are Pearson correlation coefficient between time series
    • cut-off threshold is "the last MAX edge we drop while graph is still connected"
    • such cut-off is naturally arises from data
    • i guess this is important for problems where skeleton threshold graph must still connect all agents

THE QUESTION: in what cases (7) should be chosen over (6) and what metrics we can drive from it, especially towards communities/structure? What mathematical reasoning can give preference to (7) over (6)?

For practical example please see:

https://blog.wolfram.com/2012/06/01/graph-theory-and-finance-in-mathematica