Notation of a graph clustering operation

105 Views Asked by At

I am having trouble figuring out how to mathematically write down a certain operation on a graph. I have programmed the operation so I know how it works- I just don't know how to write it in the language of mathematics. To make clear what I am doing exactly, I will instead write it down using an example in JavaScript. After that, a hand-drawn picture demonstrating again what I want to do will be added, and finally I'll post the notation I have so far.

Imagine a graph with some nodes and edges. The nodes have a numerical ID and the edges are ordered (because direction is important) pairs of the the node IDs. For example:

let nodes = [1, 2, 3, 4, 5]
let edges = ['1-3', '1-4', '2-1', '2-3', '2-4', '3-2', '3-5', '4-2', '4-5']

Both nodes and edges have weights. This weight is stored in an object with their identifier as key and their weight as value.

let nodeWeights = {1: 20, 2: 30, 3: 25, 4: 50, 5: 45}
let edgeWeights = {
  '1-3': 3,
  '1-4': 5,
  '2-1': 2,
  '2-3': 9,
  '2-4': 4,
  '3-2': 1,
  '3-5': 2,
  '4-2': 8,
  '4-5': 6
}

Our algorithm decides that nodes 3 and 4 must be in the same cluster and generates a new ID for the new cluster:

let clusters = {
  1: [1],
  2: [2],
  5: [5],
  6: [3, 4]
}
let mergeHash = { 3: 6, 4: 6 }

Now the part where the confusion starts: I want to add up the weights of the nodes that are now clustered (in this example just 3 and 4). This is simple:

let newNodes = []
let newNodeWeights = {}
for (let key in clusters) {
  newNodes.push(key)
  newNodeWeights[key] = 0
  for (let nodeID of clusters[key]) {
    newNodeWeights[key] += nodeWeights[nodeID]
  }
}

We just loop through the newly defined clusters which become our new nodes. The weight of our new nodes is the sum of the weights of the old nodes of which the new cluster consists. This results in newNodes containing [1, 2, 5, 6], and newNodeWeights containing {1: 20, 2: 30, 5: 45, 6: 75}

For the edges, we'll do the following:

let newEdges = []
let newEdgeWeights = {}
for (let edgeKey of edges) {
  let [i, j] = edgeKey.split('-').map(x => parseInt(x))
  if (mergeHash[i]) {
    i = mergeHash[i]
  }
  if (mergeHash[j]) {
    j = mergeHash[j]
  }
  let newKey = i + '-' + j
  if (!newEdges.includes(newKey)) { newEdges.push(newKey) }
  if (newEdgeWeights[newKey]) {
    newEdgeWeights[newKey] += edgeWeights[edgeKey]
  } else {
    newEdgeWeights[newKey] = edgeWeights[edgeKey]
  }
}  

Basically, if we have two edges that start at the same node (i) and they go to two old nodes that are now in one cluster, we add them together into one edge. Same for two edges that terminate in the same node (j) but start from two nodes that are both in the same cluster.

This would result in newEdgeWeights being:

{
  "1-6": 8,
  "2-1": 2,
  "2-6": 13,
  "6-2": 9,
  "6-5": 8
}

You can play around with the code yourself here: https://codepen.io/anon/pen/qKoyaR/?editors=1111


The same operation drawn out looks like this. Before:

Before merging 3 and 4

After:

After merging 3 and 4


The math I have so far is this:

Let $G$ be a weighted, directed graph with a set of nodes $N$ and a set of edges $E$.

$$ N = \{ N_{1}, N_{2} .. N_{n} \} $$ $$ E = \{ E_{ij} \mid i \in N, j \in N \} $$

Let $C$ be a set of clusters $C_{s}$, containing the indices of the nodes inside that cluster, each having $c_{s}$ members. For example:

$$ C_{1} = \{ 1, 3, ..c_{1} \}, C_{2} = \{ 2, 5, ..c_{2} \}, ... C_{s} = \{ ... c_{s} \} $$

$$ C = \{ C_{1}, C_{2} ... C_{s} \} $$

The new set of weighted nodes becomes:

$$ N^{clustered} = \left\{ \sum_{c_{s} \in C_{s}}{N_{c}} \mid C_{s} \in C \right\} $$

Here I already feel like my notation is wrong, and I have completely no idea where to even begin with the calculation for the edges. How should I do this? Create separate sets for the indices and weights, like in the code? Because I never really see that being done anywhere in the literature...

1

There are 1 best solutions below

1
On BEST ANSWER

There are various equivalent ways to formalize this mathematically. I'll just give one I think is conceptually simplest.

Traditionally, a graph is a set of nodes $N$ (e.g. $N=\{1,2,3,4,5\}$), and a set of edges $E\subseteq N\times N$, each edge being a pair of nodes (e.g. $E=\{(1,3), \dots, (4,5)\}$). To make it a weighted graph, you define two functions $w_N:N\to\mathbb R$ and $w_E:E\to\mathbb R$ assigning a weight to each node and each edge (e.g. $w_N=\{1\mapsto20,\dots,5\mapsto45\}$ and $w_E=\{(1,3)\mapsto3,\dots,(4,5)\mapsto6\}$).

Additionally, you've defined a set of clusters $\mathcal C\subseteq\mathcal P(N)$ whose elements are subsets of $N$ (e.g. $\mathcal C=\{\{1\},\{2\},\{5\},\{3,4\}\}$).

The clustered graph is most easily seen as a graph over the clusters themselves. That is, you can choose the new node set to be simply $N' = \mathcal C$, and define the new edge set as $$E'=\{(C_i,C_j):\exists n_i,n_j\text{ s.t. }n_i\in C_i,n_j\in C_j,(n_i,n_j)\in E\},$$ i.e. two clusters are connected if and only if there is a node in one that is connected to a node in the other. To carry over the weights, you need to define functions $w'_N:N'\to\mathbb R$ and $w'_E:E'\to\mathbb R$, for which the natural definitions are $$\begin{align} w'_N(C_i) &= \sum_{n_i\in C_i}w_N(n_i), \\ w'_E(C_i,C_j) &= \sum_{\substack{n_i\in C_i,\\n_j\in C_j,\\(n_i,n_j)\in E}}w_E(n_i,n_j). \end{align}$$

I believe the approach in your code is similar to this, except you have chosen a different node set $N'=\{1,2,5,6\}$ which can be seen as labels for the clusters, along with lookup functions $N'\to\mathcal C$ and $N\to N'$.