Measure for Change of Network after adding one edge

1.3k Views Asked by At

I have an undirected network. I want to have a measure which tells me how much adding a certain edge changes the network.

Please have a look at this example:

enter image description here

Here, the black edges represent the original network. It is obvious that by adding the green edge, the network structure will not change a lot. However, if I add the red edge, the network changes significantly.

Are there known methods to study such changes?

I was thinking about a measure $m(N_{orig},N_{new})$ (where $N_{orig}$ and $N_{new}$ stands for the original and modified network, respectivly) that takes into account the distance $d$ between all vertices:

$$ m(N_{orig},N_{new})=\sum_{\forall (V_i,V_j)} \left(d_{orig}(V_i,V_j) - d_{new}(V_i,V_j) \right) $$

where $V_i$ are the vertices of the network. In the example network above, $m(N_{orig},N_{black})=2$ and $m(N_{orig},N_{red})=90$. It is similar to the difference of the average path length, as indicated by DavidV in the comments.

This would be great, but the network distance is expensive to calculate for large networks (roughly $\mathcal{O}(V^2\cdot\log(V)+V\cdot E)$ for the Johnson's algorithm or $\mathcal{O}(V^3)$ for the Floyd–Warshall algorithm). My network has V~=5.000 and E~=900.000. As I want to calculate many added edges, this measure seems to be too expensive for me.

I'm happy about every suggestion! Thanks

2

There are 2 best solutions below

2
On

Have you considered the vertex-connectivity of a graph?

It equals the minimum number of vertices to remove before you can hope disconnecting the graph in separate components. For example, for trees, the vertex-connectivity equals $1$, because removing any node will necessarily disconnect the graph.

It is polynomial time computable, which is interesting.

In your example, if you remove the middle node, you disconnect the graph, so it has vertex-connectivity $1$. Adding the green edge clearly does not affect the connectivity, but adding the red one does, as deleting the middle node no longer disconnects the graph with this extra edge. Therefore, this could be a relevant measure for your problem.

Edge-connectivity is defined in the same fashion and could also be used.

15
On

A simple way to measure how different the network is after adding an edge is to find the length of the original shortest path took to get from the one of the vertices to which the edge is attached to the other vertex.

In your example, the shortest path between the two vertices connected by the green edge before you added the green edge was two edges long. Before adding the red edge, the shortest path between the two nodes was seven edges long. After adding the edge, the shortest path for any two vertices becomes one.

The basic idea here is that the larger the path between two vertices, the more the graph will change when an edge is added.

All you really need is a pathfinding algorithm, so this measurement shouldn't be too slow.

Edit: As NicoDean pointed out, this alone does not take neighboring vertices into account. I think I have a way around this. This is going to be a pretty rough way to measure the change, but you should be able to modify it to make it work. I will use his graph as an example, and I will label all the nodes from left to right alphabetically, so the red line will connect nodes $A$ and $J$, and the green line will connect nodes $H$ and $L$. NicoDean's Graph with Labeled Nodes When connecting two nodes that are indirectly connected, you will form a cycle, in which the path length from any node in the cycle to another node will either remain the same or be however long the original path between the two connected nodes was minus the path length between that node to one of the connected nodes plus one. In your example, adding the red line creates the cycle $\{A, E, D, F, G, I, H, J\}$. Before adding the red line, the shortest path from $E$ to $J$ was $6$, now it is $7-6+1=2$. You can then calculate the difference between these two values to find the improvement. Do this for every point on the cycle and add it to some running total. That deals with the cycles, and all you really need is the length of the cycle. In your graph, the red edge affected everything, so that messed me up because all you needed to measure the change was the length of the line. The second thing to note is that adding a line only affects paths that use the line. For instance, the shortest path from $K$ to $H$ would include $IH$, but adding the red edge has no effect on the path length.

The last thing you need to consider is which nodes of the cycle the outer nodes are connected to. From there, you should find the length of all the shortest paths from that outer node to any point in the cycle before and after adding the edge. This algorithm should start with the nodes that are directly connected to a node in the cycle. For instance, if you consider $B$, it is connected to both $A$ and $D$. The shortest path lengths from $A$ and $D$ to any node in the cycle after inserting the red edge are listed below. $$(A, E) = 1; (D, E) = 1$$ $$(A, F) = 3; (D, F) = 1$$ $$(A, G) = 4; (D, G) = 2$$ $$(A, I) = 3; (D, I) = 3$$ $$(A, H) = 2; (D, H) = 4$$ $$(A, J) = 1; (D, J) = 3$$ The shortest path from $B$ to any of the nodes in the cycle is the minimum of the two numbers up there plus one, so: $$(B, E) = 2$$ $$(B, F) = 2$$ $$(B, G) = 3$$ $$(B, I) = 3$$ $$(B, H) = 3$$ $$(B, J) = 2$$ Now, these are the lengths before adding the edge: $$(A, E) = 1; (D, E) = 1$$ $$(A, F) = 3; (D, F) = 1$$ $$(A, G) = 4; (D, G) = 2$$ $$(A, I) = 5; (D, I) = 3$$ $$(A, H) = 6; (D, H) = 4$$ $$(A, J) = 7; (D, J) = 5$$ After choosing the shortest lengths, this is what we get for $B$ before adding the edge: $$(B, E) = 2$$ $$(B, F) = 2$$ $$(B, G) = 3$$ $$(B, I) = 4$$ $$(B, H) = 5$$ $$(B, J) = 6$$ We can then measure the improvement by subtracting the before and after: $$(B, E) = 0$$ $$(B, F) = 0$$ $$(B, G) = 0$$ $$(B, I) = 1$$ $$(B, H) = 2$$ $$(B, J) = 4$$ The total improvement for adding the edge on the path length of B is therefore $1+2+4=7$. You then do the same process for all the nodes that have at least one edge connected to the cycle and add that to the running total for the improvement. Then, you do the same process for all the nodes connected to those initial nodes, calculate the improvement, and continue on until you have calculated the total improvement. The greater the improvement, the larger the change. Please comment if anything is unclear to you or you see any errors in my reasoning. This algorithm has the basic idea down, but you should be able to use it as a basis for the algorithm you will use. The only problem I can see with this algorithm is that if you have a preexisting cycle, adding an edge between any two nodes in that cycle will split it up into two cycles. This doesn't really seem to be a problem.

Edit: Another minor problem I realized is that if you have two connected vertices that are the same distance from the cycle, the improvement may have a circular dependency. This can be solved by not considering that edge until all the other edges of the two nodes have been considered. Then, you can run the improvement algorithm on both of the edges, making sure to add one to all the other node's edge lengths.

Optimization: Any node that does not see any improvement might not need to be considered when dealing with nodes attached to it.

This should have a runtime of something like $O(E+V)$ for the simple breadth-first search plus $O((E-L)*L^2)$, where $E$ is the total number of edges and $L$ is the length of the cycle, as you need to find the shortest path length for each node in the cycle, which is where you get the $L^2$, for every single edge outside the cycle, which is where you get the $E-L$. It looks like $L$ should be relatively low because of the ratio of the number of edges to the number of edges you would need to have a complete graph is pretty high.

Actual Running of the Algorithm I am going to use my algorithm to calculate what I call the "improvement" on the graph for both the red line and the green line. I will do the red line first, as it forms a singular cycle while the green line forms three new cycles.

The first thing we do is find the shortest path between the two nodes and we calculate its length ($7$). This will take $O(V+E)$ time. The next thing we need to do is calculate the path length for every node that will be in the cycle before the edge is added. That step is extremely simple, as you only need to consider the edges in the cycle. The next thing we do is calculate the shortest path length for all nodes in the cycle not within the ceiling of half of the length ($\lceil{\frac7 2}\rceil=4$) of the two connected nodes after adding the edge. We know that we only need to consider these nodes because only these nodes will experience a change in the shortest path. In this example, I will use the notation $(X,Y)=k$ to denote the length of the shortest path between $X$ and $Y$.

Before $$(A,I)=5;*(E,I)=4;*(D,I)=3$$ $$(A,H)=6;(E,H)=5;*(D,H)=4$$ $$(A,J)=7;(E,J)=6;(D,J)=5$$

After $$(A,I)=3;*(E,I)=4;*(D,I)=3$$ $$(A,H)=2;(E,H)=3;*(D,H)=4$$ $$(A,J)=1;(E,J)=2;(D,J)=3$$

The starred nodes experience no change and are only there for comparison purposes later in the algorithm. There are tons of patterns in this that you should be able to figure out just from looking at the graph, which you can probably use to optimize. You then add up the total improvement, then double it because the path improvements apply in both directions. The total improvement so far is $2*((3*(5-3))+(2*(6-2))+(1*(7-1))=36$. From there, you calculate for the nodes that are not part of the cycle but touch the cycle. Starting with $B$, you compare the shortest paths between all the nodes it connects to on the cycle, which are $A$ and $D$.

Before $$(A,I)=5;(D,I)=3\implies(B,I)=4$$ $$(A,H)=6;(D,H)=4\implies(B,H)=5$$ $$(A,J)=7;(D,J)=5\implies(B,J)=6$$

After $$(A,I)=3;(D,I)=3\implies(B,I)=4$$ $$(A,H)=2;(D,H)=4\implies(B,H)=3$$ $$(A,J)=1;(D,J)=3\implies(B,J)=2$$

The improvement for $B$ is therefore $(4-4)+(5-3)+(6-2)=6$. The total improvement is now $36+6=42$. Now we consider $C$, which is connected to $D$ and $F$. Because the only path that improved was the $(D,J)$, we only need to consider the $(C,J)$ path.

Before $$(D,J)=5;(F,J)=4\implies(C,J)=5$$

After $$(D,J)=3;(F,J)=4\implies(C,J)=4$$

The improvement of $C$ is one; the total improvement is $42+1=43$. The next node to consider is $K$. This is where things get interesting. Instead of finding $(K,J)$, we start finding $(K,A)$ because $K$ is connected to $I$, which has no $(I,J)$ path that we need to check. Also, we need to consider that $K$ connects to $M$, which is not part of the cycle. To do this, we ignore that edge until we calculate the distance change for both $K$ and $M$ so that we can compare them. For $K$, we only need to consider the $(A,I)$ path before and after.

Before $$(A,I)=4\implies(K,A)=5$$ After $$(A,I)=3\implies(K,A)=4$$

$K$ improves by one, so the total improvement is now $44$. Now we consider $M$, which also has an edge that we will ignore for now.

Before $$(A,H)=6\implies(M,A)=7$$ $$(E,H)=5\implies(M,E)=6$$ After $$(A,H)=2\implies(M,A)=3$$ $$(E,H)=3\implies(M,E)=4$$

The total improvement is now $50$. We now compare the path lengths for $K$ and $M$ to see if there is a shorter path through either of them. You would redo $K$ considering $M$ as part of the cycle and vice versa. In this case, there is no shorter path, but if $M$ was connected to $J$, you would see an improvement. Finally we consider $L$, which is connected to both $I$ and $J$.

Before $$(A,I)=5;(A,J)=7\implies(L,A)=6$$ $$(E,I)=4;(E,J)=6\implies(L,E)=5$$ $$(D,I)=3;(D,J)=5\implies(L,D)=4$$ After $$(A,I)=3;(A,J)=1\implies(L,A)=2$$ $$(E,I)=4;(E,J)=2\implies(L,E)=3$$ $$(D,I)=3;(D,J)=3\implies(L,D)=4$$

This gives us a total improvement of $56$. Finally, we need to consider the $LM$ edge. This edge does not produce an improvement. Our total improvement is $56$.

Now, for the green edge, we need to consider the three cycles it forms. Once we do that, we need to see if there is any improvement in any of the cycles. We see that there is no improvement anywhere in the cycles except the nodes $H$ and $L$, which see an improvement of $1$ each, giving us a total improvement of $2$. Since none of the nodes in the cycle that touch nodes outside the cycle see any improvement, we can stop the algorithm here for a final improvement of $2$.