Finding paths that satisfy criteria in a large weighted cyclic directed graph


I have a large weighted cyclic directed graph where each weight is the duration, in days, of going from one node to another. The graph has about 1500 nodes and 100 thousand edges, and the average node in/out degree is 60, so it is highly connected and has many cycles. Each edge also carries a parameter, cnt: how many times that edge may be used within a single path.

The task is to find all possible paths whose total weight is less than a given number n, the time limit in days.

Example of a graph with 5 nodes:

You can play with it:

import networkx as nx
import json

node = 1         # start node for the search
max_period = 50  # day limit (the threshold n)

edges_str = '[[1, 2, {"cnt": 1, "duration": 5}], [1, 5, {"cnt": 1, "duration": 6}], [1, 3, {"cnt": 1, "duration": 29}], [1, 1, {"cnt": 1, "duration": 14}], [1, 1, {"cnt": 10, "duration": 11}], [1, 4, {"cnt": 1, "duration": 1}], [2, 4, {"cnt": 1, "duration": 22}], [2, 1, {"cnt": 1, "duration": 11}], [2, 2, {"cnt": 1, "duration": 4}], [2, 5, {"cnt": 1, "duration": 23}], [2, 3, {"cnt": 1, "duration": 15}], [3, 4, {"cnt": 1, "duration": 19}], [3, 1, {"cnt": 1, "duration": 24}], [3, 2, {"cnt": 1, "duration": 16}], [3, 5, {"cnt": 1, "duration": 27}], [3, 3, {"cnt": 1, "duration": 5}], [3, 3, {"cnt": 24, "duration": 29}], [4, 3, {"cnt": 1, "duration": 6}], [4, 3, {"cnt": 5, "duration": 4}], [4, 1, {"cnt": 1, "duration": 19}], [4, 4, {"cnt": 1, "duration": 27}], [4, 2, {"cnt": 1, "duration": 24}], [4, 5, {"cnt": 1, "duration": 13}], [5, 4, {"cnt": 1, "duration": 6}], [5, 1, {"cnt": 1, "duration": 9}], [5, 2, {"cnt": 1, "duration": 2}], [5, 5, {"cnt": 1, "duration": 15}], [5, 3, {"cnt": 1, "duration": 23}], [5, 3, {"cnt": 12, "duration": 28}]]'
edges = json.loads(edges_str)
graph = nx.MultiDiGraph()
graph.add_edges_from(edges)

My current solution is a plain BFS that keeps all matching paths in a cache, which I then use to compute the weight of each extended path.

It works well with a 20-day limit (about 20 seconds across 6 concurrent processes).

But with a 40-day limit, I can't get a result within 2 hours even on 50 concurrent processes.
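For reference, a minimal sketch of that brute-force BFS with a weight cap and per-edge usage counters (the adjacency format and all names here are illustrative, not my actual code):

```python
from collections import deque

def enumerate_paths(graph, start, limit):
    """BFS that extends partial paths until the weight limit is reached.

    graph: {u: [(v, duration, cnt), ...]} adjacency lists.
    Returns every path (a tuple of nodes) whose total duration is <= limit,
    using each edge at most cnt times per path.
    """
    results = []
    # Each queue entry: (current node, path so far, total weight, edge-use counts)
    queue = deque([(start, (start,), 0, {})])
    while queue:
        node, path, weight, used = queue.popleft()
        for i, (nbr, dur, cnt) in enumerate(graph.get(node, [])):
            key = (node, i)  # identify the edge by its position in the list
            if used.get(key, 0) >= cnt or weight + dur > limit:
                continue
            new_used = dict(used)
            new_used[key] = new_used.get(key, 0) + 1
            new_path = path + (nbr,)
            results.append(new_path)
            queue.append((nbr, new_path, weight + dur, new_used))
    return results

g = {1: [(2, 5, 1), (4, 1, 1)], 2: [(4, 22, 1)], 4: [(3, 4, 5)]}
paths = enumerate_paths(g, 1, 10)
# paths == [(1, 2), (1, 4), (1, 4, 3)]
```

Because every valid path is stored and re-extended, time and memory grow with the number of valid paths, which explodes as the limit grows.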

Could you give me some advice to guide me towards a solution?

Now I see several possible steps:

  1. Assess the impact of cycles. An edge with a limit of n passes can be expanded into n parallel single-use edges.
  2. Use more sensible cache storage that reuses shared path subsequences.
  3. Convert the graph to its line graph, which is structurally simpler but larger.
  4. Try to move the task into the LP (linear programming) domain.
  5. Reframe the problem to find a simpler formulation.
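Step 1 can be sketched as a small hypothetical helper (assuming a networkx MultiDiGraph with cnt/duration attributes, as in the snippet above). Note the cost: the edge count grows by the sum of the cnt values.

```python
import networkx as nx

def expand_multiplicities(graph):
    """Replace every edge with cnt == k by k parallel edges with cnt == 1.

    Afterwards each edge may be used at most once per path, so the usage
    counters become a plain "no repeated edges" constraint.
    """
    expanded = nx.MultiDiGraph()
    expanded.add_nodes_from(graph.nodes)
    for u, v, data in graph.edges(data=True):
        for _ in range(data["cnt"]):
            expanded.add_edge(u, v, cnt=1, duration=data["duration"])
    return expanded

g = nx.MultiDiGraph()
g.add_edge(1, 2, cnt=3, duration=5)
eg = expand_multiplicities(g)
# eg has three parallel (1, 2) edges, each usable once
```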

There is 1 answer below.


Let $w_{ij}$ be the weight of edge $ij$. We assume $w_{ij} \geq 0$ since it is a duration.

If you don't care about the start and end of the path, just run a DFS from every node and backtrack whenever the total path length exceeds your threshold. If you visit neighbors in increasing order of $w_{ij}$, you can backtrack immediately after reaching the first neighbor that exceeds the threshold.

At every node of the DFS, at most one comparison will fail, and each visited node lies on at least one valid path, so you never do useless searching: the complexity is linear in the number of valid paths.
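A minimal sketch of that DFS, assuming strictly positive weights (so the search terminates) and ignoring the per-edge cnt constraint for brevity; the adjacency format is an assumption:

```python
def all_paths_within(adj, limit):
    """Enumerate every path whose total weight is <= limit.

    adj: {u: [(w, v), ...]} with positive weights, sorted ascending by w.
    Yields paths as node tuples. Because neighbors are sorted, the loop can
    break at the first over-budget edge: all later neighbors are heavier.
    """
    def dfs(node, path, weight):
        for w, nbr in adj.get(node, []):
            if weight + w > limit:
                break  # every remaining neighbor is at least as heavy
            yield path + (nbr,)
            yield from dfs(nbr, path + (nbr,), weight + w)

    for start in adj:
        yield from dfs(start, (start,), 0)

adj = {1: sorted([(5, 2), (1, 3)]), 2: [], 3: [(2, 2)]}
paths = list(all_paths_within(adj, 6))
# sorted(paths) == [(1, 2), (1, 3), (1, 3, 2), (3, 2)]
```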

If you only want paths between given start and end vertices, run a single-source shortest-path algorithm to find the distance $d_i$ from the start $s$ to every vertex $i$. Then, for each vertex $i$, sort its in-neighbors $j$ by increasing $d_j + w_{ji}$, and use this ordering when visiting the neighbors of $i$.

Then do a DFS from the end node, following edges in reverse. Backtrack whenever the distance traversed so far plus the shortest distance back to the start exceeds your threshold. When visiting the in-neighbors of a node, you can backtrack at the first neighbor that pushes the distance above the threshold.
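A sketch of this s-t variant, again ignoring the cnt constraints and assuming strictly positive durations; networkx's Dijkstra is used for the shortest-path step:

```python
import networkx as nx

def st_paths_within(graph, s, t, limit):
    """All s -> t paths with total duration <= limit, pruned via distances from s.

    dist[u] is the shortest s -> u duration, hence a lower bound on the weight
    of any s -> u prefix: a suffix ending (backwards) at u can be discarded
    once dist[u] + suffix weight exceeds the limit.
    """
    dist = nx.single_source_dijkstra_path_length(graph, s, weight="duration")
    results = []

    def dfs(node, suffix, weight):
        # weight is the total duration of the suffix from `node` to t
        if node == s:
            results.append((s,) + suffix)
        # Sort in-edges by dist[u] + w so the loop can stop at the first
        # over-budget one; skip vertices unreachable from s.
        preds = sorted(
            ((d["duration"], u)
             for u, _, d in graph.in_edges(node, data=True)
             if u in dist),
            key=lambda p: dist[p[1]] + p[0],
        )
        for w, u in preds:
            if dist[u] + w + weight > limit:
                break  # every remaining in-edge is at least as expensive
            dfs(u, (node,) + suffix, weight + w)

    dfs(t, (), 0)
    return results

g = nx.MultiDiGraph()
g.add_edge(1, 2, duration=2)
g.add_edge(2, 3, duration=2)
g.add_edge(1, 3, duration=5)
paths = st_paths_within(g, 1, 3, 5)
# paths == [(1, 2, 3), (1, 3)]
```

For very long paths you may want to replace the recursion with an explicit stack to avoid hitting Python's recursion limit.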