Likelihood of random spanning tree given distribution of edge weights?


Suppose I have a complete graph $G=(V,E)$ with $n$ vertices, whose edge weights $W\in\mathbb R_+^{n\times n}$ are drawn from some distribution $P(W)\in\mathrm{Prob}(\mathbb R_+^{n\times n})$. I randomly draw a set of edge weights $W\sim P$ and then compute the resulting minimum spanning tree $T\subseteq E$ of the weighted graph.

Is it possible to compute the probability of a given spanning tree $T$ given the distribution $P(\cdot)$?

A value proportional to this probability would suffice. If it helps, I'm happy to assume that each weight is chosen independently, i.e. $P(W)=\prod_{ij}P_{ij}(W_{ij})$, but I do not want to assume that the $W_{ij}$'s are iid. Even assuming each $P_{ij}$ is a Bernoulli (or exponential or Gaussian) distribution with a different parameter would be a great start.


2 Answers


This is doable for the case where each $P_{ij}$ is an independent exponential with its own rate $\lambda_{ij}$. Well, I say doable; this is still going to be a sum over $(n-1)!$ cases.

Exponential distributions are nice because, together with Kruskal's algorithm for the minimum spanning tree, they give a nice model for how we get a minimum spanning tree. We think of the weight of edge $ij$ as the time at which that edge "appears".

Kruskal's algorithm says that we take the first edge that appears and put it in our spanning tree. Then we take the next edge that appears, and put it in our spanning tree. Over time, some edges that haven't appeared yet get rejected before they appear, because they'd create cycles; but apart from that, we keep taking the first edge that appears until we have a tree.

Exponential distributions are memoryless, so once the first $k$ edges have appeared, the remaining time until each other edge appears still has the same distribution as it did at the beginning (conditioned on what has happened so far).


We're going to sum over all $(n-1)!$ orders in which the edges of $T$ can appear. For each of these orders $e_1, e_2, \dots, e_{n-1}$, we multiply together the following probabilities:

  1. The probability that $e_1$ is the first edge to appear. This is $\frac{\lambda_{e_1}}{\Lambda}$, where $\Lambda = \sum_{ij \in E(K_n)} \lambda_{ij}$.
  2. The probability that $e_2$ is the next edge to appear. This is $\frac{\lambda_{e_2}}{\Lambda'}$, where $\Lambda' = \Lambda - \lambda_{e_1}$ is the updated sum of all rates.
  3. The probability that $e_3$ is the next edge to appear, excluding any edge that would create a cycle with $e_1$ and $e_2$. This is $\frac{\lambda_{e_3}}{\Lambda''}$, where $\Lambda''$ is the updated sum of the rates of edges that can still become part of the spanning tree.
  4. And so on, until we get through all the edges.

For example, suppose we're looking at a $4$-vertex graph, and we want the tree consisting of edges $12$, $23$, and $34$. The probability that Kruskal's algorithm will add those three edges, in that order, is $$\frac{\lambda_{12}}{\lambda_{12} + \lambda_{13} + \lambda_{14} + \lambda_{23} + \lambda_{24} + \lambda_{34}} \times \frac{\lambda_{23}}{\lambda_{13} + \lambda_{14} + \lambda_{23} + \lambda_{24} + \lambda_{34}} \times \frac{\lambda_{34}}{\lambda_{14} + \lambda_{24} + \lambda_{34}}.$$ There'll be $5$ more similar-looking products for the other orders in which these edges can be added.
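For small $n$, the sum over orders can be computed directly. Here is a minimal sketch (the function name and the edge representation are my own choices, not from the question), using a union-find to drop cycle-creating edges from the race at each step:

```python
from itertools import permutations

def mst_probability(n, rates, tree_edges):
    """Exact Pr[tree_edges is the MST of K_n] for independent
    Exp(rates[e]) edge weights, by summing over all (n-1)! orders in
    which the tree's edges can appear.
    rates: dict frozenset({i, j}) -> lambda_ij for every edge of K_n;
    tree_edges: list of the n-1 tree edges as pairs (i, j).
    """
    edges = list(rates)
    total = 0.0
    for order in permutations(tree_edges):
        parent = list(range(n))          # union-find over vertices

        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u

        prob = 1.0
        for (i, j) in order:
            # edges still in the race: endpoints in different components
            alive = sum(rates[e] for e in edges
                        if find(min(e)) != find(max(e)))
            prob *= rates[frozenset((i, j))] / alive
            parent[find(i)] = find(j)    # accept edge ij
        total += prob
    return total
```

With all rates equal on $K_4$ this gives $1/15$ for a star and $11/180$ for a path (and $4 \cdot \frac1{15} + 12 \cdot \frac{11}{180} = 1$), so even iid weights do not make every spanning tree equally likely.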


For large $n$ (and still independent edge weights), we can also solve the problem approximately in some other situations, by approximating the distributions with exponential distributions.

This works because (or to the extent that) the minimum-weight edge chosen at every step of Prim's algorithm will be much smaller than the typical weight of an edge. So the random weights will be "approximately memoryless" in the sense that the edge weights have barely anything to remember.

In this approximation, we want to replace each distribution $P_{ij}$ by the exponential distribution whose rate $\lambda_{ij}$ is $$ \lim_{t \to 0} \frac{\Pr[W_{ij} < t]}{t}. $$ The motivation for this formula is that, for very small $t$, $\Pr[W_{ij} < t]$ is the only feature of the distribution that affects whether edge $ij$ gets used or not.
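Assuming each weight satisfies $\Pr[W_{ij} \le 0] = 0$ and has a density with a finite positive value at $0$ (so this excludes, e.g., Gaussian weights, which put mass on the negatives), the matched rate can be read off numerically from the CDF. A small sketch (the helper name is mine):

```python
import math

def exponential_rate(cdf, t=1e-8):
    """Numerically approximate lim_{t -> 0} Pr[W < t] / t from a CDF.
    Meaningful only when Pr[W <= 0] = 0 and the density has a finite,
    positive value at 0.
    """
    return cdf(t) / t

# Uniform(0, 2): CDF is t/2 near 0, so the matched rate is 1/2.
rate_uniform = exponential_rate(lambda t: t / 2.0)

# Exp(3): CDF is 1 - exp(-3t); the recovered rate is 3, as it should be.
rate_exp = exponential_rate(lambda t: 1.0 - math.exp(-3.0 * t))
```

These rates would then be fed into the exact exponential formula above as an approximation.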


Here's an idea. I think this gives an efficient (i.e., polynomial-time) approach, assuming that the distribution over weights is given by a log-concave pdf, but there are a lot of details here I didn't check.

Here's a high-level overview -- below, this is explained in terms of polytopes and such, but maybe in retrospect all of that was unnecessary.

a) Fix a spanning tree $T$. The set of weights $w$ such that $T$ is a minimum spanning tree is a convex cone, $C$. (It's clearly closed under positive scaling. Convexity is also not hard to check: if $w_1(T) \leq w_1(T')$ and $w_2(T) \leq w_2(T')$ for every spanning tree $T'$, then the same holds for $w = \lambda w_1 + (1 - \lambda) w_2$.)

b) You can test membership in $C$ by computing the MST. (You can also get a hyperplane separation oracle, as explained below.) This lets you calculate $1_C(w)$.

c) Say that your pdf on weights, $f(w)$, is log-concave, and that we can evaluate it efficiently. Then $g(w) = f(w)\,1_C(w)$ is also log-concave, and we can evaluate it.

d) Now we want to integrate the log-concave function $g(w)$. We also know that the support of $g$ is contained in the set $C$. There's a vast literature on this topic, e.g. http://www.cs.yale.edu/homes/kannan/Papers/applegate.pdf , which I do not really understand beyond the 'log-concave = good for algorithms' level. This is something I feel concerned about, especially if we expect the probability of $T$ to be small.

e) The next step, which I don't know the right way to do, is to find the right algorithm from the literature on integrating log-concave functions, and exploit what we know about $g(w)$ to improve the numerical accuracy.

f) Some of the associated run-time guarantees for those Markov chains are scary (even if polynomial), and require some guarantees about the geometry of the function. We also have a more serious problem, namely that the set $C$ might be too thin near the typical set of $f$, and getting these Markov chains to converge well could be hard in practice. On the other hand, if we know that $\int g(w) dw$ is large enough, maybe we don't have this problem.
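Step b) above is just an MST computation. A sketch of the indicator $1_C(w)$ via Kruskal's algorithm (names and representations are my own; weight ties, where the MST is not unique, are ignored here since they have measure zero for continuous weights):

```python
def is_in_cone(n, w, tree):
    """Indicator 1_C(w): is `tree` a minimum spanning tree of K_n
    under weights w? Runs Kruskal's algorithm and compares.
    w: dict frozenset({i, j}) -> weight; tree: n-1 edges as pairs.
    """
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    mst = set()
    for e in sorted(w, key=w.get):       # lightest edge first
        i, j = tuple(e)
        if find(i) != find(j):           # skip cycle-creating edges
            parent[find(i)] = find(j)
            mst.add(e)
    return mst == {frozenset(e) for e in tree}
```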

Edit/Remark: There was an error in an initial formulation of this. I think it is patched now.


1. Geometric Formulation

Let's consider the spanning tree polytope $P$, which is a 0/1 polytope. We can think of the problem as asking: fix some vertex $[T]$ of $P$, and draw a random linear functional with weights given by your distribution $D$. It's maximized at a unique vertex $T'$ of $P$ (a.s.). What's the probability that $T' = T$?


2. Modified Geometric Formulation

Let's consider a related question first.

I have a polytope $Z$ containing a neighborhood of the origin. I draw a direction according to some distribution $D$, and I take the (a.s.) unique maximizing vertex for that direction. Q: What is the probability that I get a particular vertex?

This can be answered geometrically in two steps:

i. For a particular vertex $v$, the set of directions that are uniquely maximized there is the full-dimensional cone $C$ that $v$ corresponds to in the normal fan of $Z$; explicitly, $C$ is generated by the normal vectors to the facets of $Z$ incident to $v$.

ii. So, we want to calculate the $D$ measure of $C$. If we have a separation oracle for $C$, and if $D$ is given by a log-concave pdf, we can do this via the ball-walk Markov chain.

We have such a separation oracle if we have a list of the facets of $Z$, for instance if it is given in $Ax \leq b$ form. This is because if $v_1, \ldots, v_m$ generate a cone $K$, you can test efficiently whether $x \in K$ by solving feasibility for the system $\{ \alpha_i \geq 0 , \sum_i \alpha_i v_i = x \}$. This approach doesn't work for the spanning tree polytope, because there can be exponentially many facets incident to a given vertex.
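The feasibility system above is an LP. As a self-contained stand-in (a sketch only; an LP solver is the robust choice), one can run projected gradient descent on $\min_{\alpha \ge 0} \|\sum_i \alpha_i v_i - x\|^2$ and check whether the residual vanishes:

```python
def cone_membership(gens, x, steps=2000, lr=0.1, tol=1e-8):
    """Decide x in cone{sum_i a_i v_i : a_i >= 0} by projected gradient
    descent: x is in the cone iff the optimal residual is (numerically)
    zero. lr must be below 2 / lambda_max(V^T V); fine for small,
    well-conditioned examples like the ones below.
    gens: list of generator vectors v_i; x: target vector.
    """
    m, d = len(gens), len(x)
    a = [0.0] * m
    for _ in range(steps):
        # residual r = sum_i a_i v_i - x
        r = [sum(a[i] * gens[i][k] for i in range(m)) - x[k]
             for k in range(d)]
        # gradient step on each a_i, projected back onto a_i >= 0
        a = [max(0.0, a[i] - lr * sum(gens[i][k] * r[k] for k in range(d)))
             for i in range(m)]
    r = [sum(a[i] * gens[i][k] for i in range(m)) - x[k] for k in range(d)]
    return sum(rk * rk for rk in r) <= tol
```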

However, it seems like there is a simple construction of a separation oracle for $C$ whenever we can optimize linear functions over $Z$; or at least, in the case of the spanning tree polytope, though I think the argument is pretty general. I'll explain this in the next section.


3. The separation oracle for $C$ (previously faulty, now maybe repaired?)

Given a vector $x$, how do we test that it is in $C$, the normal cone of $Z$ at $v$?

The basic idea is simple: we optimize $x$ over $Z$. If $v \in \text{argmax}_{p \in Z} x \cdot p $, then $x \in C$, otherwise $x \not \in C$.

We need one more thing for a separation oracle, namely a hyperplane separating $x$ from $C$ in the NO case.

(Edit: Actually maybe for the ball walk we don't even need this strong of a separation oracle? Just deciding if a point is in the set is enough.)

Suppose we have a vertex $w$ of $Z$ for which $x \cdot v < x \cdot w$. For instance, take $w \in \text{argmax}_{p \in Z} x \cdot p$.

(In the case of the spanning tree polytope, we can find such a vertex $w$ by starting from the spanning tree $v$, adding an unused edge that is heavier (in $x$) than the lightest edge in the cycle it creates, and deleting that lightest edge. This even gives us an adjacent vertex, which I thought was important while writing this out, but I guess it doesn't matter.)
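This exchange step can be sketched as follows (a sketch with my own names; it tries unused edges in decreasing weight, since the single heaviest unused edge alone need not yield an improving swap):

```python
def improve_tree(x, tree):
    """One exchange step on the spanning tree polytope: find an unused
    edge heavier (under x) than the lightest edge on the cycle it
    closes, and swap them. Returns a strictly better adjacent tree, or
    None if `tree` already maximizes x over spanning trees.
    x: dict frozenset({i, j}) -> weight; tree: set of frozensets.
    """
    def tree_path(a, b):
        # edges of the unique a-b path in `tree`, found by DFS
        adj = {}
        for e in tree:
            i, j = tuple(e)
            adj.setdefault(i, []).append(j)
            adj.setdefault(j, []).append(i)
        stack, prev = [a], {a: None}
        while stack:
            u = stack.pop()
            for v in adj.get(u, []):
                if v not in prev:
                    prev[v] = u
                    stack.append(v)
        path, u = [], b
        while prev[u] is not None:
            path.append(frozenset((u, prev[u])))
            u = prev[u]
        return path

    for e in sorted((e for e in x if e not in tree), key=x.get, reverse=True):
        i, j = tuple(e)
        drop = min(tree_path(i, j), key=x.get)   # lightest cycle edge
        if x[e] > x[drop]:
            return (tree - {drop}) | {e}
    return None
```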

Take $H = \{ r : r \cdot (v - w) = 0 \}$.

Claim: $H$ separates $x$ from $C$.

Proof: By construction, $x \cdot v < x \cdot w$, so $0 < x \cdot ( w - v)$. But by definition, $c \in C$ means that $c \cdot v \geq c \cdot w$, or $0 \geq c \cdot (w - v)$. This means that $H = \{ r : r \cdot (v - w) = 0 \}$ separates $x$ from $C$.
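Putting the YES and NO cases together, a sketch of the oracle for the spanning tree polytope (maximum spanning tree via Kruskal; names and representations are my own):

```python
def separation_oracle(n, x, tree):
    """Separation oracle for C, the normal cone of the spanning tree
    polytope of K_n at the vertex [tree]. Maximizes x over the polytope
    by computing a maximum spanning tree. Returns (True, None) if x is
    in C; otherwise (False, h) with h = [tree] - [best tree], so that
    h . c >= 0 for all c in C while h . x < 0.
    x: dict frozenset({i, j}) -> weight; tree: n-1 edges as pairs.
    """
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    best = set()
    for e in sorted(x, key=x.get, reverse=True):   # heaviest edge first
        i, j = tuple(e)
        if find(i) != find(j):
            parent[find(i)] = find(j)
            best.add(e)
    tree = {frozenset(e) for e in tree}
    if best == tree:
        return True, None
    # hyperplane normal: indicator vector of tree minus that of best
    h = {e: int(e in tree) - int(e in best) for e in x}
    return False, h
```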


Conclusion:

Suppose $T$ is your spanning tree, and $[T]$ is the corresponding vertex of the spanning tree polytope $P$. Then, using the efficient separation oracle for $C$, the normal cone of $P$ at $[T]$, you can integrate your log-concave distribution over $C$ via the ball walk, and determine the probability that $T$ was the maximum spanning tree.
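For comparison, the naive Monte Carlo baseline (sample weights, run Kruskal, count hits) is easy to write down, and is exactly the estimator that becomes hopeless when the probability of $T$ is small (names here are mine):

```python
import random

def estimate_tree_probability(n, sample_w, tree, trials=30000, seed=0):
    """Naive Monte Carlo estimate of Pr[tree is the MST of K_n].
    sample_w(rng) returns one draw of the weights as a dict
    frozenset({i, j}) -> weight.
    """
    rng = random.Random(seed)
    target = {frozenset(e) for e in tree}
    hits = 0
    for _ in range(trials):
        w = sample_w(rng)
        parent = list(range(n))

        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u

        mst = set()
        for e in sorted(w, key=w.get):       # Kruskal's algorithm
            i, j = tuple(e)
            if find(i) != find(j):
                parent[find(i)] = find(j)
                mst.add(e)
        hits += (mst == target)
    return hits / trials

# iid Exp(1) weights on K_3: by symmetry, each of the 3 spanning trees
# is the MST with probability 1/3.
def sample_iid_exp(rng):
    return {frozenset(e): rng.expovariate(1.0)
            for e in [(0, 1), (0, 2), (1, 2)]}
```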


Remarks:

  1. A practical issue might be that those normal cones are going to be very thin. If there's some reason to believe that the probability won't be exponentially small, maybe one can apply a transformation that makes the geometry more pleasant, and keep track of the change to the pdf. This might also be a theoretical issue, because the geometry of the polytope matters for the ball-walk Markov chain, although I remember reading that there are geometric transformations/normalizations you can do to make them work better.

  2. Probably instead of talking about the normal cone, one can achieve the same thing by considering the LP dual. This might lead to something cleaner.

  3. I seem to have flipped min spanning tree to max spanning tree. I don't think this impacts anything meaningfully, because you can always multiply the weights by $-1$.