Probability / Graph Theory / Optimization Problem

115 Views Asked by At

I have a problem involving a random graph $G$ with a set of nodes $n_1, n_2, ..., n_k$. A priori, we know approximately the probabilities $p_{ij} ( =p_{ji} )$ that an edge $e_{ij}$ exists between nodes $n_i$ and $n_j$. The probabilities are centred around $1/2$, with far more edges being 'coin tosses' than near certainty / impossibility.

I say 'approximately' because there's another constraint - we know that the graph $G$ is a disjoint union of cliques $K_1, K_2, K_3 ...$ so if we know (for example) that $e_{12}$ is an edge of $G$, and $e_{13}$ is an edge of $G$, we get 'for free' that $e_{23}$ is an edge of $G$. Similarly if $e_{45}$ is an edge of $G$ and $e_{56}$ is not an edge of $G$, we get 'for free' that $e_{46}$ is not an edge of $G$.

Let's suppose that we can ask an oracle questions of the form 'is $e_{ij}$ an edge?' and get a yes/no answer. Let's further suppose the oracle charges $\$N$ per question, so we'd like to ask as few questions as possible. What type of algorithm might deliver a (nearly) optimal solution in terms of minimising the expected cost of working out the entire structure of $G$?

My 'naive' approach is to ask about the edge most likely to be in $G$, and assuming that the answer is 'yes' (if not, ask about the next most likely edge until you get a hit), then ask about the edge most likely to be connected to one end of that edge, and continue to try to build up as big a clique as possible - this is good because a big clique means getting lots of edge information 'for free' from one question.

But that means (initially) spending money to gain relatively little information as the entropy of the edge probability $Ent = p(1-p)$ is low because $p$ is close to $1$. It would in some sense be more informative to find out about edges that have probability close to $1/2$, but there are far many more of these.

Any thoughts? Is this a known (type of problem), and are there heuristics about finding a near-optimal strategy if so?

1

There are 1 best solutions below

5
On

Edit: There was a mistake, this only gives a lower bound for the number of queries

A lower bound for the number of queries is $n-c+ \frac{c(c-1)}{2}$ edges where $c$ is the number of connected components.

We can see that it is not possible to infer the total structure with less edges: The minimum number of edges in the graph given the edges queried is found by the transitive closure of the queried edges. So for every connected component, we need to query at least a spanning tree, otherwise, the transitive closure won't give us the full component. A spanning forest requires $n-c$ edges.

For every pair of connected components, we need to query at least one pair of vertices with one vertex per component (otherwise, it might be possible that these components can be connected). So we require $\frac{c(c-1)}{2}$ more edges. These additional queries are sufficient to guarantee that we can't have more edges than the transitive closure.