Estimating size of minimum vertex cover from randomly sampled subgraph

270 Views Asked by At

I would like to estimate the size of the minimum vertex cover (MVC) for a very large graph $G=(V,E)$. To be exact, I would like to estimate the following proportion:

$$p(G)=\frac{|MVC|_G}{|V|}$$

As computing a MVC is quite expensive, I want to sample a subgraph $G_s=(V',E')$ with $n$ vertices from $G$ and compute $p(G')$. Finally, my goal is to estimate $p(G)$ from $p(G')$ with some error and/or confidence depending on $n$.

After a lot of research, I wasn't able to get a convincing result and almost no literature. I linked this problem to the estimation of the independence number but, once again, its estimation from a randomly sampled subgraph is absent.

I'm open on any parameter of the problem and if you have even a small idea of a potentially interesting paper to read, I am all ears. Thank you.

2

There are 2 best solutions below

0
On BEST ANSWER

Following the very good explanation from Misha, I am just going to give my feedbacks after finding sufficient results for my application. Multiple papers have been published on the subject proposing to estimate the the MVC size from only a fraction of the nodes (they call it sublinear as it is sublinear regarding the number of nodes). In chronological order, here are the one you want to take a look at:

  • Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms, Parnas and Ron (2007)
  • Constant-Time Approximation Algorithms via Local Improvements, Nguyen and Onak (2008)
  • An improved constant-time approximation algorithm for maximum~matchings, Yoshida and Yamamoto (2009)
  • A Near-Optimal Sublinear-Time Algorithm for Approximating the Minimum Vertex Cover Size, Onak and Ron (2011)

They have been one the road of improving successively both the complexity of the algorithm and the quality of the approximation. Enjoy!

3
On

If there exists a vertex cover $U$ of order $|U| = p(G) |V|$, then any subgraph of size $n$ is going to see $p(G) n$ of $U$'s vertices in expectation, and this is tightly concentrated, central limit theorem and all. This means $p(G_s) \lessapprox p(G)$; we can work out what the bounds look like, but they're not the big problem here.

The big problem is that $G_s$ could have a really good vertex cover that does not come from $U$. To give one bad example, suppose $G$ is the cycle $C_N$, with $p(G) = \frac12$. Then a random sample of $n$ vertices contains a fixed edge with probability $\binom{N-2}{n-2}/\binom Nn = \frac{n(n-1)}{N(N-1)}$, so the expected number of edges is $\frac{n(n-1)}{N-1}$.

For $n \ll \sqrt{N}$, we'll see none of the edges of $G$, and get $p(G_s) = 0$, with high probability. By a similar calculation, for $n \ll N^{2/3}$, we'll just see a bunch of isolated edges and isolated vertices; in this range, the expected size of a vertex cover of $G_s$ is just the expected number of edges, and the expected value of $p(G_s)$ is about $\frac{n-1}{N-1}$, which is tiny. We see similar behavior for all $n \ll N$.

Even for $n = \frac12 N$, which is ridiculous as a sampling strategy, most components of $G_s$ are paths of constant length, and for every path of even length or isolated vertex, we reduce the size of the minimum vertex cover. So $p(G_s)$ is off by a constant from $p(G)$; I compute that more precisely, the expected value of $p(G_s)$ is about $\frac13$, but the actual computation is annoying and doesn't matter.