Prove choosing $\lceil\frac{V}{2}\rceil$ vertices accounts for at least $\frac{3}{4}$ of edges

177 Views Asked by At

Give a polynomial-time algorithm that finds $\lceil\frac{V}{2}\rceil$ vertices that collectively account for at least $\frac{3}{4}$ of the edges in an arbitrary undirected graph.

The algorithm I have come up with is a greedy algorithm that iterates through the graph choosing the node with the largest number of incident edges and then updating the adjacent vertices with their new incident edge count.

I am fairly confident this algorithm will work, however I am struggling with the proof that I will always account $\frac{3}{4}$ of the edges.

What I have so far is that because the sum of the degrees of all the edges in the graph is equal to $2E$ then on average every node will contribute at least a degree of $\frac{2E}{n}$ to that summation. $\frac{2E}{n}\times\frac{n}{2}$ vertices provides me with just $E$. Is this correct or am I missing something?

2

There are 2 best solutions below

3
On BEST ANSWER

Your reasoning is incorrect because when you select a vertex you remove it and its incident edges in which case the total number of edges decreases. If you stick to the original edges then you would be double-counting some of the edges since two vertices may cover the same edge, so you would get a lower bound of $\frac{1}{2}E$, which isn't good enough.

But if you keep track of the number of edges $E$, at step 1 it decreases by a factor of at least $\frac{2}{n}$ because the highest degree is at least $\frac{2E}{n}$, and at step 2 it decreases by a factor of $\frac{2}{n-1}$ for exactly the same reason. So $E$ is multiplied by at most $\frac{n-2}{n}$ and then $\frac{n-3}{n-1}$ and then ... After $\frac{1}{2}n$ steps (for even $n$; you can check that it's looser for odd $n$) $E$ would have been multiplied by at most $\frac{ (\frac{1}{2}n) (\frac{1}{2}n-1) } { n (n-1) }$, which would give the bound you want, and $\frac{3}{4}n$ is an asymptotically tight cutoff ratio because it is asymptotically achieved by the complete graph.

0
On

There is an amazingly simple randomized algorithm for this: sample $|E|\cdot k$ random subsets of size $|V|/2$ and return the one with highest number of incident edges. It works in $O\big(|E|^2\cdot k\big)$ time and returns a correct answer with probability $1-1/e^k$.

Unfortunately the proof is not so simple...

The proof:

For simplicity, let's assume that $n = |V|$ is even. We can do this, because we can always add an isolated vertex, which could be then discarded after obtaining a solution.

First, let's prove that is possible. Take a random subset $S \subset V$ of $n/2$ vertices, what is the expected number of edges incident to $S$? Observe that for any given edge three things could happen: both endpoints are in $S$, one endpoint is in $S$ or no endpoint is in $S$. Define accordingly:

\begin{align} \newcommand{inc}{\mathrm{inct}} \inc_0(S) &= \big\{e \in E\ \big|\ e \text{ has no endpoint in }S\big\},\\ \inc_1(S) &= \big\{e \in E\ \big|\ e \text{ has exactly one endpoint in }S\big\},\\ \inc_2(S) &= \big\{e \in E\ \big|\ e \text{ has exactly two endpoints in }S\big\}. \end{align}

These sets are disjoint, so summing the last two we get (in the derivation the set $S$ isn't quantified, because the numbers are the same for all)

\begin{align} \mathbb{E}X &= \mathbb{E}\Big(|\inc_1(S)| + |\inc_2(S)|\Big) \\ &= \sum_{e \in E} \mathbb{P}(e \in \inc_1(S)) + \mathbb{P}(e \in \inc_2(S)) \\ &= \sum_{(v_1,v_2) \in E} \mathbb{P}(v_1 \in S \land v_2 \notin S) + \mathbb{P}(v_1 \notin S \land v_2 \in S) + \mathbb{P}(v_1 \in S \land v_2 \in S) \\ &\stackrel{\!\!(\spadesuit)}{=} \sum_{(v_1,v_2) \in E} \mathbb{P}(v_1 \in S)\cdot\mathbb{P}(v_2 \notin S) + \mathbb{P}(v_1 \notin S)\cdot\mathbb{P}(v_2 \in S) + \mathbb{P}(v_1 \in S)\cdot\mathbb{P}(v_2 \in S) \\ &= \sum_{(v_1,v_2) \in E} \frac{1}{2}\cdot\frac{1}{2}+\frac{1}{2}\cdot\frac{1}{2}+\frac{1}{2}\cdot\frac{1}{2} = \frac{3}{4}|E| \end{align}

The expected number of incident edges is $\frac{3}{4}|E|$, so there has to be some subset of $V$ which realizes at least this number (i.e. maximum couldn't be smaller than the average).

One important point in this derivation is the equality marked by $(\spadesuit)$ where we apply, among others, $$\mathbb{P}(v_1 \in S \land v_2 \notin S) = \mathbb{P}(v_1 \in S)\cdot\mathbb{P}(v_2 \notin S),$$ which we can use only if the choice of $v_1$ and $v_2$ in $S$ were independent.

Now this raises a concern, because if we were to pick whether $v \in S$ or not independently with $\mathbb{P}(v \in S) = \frac{1}{2}$, then there is a slight chance that these will all "fire" at the same time and the resulting set wouldn't be a valid $S$, while at the same time it might be the one which bounces up the expected value so high.

Fortunately, we need only pairwise independence, and that allows us to pick a distribution which ensures that $S$ as $n/2$ vertices. In other words, it doesn't matter how we choose elements of $S$, the only thing we need is $P(v \in S) = \frac{1}{2}$ and pairwise independence.

Also, we know that $X \leq |E|$, so $Y = |E|-X$ is a non-negative random variable and we can use the Markov inequality. Moreover, $|X|$ is discrete, so

\begin{align} \mathbb{P}(X < \tfrac{3}{4}|E|) &= \mathbb{P}\Big(X \leq \tfrac{3}{4}|E|-\tfrac{1}{4}\Big) \\ &= \mathbb{P}\Big(Y \geq |E|-\tfrac{3}{4}|E|+\tfrac{1}{4}\Big) \\ &= \mathbb{P}\Big(Y \geq \tfrac{1}{4}|E|+\tfrac{1}{4}\Big) \\ &\leq \frac{\mathbb{E}Y}{\tfrac{1}{4}|E|+\tfrac{1}{4}} = \frac{\tfrac{1}{4}|E|}{\tfrac{1}{4}|E|+\tfrac{1}{4}} \\ &= \frac{|E|}{|E|+1} \end{align}

that is,

$$\mathbb{P}(X \geq \tfrac{3}{4}|E|) \geq \frac{1}{|E|+1}$$

and

\begin{align} \mathbb{P}(\text{we will find $S$ in $(|E|+1)k$ searches}) &\geq 1-\left(\frac{|E|}{|E|+1}\right)^{(|E|+1)k} \\ &= 1-\left(1+\frac{-1}{|E|+1}\right)^{(|E|+1)k} \\ &\geq 1-\frac{1}{e^k} \end{align}

Conclusion:

To find an appropriate set $S$ with probability $(1-\varepsilon)$ sample $|E| \log \frac{1}{\varepsilon}$ random subsets and pick the one with the highest number of incident edges.

I hope this helps $\ddot\smile$