No giant component below $1/n$


I'm preparing a talk on the giant component in the Erdős–Rényi model of random graphs, following Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravindran Kannan. I'm stuck on the theorem, or rather its proof, that there are no large connected components when the edge probability is less than $\frac{1}{n}$. The relevant section is Theorem 8.9 on page 253 of https://www.cs.cornell.edu/jeh/book.pdf

Theorem. Let $p = d/n$ with $d < 1$. The probability that $G(n, p)$ has a component of size $k > c\,\frac{\ln n}{(1-d)^2}$ is at most $\frac{1}{n}$ for a suitable constant $c$ depending on $d$ but not on $n$.

In the proof, $z_k$ is the random variable that counts the number of discovered vertices after $k$ steps of a breadth-first search starting at $v$. Its mean is bounded above by $dk$ (no problem so far).
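As a sanity check on this (my own sketch, not from the book), the following simulation runs the BFS process in $G(n,p)$ with $p = d/n$, exploring one vertex per step and revealing edges lazily, and confirms that the empirical mean of $z_k$ stays below $dk$ plus one for the start vertex; all parameter values are arbitrary:

```python
import random

def bfs_discovered(n, p, steps, rng):
    """BFS from vertex 0 in G(n, p), revealing each potential edge the
    first time it is examined; explore one vertex per step and return
    the number of vertices discovered after `steps` steps."""
    discovered = {0}
    frontier = [0]
    explored = 0
    while frontier and explored < steps:
        v = frontier.pop(0)
        explored += 1
        for u in range(n):
            if u not in discovered and rng.random() < p:
                discovered.add(u)
                frontier.append(u)
    return len(discovered)

rng = random.Random(0)
n, d, k = 500, 0.5, 20        # arbitrary subcritical parameters
trials = 2000
mean = sum(bfs_discovered(n, d / n, k, rng) for _ in range(trials)) / trials
print(mean, "<=", d * k + 1)  # +1 accounts for the start vertex
```

In the subcritical regime most searches die out after a handful of steps, so the empirical mean is in fact far below $dk$.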

In the proof a Chernoff bound is used to estimate $$P(v\in V(G) \text{ is in a connected component of size at least } \textbf{k})\leq P(z_k\geq k)\leq \frac{1}{e^{c_0k}},$$ for some $c_0>0$.

The first inequality is clear to me. If I understand the second inequality correctly, we get it from $$P(z_k\geq k) \underset{\text{(monotonicity of exp)}}{=} P(e^{c_0 z_k}\geq e^{c_0 k}) \underset{\text{(Markov's inequality)}}{\leq} \frac{E[e^{c_0 z_k}]}{e^{c_0 k}} \underset{\text{(we can choose $c_0$ s.t. the numerator is $1$)}}{\leq} \frac{1}{e^{c_0 k}}.$$
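To make the Markov step concrete (again my own sketch): take a $\text{Binomial}(n, dk/n)$ variable, whose mean $dk$ matches the bound on $E[z_k]$, pick the particular value $c_0 = \ln(1/d)$ (an assumption for illustration, not necessarily the book's choice), and compare the exact tail with the MGF bound:

```python
from math import exp, lgamma, log

def binom_tail(n, q, k):
    """Exact P(Binomial(n, q) >= k), summed in log space."""
    return sum(
        exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log(q) + (n - j) * log(1 - q))
        for j in range(k, n + 1)
    )

def markov_bound(n, q, k, c0):
    """E[e^{c0 X}] / e^{c0 k} for X ~ Binomial(n, q), whose MGF is
    (1 - q + q e^{c0})^n; by Markov, this upper-bounds P(X >= k)."""
    return (1 - q + q * exp(c0)) ** n / exp(c0 * k)

n, d, k = 1000, 0.9, 50       # arbitrary illustration values
q = d * k / n                 # mean n*q equals d*k, matching the bound on E[z_k]
c0 = log(1 / d)               # one valid choice of c0 > 0 (an assumption)
tail, bound = binom_tail(n, q, k), markov_bound(n, q, k, c0)
print(tail, "<=", bound, "< 1")
```

With this $c_0$ the bound works out to $e^{-k(\ln(1/d) - (1-d))}$, which is strictly below $1$ for $d < 1$, so it decays exponentially in $k$.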

I think that $c_0$ doesn't depend on $n$ because the bound $E[z_k]\leq dk$ doesn't. Now the authors set $k\geq c\log n$ for a "suitably large" $c$, which gives $$\frac{1}{e^{c_0 k}} \leq \frac{1}{e^{c_0 c \log n}}=n^{-c_0 c}\leq n^{-2}\ \text{(and here is my problem)}.$$

Now for my questions:

  1. $c_0$ is determined by the requirement that the mean in the Chernoff bound be $1$, and it depends only on $d$ and $k$, not on $n$. Correct?

  2. Isn't $c$ dependent on $n$? I don't really know what "suitably large $c$" means. Suitable for what? I'm not really sure what I am actually choosing among $k$, $c$, and $n$, or why $c$ doesn't depend on $n$.

  3. For the conclusion we then use the union bound to get $P(\text{some vertex lies in a component of size at least } k)\leq n\cdot \frac{1}{n^2}=\frac{1}{n}$. This is fine by me, but then the proof didn't use components of size $c\,\frac{\ln n}{(1-d)^2}$. Where does this $\frac{1}{(1-d)^2}$ come from?

P.S.: After reading the "homework question" section: my professor actually encouraged me to ask here :-)

On BEST ANSWER

It turns out to be correct that $c_0$ doesn't depend on $n$ because $\mathbb E[z_k]$ doesn't. But this is not obviously true, because $z_k$ itself absolutely does depend on $n$. To see this and the answers to your other questions, we need a more explicit form of the Chernoff bound, without the $c_0$.

To get there the way you're going, we'd write $z_k$ (actually, our $\text{Binomial}(n, \frac{dk}{n})$ upper bound on $z_k$) as a sum of independent indicator variables, split up $\mathbb E[e^{c_0 z_k}]$ as a product of expectations over those indicator variables, and see what we get. But you shouldn't want to prove Chernoff bounds from scratch every time you use them. What everyone actually does is look up the bound on Wikipedia. For example, if $\frac12 < d < 1$, then $dk < k < 2dk$ and so the bound $$ \Pr[\text{Binomial}(n, \tfrac{dk}{n}) > k] < \exp\left(-\frac{(1-d)^2}{3d}k\right) $$ applies, exposing the dependence on $d$, which we'll need later. (We can use a slightly weaker inequality for the $d \le \frac12$ case, but we really only care about $d$ very close to $1$, so I won't mention it here.)
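If you want to check this inequality rather than take Wikipedia's word for it, here is a brute-force comparison (my sketch, with arbitrary parameter choices) of the exact binomial tail against $\exp(-\frac{(1-d)^2}{3d}k)$ for a few values of $d > \frac12$:

```python
from math import exp, lgamma, log

def binom_tail(n, q, k):
    """Exact P(Binomial(n, q) >= k), summed in log space so that huge
    binomial coefficients never overflow."""
    return sum(
        exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log(q) + (n - j) * log(1 - q))
        for j in range(k, n + 1)
    )

n = 2000  # arbitrary, large enough that dk/n < 1 for all pairs below
for d in (0.6, 0.75, 0.9):
    for k in (20, 50, 100):
        tail = binom_tail(n, d * k / n, k)
        chernoff = exp(-(1 - d) ** 2 * k / (3 * d))
        assert tail < chernoff, (d, k)
print("Chernoff bound holds for all tested (d, k)")
```

The exact tails are in fact noticeably smaller than the Chernoff estimate, which is why we can afford the slack of weakening the exponent to $\frac{(1-d)^2}{3d}k$.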

Now take $k = \frac{3d}{(1-d)^2} \cdot 2 \ln n$. (This is the "suitably large $c$" in question.) Then we have $$ \Pr[z_k > k] < \exp\left(-\frac{(1-d)^2}{3d} \cdot \frac{3d}{(1-d)^2} \cdot 2 \ln n\right) = e^{-2\ln n} = \frac1{n^2}. $$ Finally, the answer to your last question of "where does the $\frac1{(1-d)^2}$ come from?" is that this book is just inconsistent about what it hides under the "$c$". From the above discussion, it's clear that we show that the largest connected components have size at most $$6d \cdot \frac{\ln n}{(1-d)^2}$$ and if we wanted to, we could say that this is at most $$\frac{6\ln n}{(1-d)^2}$$ since $d<1$. The denominator of $(1-d)^2$ is important, since it goes to $0$ when $d$ is close to $1$ and makes this bound blow up. The $d$ in the numerator, not so much.
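And to see the final bound in action, here is a small simulation (a sketch, with arbitrary parameters): it samples $G(n, d/n)$ a few times, finds the largest connected component with a union-find structure, and checks it against $6d\ln n/(1-d)^2$:

```python
import random
from math import log

def largest_component(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected
    component, using union-find with union by size and path halving."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    # flip one coin per potential edge
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                union(u, v)
    return max(size[find(v)] for v in range(n))

rng = random.Random(1)
n, d = 1000, 0.6                        # arbitrary; d > 1/2 as above
bound = 6 * d * log(n) / (1 - d) ** 2   # ~155 for these parameters
big = max(largest_component(n, d / n, rng) for _ in range(3))
print(big, "<=", bound)
```

In practice the largest subcritical component is much smaller than the bound, which again reflects the slack in the Chernoff estimate.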

Also, I suggest Chapter 11 of Alon and Spencer's *The Probabilistic Method* as a secondary reference; that book is very careful about details you really don't want to miss if you study the phase transition any further.