Induced path of length $\sqrt{\log n}$ in a sparse random graph

597 Views Asked by At

I'm trying to solve Exercise 1.4.2 in this book. Here $G_{n,p}$ is a random graph on $n$ vertices where each edge is present independently with probability $p$.

Suppose that $p=d/n$ where $d>1$. Show that w.h.p. $G_{n,p}$ contains an induced path of length $\sqrt{\log n}$.

The only technique for such problems introduced in the book at the point is the second moment method, so I tried that. Recall this method uses the inequality $$P(X=0)\le \frac{\operatorname{Var X}}{(\mathbb E X)^2}.$$

So, if $X$ counts the number of induced paths of length $\sqrt{\log n}$, and we show that the square of its expectation dominates its variance, we're set.

Write $X = \sum_{i\in K} I_i$, where $K$ is the set of possible paths and $I_i$ is an indicator function for the presence of the induced path.

We compute $$|K| = \frac{1}{2} \frac{n!}{(n - \sqrt{\log n})!}$$

and the probability of each $I_i$ occurring is $$\mathbb E I_i =p^{\sqrt{\log n}}(1-p)^{\binom{\sqrt{\log n}}{2} - \sqrt{\log n}}.$$

The $(1-p)$ factor tends to a constant, and we get asymptotically that the first moment of $X$ is $\mathbb E X= d^{\sqrt{\log n}}$, which explains why the problem requires $d>1$.

However, it seems that the variance won't be small enough to give the result. Since all the $I_i$ are positively correlated, we can get a lower bound by asking what happens in the case where all the $I_i$ are independent. Recall the variance of a binomial variable (such as $I_i$) with expectation $a$ is $a(1-a),$ and here $a$ is $o(1)$ so we get that the variance is approximately $a|K|$. Then

$$ \frac{\operatorname{Var X}}{(\mathbb E X)^2} \approx \frac{ |K| a (1-a)}{|K|^2 a^2} \approx \frac{1}{K|a|} = \frac{1}{\mathbb E X}.$$

So ignoring all correlations, we would just barely get the result, since $\mathbb EX$ grows slowly. However, since it grows so slowly, it seems trying to account for all the correlations will inevitably spoil the proof by making the variance too large.

Am I on the right track here? Or is another method needed? I'm also curious why $\sqrt{\log n}$ works but something larger like $\log{n}$ doesn't. This restriction doesn't come out of the above calculations.

1

There are 1 best solutions below

7
On BEST ANSWER

I'll begin by assuming we consider paths on $t$ vertices until we see the difference $t = \sqrt{\log n}$ makes. In general, I think it's good habit to do such a calculation for paths of arbitrary length and only then figure out the value of $t$ that will go.

Your calculation of $\mathbb E I_i$ is not quite right: you should have a factor of $p^{t-1}$ rather than $p^t$, since a $t$-vertex path has $t-1$ edges, which should mean $$\mathbb E X \sim \frac{n}{2} \cdot d^{t-1}.$$ This should be valid for $t \ll \sqrt n$, but also explains the reason for taking $t = \sqrt{\log n}$ or similar. For this value of $t$ (in fact, for any $t = o(\log n)$, but $\sqrt{\log n}$ is everyone's favorite function in this range) the linear factor of $\mathbb EX$ dominates the $d^{t-1}$ factor.

Let's see the difference this makes. When calculating $\mathbb E(X^2)$, we are counting ordered pairs of induced paths. These come in three types:

  1. The disjoint (and therefore uncorrelated) pairs. These are asymptotically almost all the pairs; their contribution is $(1+o(1)) (\mathbb EX)^2$, and will cancel when we compute $\operatorname{Var} X$.
  2. The incompatible pairs. Recall that we're counting induced paths: if one path is $(v_1, v_2, \dots, v_t)$, and the other contains an edge such as $(v_1, v_3)$, both cannot be induced paths at the same time. As a result, these contribute nothing to $\mathbb E(X^2)$.
  3. The overlapping pairs. If two paths of length $t$ overlap on $s$ vertices, they can either form a longer path on $2t-s$ vertices, or a tree with a single vertex of degree $3$ at which paths with $s-1$, $t-s+1$, $t-s+1$ edges meet, (edit) or a few other cases, but not too many.

You should probably do the calculation carefully for practice, but the underlying reason why things work out is this: the third type of pair corresponds to a very specific kind of tree component on $O(t)$ vertices. We can count these in more or less the same way as paths to get that there's $n \cdot d^{O(t)}$ of these, in expectation. When $t = o(\log n)$, the $n$ term is still dominant, and we get $\operatorname{Var} X = o(n^2)$. In fact, $\operatorname{Var} X = o(n^{1+\epsilon})$ for any $\epsilon>0$.