I came across an algorithm in a lecture course for constructing components with a linear number of vertices. The algorithm is stated as follows:
- Let $G_0$ be the empty graph on $n$ vertices.
- For $i = 1, \ldots, t$: add an edge between a uniformly chosen pair of non-adjacent vertices of degree at most two in $G_{i-1}$, if such a pair exists, to construct $G_i$; otherwise, let $G_i = G_{i-1}$.
- Return $G_t$.
Then it claims that $G_t$ a.a.s. contains a component with a linear number of vertices when $t \geq (1+\varepsilon)n$, as $n \rightarrow \infty$.
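To make the claim concrete, here is a small simulation sketch of the process (the function names, the parameter values, and the rejection-sampling cutoff are my own illustrative choices, not from the lecture notes):

```python
import random

def run_process(n, t, seed=None):
    """Simulate the process: at each step, join a uniformly chosen
    non-adjacent pair of vertices that both have degree <= 2."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for _ in range(t):
        open_vs = [v for v in range(n) if len(adj[v]) <= 2]
        # Rejection sampling: a uniform pair of open vertices, rejected
        # if already adjacent, is uniform over non-adjacent open pairs.
        # If we fail many times, no edge is added this step (G_i = G_{i-1}).
        for _ in range(1000):
            if len(open_vs) < 2:
                break
            u, v = rng.sample(open_vs, 2)
            if v not in adj[u]:
                adj[u].add(v)
                adj[v].add(u)
                break
    return adj

def largest_component(adj):
    """Size of the largest connected component, by depth-first search."""
    n = len(adj)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        stack, size = [s], 0
        seen[s] = True
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, size)
    return best

n, eps = 2000, 0.1
adj = run_process(n, int((1 + eps) * n), seed=0)
print(largest_component(adj))  # typically a linear-size component at t = (1+eps)n
```

Running this for a few values of $n$ suggests the largest component indeed grows linearly in $n$ once $t$ passes $(1+\varepsilon)n$.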
I would like to understand why this is true.
I think this question is closely related to the largest component of $G(n, p)$. But there are two main differences here: first, we only add edges between vertices of degree at most two; second, the probability of choosing any given edge at each step is $p \sim 1/(n(n-1)) \not\gg 1/n$. I do not know what to do here... Could anyone help me? Thanks in advance!
Intuitively: when all components are tiny, adding an edge is almost certain to reduce the number of components. But we cannot reduce the number of components for $(1+\varepsilon)n$ steps, so we must eventually see a component big enough to give us a decent chance of adding an edge within a component.
To state and prove this formally, let's call a vertex "open" if it has degree $2$ or less (so that we can still add edges to it). Just by counting degrees, the number of vertices with degree at least $3$ in $G_i$ for $i\le t$ can be at most $\frac23(1+\varepsilon)n$, so each $G_i$ before that point has at least $(\frac13 - \frac23\varepsilon)n > \frac14 n$ open vertices. (Here we may assume $\varepsilon < \frac18$: proving the claim for small $\varepsilon$ only makes it stronger.)
For $i=0,1,2,\dots$, define a random variable $X_i$ to be $1$ if no component of $G_i$ contains at least $\frac18\varepsilon n$ open vertices, but the edge added to go from $G_i$ to $G_{i+1}$ nevertheless joins two vertices in the same component; let $X_i = 0$ otherwise.
In order to see $X_i = 1$, we must first have no component with at least $\frac18 \varepsilon n$ open vertices. When the random edge we add to go from $G_i$ to $G_{i+1}$ is chosen, look at just one endpoint first. There are at least $\frac14 n$ possibilities for the other endpoint, and at most $\frac18\varepsilon n$ of them are in the same component as the first, so the probability that $X_i = 1$ is at most $(\frac18\varepsilon n)/(\frac14 n) = \frac12 \varepsilon$, no matter what $G_i$ is.
This means that a $\text{Binomial}(t, \frac12\varepsilon)$ random variable is an upper bound for the sum $X_0 + X_1 + \dots + X_{t-1}$. This binomial random variable has expected value $\frac12\varepsilon t = \frac12\varepsilon(1+\varepsilon)n$, which is less than $\varepsilon n$ by a gap linear in $n$; since its standard deviation is only $O(\sqrt{n})$, Chebyshev's inequality says that with high probability it does not exceed $\varepsilon n$.
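As a numeric sanity check of the Chebyshev step (the values of $n$ and $\varepsilon$ below are illustrative choices of mine): the bound on $\Pr[\text{Bin}(t, \frac12\varepsilon) \ge \varepsilon n]$ decays like $1/n$, since the variance is linear in $n$ while the squared gap is quadratic.

```python
# Chebyshev bound on P[Bin(t, eps/2) >= eps*n] for t = (1+eps)*n.
# eps = 0.1 is an illustrative choice, not from the original argument.
eps = 0.1
for n in [10**3, 10**4, 10**5]:
    t = int((1 + eps) * n)
    p = eps / 2
    mean = p * t                  # = eps*(1+eps)*n/2, strictly below eps*n
    var = t * p * (1 - p)
    gap = eps * n - mean          # linear in n, hence gap**2 is quadratic
    bound = var / gap**2          # Chebyshev: P[X - mean >= gap] <= var/gap^2
    print(n, bound)
```

The printed bounds shrink by roughly a factor of $10$ each time $n$ grows by a factor of $10$, as expected.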
But we cannot see $t - \varepsilon n = n$ steps where $G_{i+1}$ has fewer connected components than $G_i$: we start with $n$ components and end with at least one. (Note that an edge is always added: among at least $\frac14 n$ open vertices, each of degree at most $2$, some pair must be non-adjacent.) So at least $\varepsilon n$ steps add an edge within a component, and the only way that $X_0 + X_1 + \dots + X_{t-1}$ can stay below $\varepsilon n$ is if, at some point, $G_i$ has a component with at least $\frac18\varepsilon n$ open vertices: a component of linear size.
Note: this proof gives away a lot. Intuitively, I expect that this random process should be at least as likely to create large components as the Erdős–Rényi model, where we add edges with no regard for vertex degrees. In that model, we have linear-size components as soon as we see $(\frac12 + \varepsilon)n$ edges: half the number of edges needed for the argument above.
It seems much harder to get that result, though, because it's hard to get the independence needed to see the branching process approximations that get us the phase transition at $\frac12n$ edges for the Erdős–Rényi random graph.