A doubt about almost all graphs


I am currently trying to understand the paper “On the chromatic index of almost all graphs” by Erdős and Wilson. I have two doubts, and I’d be grateful if someone could explain them to me. This is part of the proof of the lemma “Almost all graphs have a unique vertex of maximum degree.”

My first doubt is in the following paragraph from the paper:

If $G$ is a random labeled graph with $n$ vertices, then the probability that a given vertex has degree $k$ is $\binom{n-1}{k}2^{1- n}$, since each edge of $G$ appears with probability $1/2$.

If $k = 1/2(n - 1) + t$ (say), then by a standard asymptotic argument for the binomial distribution (see, for example, [4, pp. 179-180]), we have $$\binom{n-1}{k}2^{1-n} = ( 1 + o(1))\left(\frac{2}{\pi n}\right)^{1/2}e^{-2t^2/n}.$$

It follows from the Inclusion-Exclusion Principle (as used, for example, in [6, pp. 71-72]) that for almost all graphs $G$, the maximum vertex-degree of $G$ is equal to $$1/2(n - 1) + 1/2(n \log n)^{1/2} + o((n \log n)^{1/2}).$$

Here I don’t understand how the Inclusion-Exclusion principle is used, and which events are being considered. I had a look at the referenced book, but the text there is confusing.

My second doubt is in the following paragraph:

But the probability that two given vertices both have degree $k$ is $(1+o(1))\binom{n-1}{k}^22^{2-2n}$, and so it is enough to prove that $$ n^2\sum' \binom{n-1}{k}^2 2^{2-2n}\rightarrow 0 $$ as $n\rightarrow \infty$, where the prime indicates that the summation extends only over those values of $k$ satisfying (3).

Here I don’t understand where the $n^2$ term comes from. Also, I am having difficulty in showing the limit tending to zero.

Thanks a lot for your time.


There are 2 best solutions below


Some hints on solving the problem:

  1. The probability falls off exponentially in $t$. Writing the degree as $\frac{n-1}2 + t + k$, the exponential factor is $e^{-2(t+k)^2/n}\le e^{-2t^2/n}e^{-4tk/n}$. If you now sum over $k\ge 0$, the last factor contributes $$ \sum_{k\ge 0} e^{-4tk/n} \le 1 + \int_0^\infty e^{-4tk/n} \,\mathrm dk = 1 + \frac{n}{4t}, $$ which for $t$ of order $\sqrt{n\log n}$ is only $O(\sqrt{n/\log n})$, i.e. negligible on the exponential scale. So summing changes only the polynomial prefactor, and the probability of having degree at least $\frac{n-1}2+t$ has the same exponential order as the probability of having degree exactly $\frac{n-1}2+t$.

  2. If you plug $t = \frac12\sqrt{n\log n}$ or, respectively, $t=\frac12\sqrt{(1-\epsilon)n\log n}$ into the formula, you get (in the latter case) $$ \frac1{\sqrt{2\pi n}}e^{-2t^2/n} = \frac1{\sqrt{2\pi n}} e^{-(1-\epsilon)\log n/2} = \frac 1{\sqrt{2\pi n}}n^{-\frac{1-\epsilon}2} = O(n^{-1+\epsilon/2}). $$

  3. The reason for the $n^2$ is that the term in the sum is the probability that two *given* vertices both have degree $k$, whereas you need the probability that *some* pair of vertices does; a union bound over all $\binom n2 = O(n^2)$ pairs supplies the factor.

  4. An upper bound for the maximum degree follows from 2.: the expected number of vertices with degree higher than $\frac{n-1}2 + \frac12\sqrt{(1+\epsilon)n\log n}$ is $n\cdot O(n^{-1-\epsilon/2}) = O(n^{-\epsilon/2})$, so by Markov's inequality such vertices exist only with probability $o(1)$.

  5. A lower bound can be shown similarly using the second moment method: the number of vertices with degree higher than $\frac{n-1}2 + \frac12\sqrt{(1-\epsilon)n\log n}$ has first moment of order $n\cdot n^{-1+\epsilon/2} = n^{\epsilon/2}$ and second moment of order $n\cdot n^{-1+\epsilon/2} + n(n-1)\cdot n^{-2+\epsilon} \sim n^{\epsilon}$, which is of the same order as the square of the first moment. So the probability that such vertices exist tends to $1$. (The factors of $\sqrt{2\pi}$ cancel.)

  6. I could not show that the sum converges: the terms I get are of order $$ n^2\sum'\binom{n-1}{k}^2 2^{2-2n} \sim n^2\,(n^{-1+\epsilon/2})^2 = n^{\epsilon} \to \infty. $$
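The prediction in hints 1-5 can be checked empirically. The sketch below (my own, not from the paper) samples $G(n,\tfrac12)$, reads off the maximum degree, and compares it with the heuristic leading-order prediction $\mu + \sigma\sqrt{2\log n}$, where $\mu=\frac{n-1}2$ and $\sigma=\sqrt{(n-1)/4}$ are the mean and standard deviation of a single degree:

```python
import numpy as np

def max_degree_gnp_half(n, rng):
    """Maximum degree of one sample of G(n, 1/2)."""
    # One fair coin per unordered pair {i, j}, i < j.
    coins = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    degrees = coins.sum(axis=0) + coins.sum(axis=1)
    return degrees.max()

rng = np.random.default_rng(0)
n = 2000
samples = [max_degree_gnp_half(n, rng) for _ in range(20)]

mu, sigma = (n - 1) / 2, np.sqrt((n - 1) / 4)
predicted = mu + sigma * np.sqrt(2 * np.log(n))
# Sample mean should lie within a few percent of the prediction
# (lower-order correction terms make it sit slightly below).
print(np.mean(samples), predicted)
```

The second-order correction to the maximum is negative, so the observed mean sits a bit below the prediction, but the leading term $\sigma\sqrt{2\log n}$ is clearly visible.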


I believe that what Erdos-Wilson refer to as an "Inclusion-Exclusion" type of argument is the following idea: to estimate the size of a union of sets $$ |S_1\cup\cdots\cup S_n|, $$ we can start with the upper bound $\sum_i |S_i|$, then consider the lower bound $\sum_i|S_i|-\sum_{i<j} |S_i\cap S_j|$, thereby sandwiching the true cardinality somewhere in between. The point is that, in the case at hand, some calculations show that the upper and lower bounds have matching asymptotic exponents in the $n\to\infty$ limit. In probabilistic language, the upper bound is the union bound: $$ \mathbb P(A_1\cup \cdots\cup A_n)\leq \sum_{i=1}^n\mathbb P(A_i), $$ and I will refer to the situation where the upper and lower bounds match in this sense by saying that the union bound is "asymptotically tight in the exponent" (see details below). The lower bound is a special case of the Bonferroni inequalities.
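As a concrete toy illustration of this sandwich (the sample space and events below are arbitrary choices of mine, not from the paper):

```python
from fractions import Fraction
from itertools import combinations

# Toy sample space: a fair six-sided die with faces 0..5.
# The events A_i are arbitrary subsets of faces.
events = [{0, 1}, {1, 2}, {2, 3, 4}]

def prob(s):
    return Fraction(len(s), 6)

union = set().union(*events)
upper = sum(prob(a) for a in events)                                   # union bound
lower = upper - sum(prob(a & b) for a, b in combinations(events, 2))   # Bonferroni

print(lower, prob(union), upper)  # 5/6 <= 5/6 <= 7/6
```

Here the Bonferroni lower bound happens to be exact because all triple intersections are empty; in general one only gets the sandwich $\text{lower} \le \mathbb P(\text{union}) \le \text{upper}$.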

Regarding your first doubt, first let me point out that, fortunately, a version of the reference cited as "[6, pp. 71-72]" is available to the public at Project Gutenberg, although the page numbers have changed in the process of digital typesetting and now correspond to pages 92-95 rather than 71-72.

Let $G_n$ be what Erdos-Wilson refer to as a "random labeled graph with $n$ vertices" a.k.a. $G(n,p)$ where $p=\tfrac12$. In contrast, Moon considers a uniformly random tournament $T_n$, which is equivalent to taking $K_n$ (the complete undirected graph with vertex set $\{1,\ldots,n\}$) and orienting each undirected edge uniformly at random to obtain $T_n$.

Note that the degree $d_i$ of vertex $i$ in $G_n$ and the out-degree of vertex $i$ in $T_n$ (denoted $s_i$ by Moon and referred to as the "score") have the same distribution, namely $\textrm{Binomial}(n-1,\tfrac12)$. This establishes a basic link between the models $G_n$ and $T_n$. However, don't make the same mistake I did in a previous revision of this answer and try to extend this to an equality in distribution between the degree vector $\vec{d}:=(d_1,\ldots,d_n)$ and the score vector $\vec{s}:=(s_1,\ldots,s_n)$: the sum $$ \sum_{i=1}^n s_i=\binom{n}{2} $$ is non-random, whereas $\sum_i d_i$ equals twice the number of edges of $G_n$, i.e. twice a $\textrm{Binomial}(\binom{n}{2},\tfrac12)$ random variable. (I've hidden the original incorrect claim I made behind a spoiler below.)
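A quick simulation (my own sketch) of this distinction: the score sum of a random tournament is deterministic, while the degree sum of $G(n,\tfrac12)$ fluctuates from sample to sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30

def tournament_scores(n, rng):
    """Out-degrees (scores) of a uniformly random tournament on n vertices."""
    u = rng.integers(0, 2, size=(n, n))
    # wins[i, j] = 1 iff i beats j: coin u[i, j] decides each pair i < j.
    wins = np.triu(u, k=1) + np.triu(1 - u, k=1).T
    return wins.sum(axis=1)

def graph_degrees(n, rng):
    """Degrees of one sample of G(n, 1/2)."""
    a = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    return a.sum(axis=0) + a.sum(axis=1)

score_sums = {int(tournament_scores(n, rng).sum()) for _ in range(50)}
degree_sums = {int(graph_degrees(n, rng).sum()) for _ in range(50)}
print(score_sums)        # always {435}, i.e. binom(30, 2)
print(len(degree_sums))  # many distinct values across samples
```

The marginal distributions of a single score and a single degree agree, but the joint distributions cannot, precisely because of the fixed row-sum constraint on scores.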

>! (The original incorrect claim:) We use a coupling (a.k.a. joint probability distribution on $G_n$ and $T_n$) in which, for $1\leq i<j\leq n$, the undirected edge $\{i,j\}$ is present in $G_n$ if and only if the directed edge from $i$ to $j$ is present in $T_n$. Then the out-degree $s_i$ of vertex $i$ in $T_n$ coincides with the degree $d_i$ of vertex $i$ in $G_n$.

Moon then shifts and scales $s_i$ to obtain a "reduced score" $$ w_i=\frac{s_i-(n-1)/2}{\sqrt{(n-1)/4}},\qquad s_i=\frac{n-1}{2}+w_i\sqrt{\frac{n-1}{4}} $$ with mean $0$ and variance $1$. Since $d_i$ and $s_i$ have the same distribution, when we apply the same transformation to $d_i$ (calling the result $\widetilde w_i$) it too will have mean $0$ and variance $1$. Thus if we let $d=\max_i d_i$ denote the maximum degree of $G_n$, we have that $$ d=\frac{n-1}{2}+\widetilde w\sqrt{\frac{n-1}{4}},\qquad \widetilde w=\max_{1\leq i\leq n}\widetilde w_i. $$
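As a sanity check on this normalization (my own sketch, with an arbitrary choice of $n$), one can sample scores $s_i\sim\textrm{Binomial}(n-1,\tfrac12)$ and verify that the reduced scores have mean $\approx 0$ and variance $\approx 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 101
s = rng.binomial(n - 1, 0.5, size=100_000)    # scores s_i ~ Binomial(n-1, 1/2)
w = (s - (n - 1) / 2) / np.sqrt((n - 1) / 4)  # Moon's reduced scores
print(w.mean(), w.var())  # approximately 0 and 1
```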

To summarize Moon's argument: by the classical CLT for sums of Bernoulli variables, $w_i$ approaches a standard Gaussian as $n\to\infty$. Thus, in order for the union bound $$ \mathbb P(w>x)\leq \sum_{i=1}^n \mathbb P(w_i>x)=n\mathbb P(w_1>x) $$ to be "asymptotically tight" in the exponent, inspection of the tail of the Gaussian shows we would need $$ e^{-w^2/2}=n^{-1+o(1)},\qquad w=(1+o(1))\sqrt{2\log n}, $$ and this "asymptotic tightness in the exponent" of the union bound is what Moon proceeds to show with his $\mathbb P(w<x_n^-)$ lower-bound argument (I'll say more about this below). The corresponding claim with $\widetilde w$ in place of $w$ is $$ \widetilde w = (1+o(1))\sqrt{2\log n},\qquad d=\frac{n-1}{2}+(1+o(1))\sqrt{\frac{n\log n}{2}}. $$ This is supposed to match the display between equations (1) and (2) in Erdos-Wilson, but they claim instead that $$ d=\frac{n-1}{2}+(1+o(1))\frac{\sqrt{n\log n}}{2}, $$ with the factor of $2$ moved outside the square root; this appears to be an innocuous typo in the Erdos-Wilson paper, since the value of this constant is immaterial to the remainder of the argument in their paper.
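The heuristic $w \approx \sqrt{2\log n}$ for the maximum of $n$ (roughly independent) standard Gaussians is easy to probe numerically; a sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(3)
ratios = []
for n in (10**3, 10**5, 10**7):
    observed = rng.standard_normal(n).max()   # max of n iid standard normals
    heuristic = np.sqrt(2 * np.log(n))
    ratios.append(observed / heuristic)
    print(n, observed, heuristic)
```

The ratio observed/heuristic approaches $1$ only slowly (the correction is of order $\log\log n/\log n$), which is consistent with the $(1+o(1))$ in the display above.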

Regarding Moon's proof of the lower bound, I agree that the notation is difficult to follow at first. All that is being done is to show that the out-degrees of distinct vertices in the tournament $T_n$ are negatively correlated: when we reveal that some vertices have small out-degrees, then conditionally on this information it becomes less likely that the remaining vertices also have small out-degrees, compared with revealing nothing. This is intuitively clear because the tournament represents a zero-sum game: the more one team loses, the more wins there are for the remaining teams.

In Moon's notation, the upper bound states that $$ \mathbb P(w>x_n^+)=O(n^{-2\epsilon})=o(1) $$ and the lower bound states that $$ \mathbb P(w<x_n^-)=(1-O(n^{-1+2\epsilon}))^n=e^{-O(n^{2\epsilon})}=o(1), $$ and thus a final union bound establishes that $$ \mathbb P(x_n^-\leq w \leq x_n^+)=1-o(1). $$
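The middle step of the lower bound is just the approximation $(1-x)^n \approx e^{-nx}$ for small $x$, applied with $x = O(n^{-1+2\epsilon})$. A quick numeric check with illustrative constants $c=1$, $\epsilon=0.1$ (my own choice, not from Moon):

```python
import math

c, eps = 1.0, 0.1   # illustrative constants
diffs = []
for n in (10**3, 10**5, 10**7):
    x = c * n ** (-1 + 2 * eps)
    lhs = (1 - x) ** n                 # (1 - O(n^{-1+2eps}))^n
    rhs = math.exp(-c * n ** (2 * eps))  # e^{-O(n^{2eps})}
    diffs.append(abs(lhs - rhs) / rhs)
    print(n, lhs, rhs)
```

The relative error is of order $nx^2$, which tends to $0$ here, so the two expressions agree ever more closely as $n$ grows.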

I hope this sheds some light on Moon's argument and helps address that aspect of your first doubt. However, since we don't have an exact equality in distribution between $w$ and $\widetilde w$, actually proving the lower bound for $\widetilde w$ requires a new argument (not identical to the one made by Moon), although the proof of the upper bound goes through unchanged (because it does not rely on any knowledge of the joint distribution).


Regarding your second doubt: the first aspect (where the $n^2$ comes from) is again just the union bound: $$ \mathbb P(\exists i,j\textrm{ such that }d_i=d_j=k)\leq \sum_{i=1}^n\sum_{j=1}^n\mathbb P(d_i=d_j=k)\leq n^2\,\mathbb P(d_1=d_2=k). $$ The second aspect (how to prove the sum tends to zero) requires more work to flesh out, but the conceptual idea is that the cutoff in the summation is chosen so that the sum just barely converges without the squares on the summands, and introducing the squares takes it from just barely converging to converging to a quantity which is $o(1)$ as $n\to\infty$.
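To see the lemma itself in action, one can estimate by simulation how often $G(n,\tfrac12)$ has a unique vertex of maximum degree (my own sketch; the convergence to $1$ appears to be slow, so at moderate $n$ the observed fraction is noticeably below $1$):

```python
import numpy as np

rng = np.random.default_rng(4)

def unique_max_degree(n, rng):
    """True iff one sample of G(n, 1/2) has a unique vertex of max degree."""
    a = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    degrees = a.sum(axis=0) + a.sum(axis=1)
    return int((degrees == degrees.max()).sum()) == 1

n, trials = 500, 200
fraction = sum(unique_max_degree(n, rng) for _ in range(trials)) / trials
print(fraction)  # well above 1/2, creeping toward 1 as n grows
```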