Elementary proof for average number of tree components in a random forest of fixed size

125 Views Asked by At

In Flajolet's & Sedgewick's "Analytic Combinatorics" I found the statement that for a forest ("Catalan", i.e. collection of ordered trees) of size $n$, uniformly distributed, the number of tree components $X_n$ satisfies $$\lim_{N\rightarrow\infty}\mathbb P(X_n=k)=\frac{k}{2^{k+1}},$$ but the proof seems to use a lot of heavy(?) machinery introduced in the previous chapters to derive that. Is there also an elementary proof that might provide a little bit more insight?

1

There are 1 best solutions below

3
On BEST ANSWER

Let me start by pointing out that a problem that requires heavy machinery from the canonical text is unlikely to have an elementary proof. Here is what we can do. The species of Catalan trees has specification

$$\mathcal{H} = \mathcal{Z} \times \mathfrak{S}(\mathcal{H}).$$

This yields the functional equation $$z = H(z) (1-H(z)).$$

As we follow the text we see that the forests in question are in fact ordered as opposed to being sets, another possible interpretation. Therefore we will adopt this in our work. (One might think forests are sets of trees rather than sequences but this is not asked for here.) We get for the species of forests

$$\mathcal{F} = \mathfrak{S}(\mathcal{U}\mathcal{H}).$$

with OGF

$$F(z, u) = \frac{1}{1 - u H(z)}.$$

We need to count these for the probabilities. We obtain

$$F(z, 1) = \frac{1}{1 - H(z)}.$$

Extracting coefficients we find

$$[z^n] F(z, 1) = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} \frac{1}{1-H(z)} \; dz.$$

From the functional equation we put $z = w(1-w)$ so that $dz = 1-2w \; dw$ to get

$$\frac{1}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n+1} (1-w)^{n+1}} \frac{1}{1-w} (1-2w)\; dw.$$

This yields two pieces

$$\frac{1}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n+1} (1-w)^{n+1}} \; dw = {2n\choose n}$$

and

$$-\frac{1}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n} (1-w)^{n+2}} \; dw = -{2n\choose n+1}$$

for a result of

$$\left(1-\frac{n}{n+1}\right) {2n\choose n} = \frac{1}{n+1} {2n\choose n}.$$

These are the regular Catalan numbers and what we have here is folklore. Continuing we have

$$[u^k] F(z, u) = H(z)^k.$$

Extracting coefficients yields the integral

$$[z^n] H(z)^k = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} H(z)^k \; dz.$$

The functional equation applies as before and we obtain

$$\frac{1}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n+1} (1-w)^{n+1}} w^k (1-2w)\; dw.$$

We once more have two pieces, getting

$$\frac{1}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n-k+1} (1-w)^{n}} \; dw = {2n-1-k\choose n-1}$$

and

$$-\frac{1}{2\pi i} \int_{|w|=\gamma} \frac{1}{w^{n-k} (1-w)^{n+1}} \; dw = - {2n-1-k\choose n}$$

for a result of

$$\left(\frac{n}{n-k} - 1\right) {2n-1-k\choose n} = \frac{k}{n-k} {2n-1-k\choose n}.$$

Switching to probabilities we have the closed form

$$\bbox[5px,border:2px solid #00A000] {\frac{k}{n-k} {2n-1-k\choose n} \times (n+1) {2n\choose n}^{-1}.}$$

Using the asymptotics for the central binomial coefficient we get

$$\frac{k}{n-k} {2n-1-k\choose n} \times (n+1) \frac{\sqrt{\pi n}}{4^n}.$$

Continuing as documented in part eight, page $25$ (saddle point asymptotics) of the slides for the book we now introduce $G(z)$ and $f(z)$ matching the notation used there with $G(z) = (1+z)^{2n-1-k}$ and

$${2n-1-k\choose n} = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} (1+z)^{2n-1-k} \; dz$$

and with $f(z) = \log G(z) - (n+1) \log z = (2n-1-k) \log(1+z) - (n+1) \log z$ (we apply the method to $\exp f(z)$). Solving the saddle point equation

$$f'(z) = 0 = (2n-1-k)\frac{1}{1+z} - (n+1)\frac{1}{z}$$

we obtain for the saddle point

$$\zeta = \frac{n+1}{n-k-2} \sim 1.$$

The saddle point approximation of the binomial coefficient is thus

$${2n-1-k\choose n} \sim \frac{G(1)}{1^{n+1} \sqrt{2\pi f''(1)}}.$$

With $f''(z) = -(2n-1-k)\frac{1}{(1+z)^2} + (n+1)\frac{1}{z^2}$ we obtain

$$\frac{2^{2n-1-k}}{\sqrt{2\pi (n+1-(n/2-1/4-k/4))}} \\ = \frac{2^{2n-1-k}}{\sqrt{2\pi (n/2+5/4+k/4)}} = \frac{2^{2n-1-k}}{\sqrt{\pi (n+5/2+k/2)}}.$$

With $k$ fixed this is asymptotic to

$$\frac{2^{2n-1-k}}{\sqrt{\pi n}}.$$

Substitute into the probability to get

$$\frac{k}{n-k} \frac{2^{2n-1-k}}{\sqrt{\pi n}} (n+1) \frac{\sqrt{\pi n}}{4^n} = k \frac{n+1}{n-k} \frac{1}{2^{k+1}} \sim \frac{k}{2^{k+1}}$$

as claimed. For it to be genuinely elementary we would need an elementary proof of the binomial coefficient asymptotics, which however is at least as difficult as Stirling's approximation.

Addendum. Observe that we have introduced a singularity at $n=k$ into the closed form probabilities during simplification, which did not affect the result, as $k$ is fixed while $n$ goes to infinity. An alternate version is

$$\left(1-\frac{n-k}{n}\right) {2n-1-k\choose n-1} = \frac{k}{n} {2n-1-k\choose n-1}.$$

We get for the probability

$$\bbox[5px,border:2px solid #00A000] {\frac{k}{n} {2n-1-k\choose n-1} \times (n+1) {2n\choose n}^{-1}.}$$

We can verify that these probabilities sum to one by computing

$$\sum_{k=0}^n {2n-1-k\choose n-1} - \sum_{k=0}^n {2n-1-k\choose n}.$$

Using the almost the same integral as during the saddle point method we get for the first sum

$$\frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n}} (1+z)^{2n-1} \sum_{k=0}^n \frac{1}{(1+z)^k} \; dz \\ = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n}} (1+z)^{2n-1} \frac{1-1/(1+z)^{n+1}}{1-1/(1+z)} \; dz \\ = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} (1+z)^{2n-1} (1+z-1/(1+z)^{n}) \; dz \\ = [z^n] (1+z)^{2n} - [z^n] (1+z)^{n-1} = {2n\choose n}.$$

The second sum yields

$$- \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+2}} (1+z)^{2n-1} (1+z-1/(1+z)^{n}) \; dz \\ = - ([z^{n+1}] (1+z)^{2n} - [z^{n+1}] (1+z)^{n-1}) = -{2n\choose n+1}.$$

We get

$$\left(1-\frac{n}{n+1}\right) {2n\choose n} = \frac{1}{n+1} {2n\choose n}$$

and we may conclude that the probabilities do indeed sum to one. The evaluation of binomial coefficient sums using the Cauchy Residue Theorem is known as the Egorychev method.