Flajolet & Sedgewick: How to compute the variance of the number of cycles in a random permutation?


I am reading the book Analytic Combinatorics by Flajolet and Sedgewick. On page 160, in Example III.4, the authors derive the variance of the number of cycles in a random permutation. I can follow the authors up to the part where they write

$$\sigma_n^2 = \biggl(\sum_{k=1}^n \frac{1}{k} \biggr) - \biggl(\sum_{k=1}^n \frac{1}{k^2} \biggr). $$

I do not understand how they arrive at that. If I understand the above part correctly the authors state that $\mathbb{E}_n[\chi] = H_n$, where $H_n$ is the $n$-th Harmonic Number, and $\mathbb{E}_n[\chi(\chi-1)] = \sum_{k=1}^n \frac{1}{k^2}$.

I think that it would now suffice to compute

$$\mathbb{E}_n[\chi^2] - \mathbb{E}_n[\chi]^2$$

by somehow utilising the above. However, I do not understand how to do that. Could you please help me?


2 Answers

Best answer:

I only have the first edition of Analytic Combinatorics. In that edition, and in that example, the authors derive the following generating function equation: $$ \sum_{n\ge 0} \mathbb E_n[\chi(\chi-1)]z^n=\frac1{1-z}\left(\log \frac1{1-z}\right)^2.\tag{11} $$ They then say

From this expression and Note 4 (or directly from the Stirling polynomials), a calculation shows that $$ \sigma_n^2= \left(\sum_{k=1}^n \frac1k\right)-\color{red}{\left(\sum_{k=1}^n \frac1{k^2}\right)}\tag{12} $$

I could not find Note 4, so let us try to replicate this calculation without it.
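Before diving in, it is easy to sanity-check $(12)$ by brute force for small $n$. This is not from the book; it is a quick script of my own, and the helper name `cycle_count` is my own invention:

```python
# Sanity check of (12): enumerate all n! permutations for small n and compare
# the exact variance of the cycle count with H_n - sum_{k<=n} 1/k^2.
from itertools import permutations
from fractions import Fraction

def cycle_count(p):
    """Number of cycles of a permutation given as a tuple of images of 0..n-1."""
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return cycles

for n in range(1, 7):
    counts = [cycle_count(p) for p in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    predicted = sum(Fraction(1, k) for k in range(1, n + 1)) \
              - sum(Fraction(1, k ** 2) for k in range(1, n + 1))
    assert var == predicted, n
print("formula (12) verified for n = 1..6")
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt, so equality (not approximation) is being tested.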

First, we use $(11)$ to determine a formula for $E_n[\chi(\chi-1)].$ Since $\log \frac1{1-z}=\frac z1+\frac{z^2}2+\frac{z^3}3+\dots$, it follows that $$ \left(\log \frac1{1-z}\right)^2=\sum_{n\ge 2} \left(\sum_{i=1}^{n-1} \frac1i\cdot \frac1{n-i} \right)z^n $$ which then implies that $$ \begin{align} E_n[\chi(\chi-1)] &=[z^n]\frac1{1-z}\left(\log \frac1{1-z}\right)^2 \\&=\sum_{k=0}^n [z^k]\left(\log \frac1{1-z}\right)^2 \\&=\sum_{k=2}^n\sum_{i=1}^{k-1}\frac1{i}\cdot \frac1{k-i} \\&=\sum_{i+j\le n}\frac1i \cdot \frac1j \end{align} $$ Therefore, we conclude that \begin{align} \sigma_n^2 &=E_n[\chi^2]-(E_n[\chi])^2 \\&=E_n[\chi]+E_n[\chi(\chi-1)]-(E_n[\chi])^2 \\&=H_n+\left(\sum_{i+j\le n}\frac1i \cdot \frac1j\right)-\left(\sum_{i=1}^n \frac1i\right)^2 \\&=H_n+\left(\sum_{i+j\le n}\frac1i \cdot \frac1j\right)-\left(\sum_{i=1}^n\sum_{j=1}^n\frac1i \cdot \frac1j\right) \\&= H_n-\color{blue}{\left(\sum_{i=1}^n\sum_{j=1}^n\frac1i \cdot \frac1j\cdot 1[i+j>n]\right)} \end{align} where $1[i+j>n]$ is $1$ when $i+j>n$, and zero otherwise. Equivalently, this equals $\sum_{i=1}^n\sum_{j=n-i+1}^n \frac1i\cdot \frac1j$.
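The coefficient extraction above can also be checked numerically. The series manipulation below is my own sketch, not part of the original answer: square the series for $\log\frac1{1-z}$ by Cauchy convolution, take partial sums for the $\frac1{1-z}$ factor, and compare with the double sum.

```python
# Check: [z^n] (1/(1-z)) * log(1/(1-z))^2  equals  sum_{i+j<=n} 1/(ij).
from fractions import Fraction

N = 12
# coefficients of log(1/(1-z)) = z + z^2/2 + z^3/3 + ...
log_coef = [Fraction(0)] + [Fraction(1, k) for k in range(1, N + 1)]
# square of the series, by Cauchy convolution
sq = [sum(log_coef[i] * log_coef[n - i] for i in range(n + 1)) for n in range(N + 1)]
# multiplying by 1/(1-z) takes partial sums of the coefficients
coef = [sum(sq[: n + 1]) for n in range(N + 1)]

for n in range(N + 1):
    double_sum = sum(Fraction(1, i * j)
                     for i in range(1, n + 1) for j in range(1, n + 1) if i + j <= n)
    assert coef[n] == double_sum
print("E_n[chi(chi-1)] = sum over i+j<=n of 1/(ij), checked for n <= 12")
```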

This almost looks like $(12)$. We would be done if we could prove $$ \color{red}{\sum_{k=1}^n \frac1{k^2}}=\color{blue}{\sum_{i=1}^n\sum_{j=1}^n\frac1i \cdot \frac1j\cdot 1[i+j>n]}.\tag{$\star$} $$ Thanks to Matthew Towers's helpful comment, we can give a simple proof of this by induction on $n$. Let $S_n:=\sum_{i=1}^n\sum_{j=1}^n\frac1i \cdot \frac1j\cdot 1[i+j>n]$ be the RHS of $(\star)$. The inductive step boils down to proving $$ S_n-S_{n-1}\stackrel{?}=\frac1{n^2} $$ In the difference $S_n-S_{n-1}$, there are a lot of common terms. It helps to view $S_n$ as a sum of $1/(ij)$ over the discrete triangle $\{(i,j)\in \mathbb N^2\mid 1\le i,j\le n,i+j>n\}$, and to notice that the triangle for $n$ has a large overlap with the triangle for $n-1$. When these common terms are canceled, what remains is $$ S_n-S_{n-1}= \frac1{n^2} +\left(\sum_{i=1}^{n-1} \frac1{i}\cdot \frac1n\right) +\left(\sum_{i=1}^{n-1} \frac1n\cdot \frac1{i}\right) -\left(\sum_{i=1}^{n-1} \frac1{i}\cdot \frac1{n-i}\right) $$ Using the identity $\frac1i\cdot \frac1{n-i}=\frac1{ni}+\frac1{(n-i)n}$, it is easy to prove that the above expression simplifies to $1/n^2$.
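Both the identity $(\star)$ and the telescoping step can be checked numerically. This is my own quick script (the name `S` mirrors the $S_n$ above):

```python
# Check the identity (star): S_n = sum_{k<=n} 1/k^2, and the telescoping
# step S_n - S_{n-1} = 1/n^2, in exact rational arithmetic.
from fractions import Fraction

def S(n):
    """Sum of 1/(ij) over the discrete triangle 1 <= i, j <= n with i + j > n."""
    return sum(Fraction(1, i * j)
               for i in range(1, n + 1) for j in range(1, n + 1) if i + j > n)

for n in range(1, 15):
    assert S(n) == sum(Fraction(1, k * k) for k in range(1, n + 1))
    if n > 1:
        assert S(n) - S(n - 1) == Fraction(1, n * n)
print("identity (star) holds for n <= 14")
```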

Second answer:

Probably not exactly what you were looking for, but the calculations are a bit easier:

  • Law of total variance with $Y=C_n$, the number of cycles in a uniform permutation on $n$ elements and $X=\mathbf 1\{n\text{ is a fixed point}\}$ gives $$\text{Var}(C_n)=\text{Var}(\mathbb E[C_n\mid X])+\mathbb E[\text{Var}(C_n\mid X)].$$
  • By the Chinese restaurant construction, $C_n=C_{n-1}+X$ with $X$ independent of $C_{n-1}$, so $\mathbb E[C_n\mid X]=\mathbb E C_{n-1}+X$; that is, it equals $\mathbb E C_{n-1}$ with probability $1-{1\over n}$ and $1+\mathbb E C_{n-1}$ with probability $1\over n$. Its variance is therefore that of a Bernoulli random variable with success probability $1\over n$, i.e.: $$\text{Var}(\mathbb E[C_n\mid X])={1\over n}-{1\over n^2}.$$
  • Analogously: $$\mathbb E[\text{Var}(C_n\mid X)]={1\over n}\text{Var}(1+C_{n-1})+\left(1-{1\over n}\right)\text{Var}(C_{n-1})=\text{Var}(C_{n-1}).$$
  • Combining the above, we get $$\text{Var}(C_n)=\text{Var}(C_{n-1})+{1\over n}-{1\over n^2}=\dots=\sum_{k=1}^n\left({1\over k}-{1\over k^2}\right),$$ as desired.
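The recursion in the last bullet can be verified exactly with a short script. This is a sketch of my own (not from the answer), using the unsigned Stirling numbers of the first kind, since $P(C_n=k)=c(n,k)/n!$:

```python
# Verify Var(C_n) = Var(C_{n-1}) + 1/n - 1/n^2 using the exact distribution
# P(C_n = k) = c(n, k) / n!, where c(n, k) are the unsigned Stirling numbers
# of the first kind.
from fractions import Fraction
from math import factorial

def stirling_row(n):
    """Unsigned Stirling numbers of the first kind c(n, k) for k = 0..n."""
    row = [1]  # row for m = 0: c(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            # recurrence: c(m, k) = c(m-1, k-1) + (m-1) * c(m-1, k)
            new[k] = row[k - 1] + (m - 1) * (row[k] if k < len(row) else 0)
        row = new
    return row

def var_cycles(n):
    """Exact variance of the cycle count C_n of a uniform permutation."""
    row = stirling_row(n)
    total = factorial(n)
    mean = sum(Fraction(k * row[k], total) for k in range(n + 1))
    return sum(Fraction(k * k * row[k], total) for k in range(n + 1)) - mean ** 2

for n in range(2, 10):
    assert var_cycles(n) - var_cycles(n - 1) == Fraction(1, n) - Fraction(1, n * n)
print("Var(C_n) = Var(C_{n-1}) + 1/n - 1/n^2 verified for n <= 9")
```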

While this might not look as general/reusable as the above, the law of total variance comes in handy whenever you have a way of recursively generating a random object (e.g. random permutations via the Chinese restaurant process here, or fractals) and want to know the variance of some observable.
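To make the "recursively generating a random object" point concrete, here is a tiny Chinese-restaurant simulation of the cycle count (my own illustration; `sample_cycle_count` is a made-up name): at step $m$ the new element starts a fresh cycle with probability $1/m$, independently of everything before.

```python
# Monte Carlo check of the variance formula via the Chinese restaurant process:
# C_n is a sum of independent Bernoulli(1/m) indicators, m = 1..n.
import random

def sample_cycle_count(n, rng):
    """One sample of C_n: step m opens a new cycle with probability 1/m."""
    c = 0
    for m in range(1, n + 1):
        if rng.random() < 1.0 / m:
            c += 1
    return c

rng = random.Random(0)
n, trials = 50, 100_000
samples = [sample_cycle_count(n, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
target = sum(1 / k - 1 / k**2 for k in range(1, n + 1))
assert abs(var - target) < 0.1
print(f"empirical variance {var:.3f} vs sum (1/k - 1/k^2) = {target:.3f}")
```

Because the indicators are independent, the variance of $C_n$ is the sum of the Bernoulli variances $\frac1m-\frac1{m^2}$, which is exactly the formula derived above.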