Does the (normalized) product of two independent binomial variables converge in distribution to a normal variable?


(After 10 days, I posted this question also on MO.)

Let $X$ and $Y$ be two independent identically distributed binomial random variables with parameters $n \in \mathbb{N}$ and $p \in (0,1)$. Let $Z := XY$ be their product.

Is it true or false that $\tilde{Z} := (Z - \mathbf{E}[Z]) / \sqrt{\mathbf{Var}[Z]}$ converges in distribution to a standard normal random variable (as $n \to \infty$) ?

At first glance, I would be tempted to write $X = \sum_{i=1}^n A_i$ and $Y = \sum_{j=1}^n B_j$, where $A_i$ and $B_j$ are Bernoulli random variables, and then to apply the central limit theorem to $Z = \sum_{i=1}^n \sum_{j=1}^n A_i B_j$... but the $A_i B_j$ are not independent... Thanks for any help.

P.S. It is easy to check that, for $p = 1/2$, $\mathbf{E}[Z] = n^2 / 4$ and $\mathbf{Var}[Z] = n^3 / 8 + n^2 / 16$. Moreover, expanding $(XY - n^2/4)^k$ with the binomial theorem and using the formula for the moments of the binomial distribution, I obtained $$\mathbf{E}\left[\tilde{Z}^k\right] = \frac1{(n^3 / 8 + n^2 / 16)^{k/2}} \sum_{j=0}^k \binom{k}{j} \left(\sum_{i=0}^j \left\{\begin{matrix}j \\ i\end{matrix} \right\} (n)_{i} (1/2)^i\right)^2 (-n^2/4)^{k-j}$$ where $\left\{\begin{matrix}j \\ i\end{matrix}\right\}$ are Stirling numbers of the second kind and $(n)_{i}$ is a falling factorial. With this formula, I verified that the first 20 moments of $\tilde{Z}$ tend to the moments of a standard normal variable. However, I still do not know how to prove this for all moments.
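The moment formula can be checked numerically. Below is a minimal sketch (assuming $p = 1/2$ as in the computation above; `stirling2` and `moment` are my own helper names) that evaluates it in exact rational arithmetic and compares against the standard normal moments:

```python
from fractions import Fraction
from math import comb, perm, prod

def stirling2(j, i):
    # Stirling numbers of the second kind, via S(j,i) = i*S(j-1,i) + S(j-1,i-1)
    if j == i:
        return 1
    if i == 0 or i > j:
        return 0
    return i * stirling2(j - 1, i) + stirling2(j - 1, i - 1)

def moment(n, k):
    # E[Z~^k] for p = 1/2, following the formula above:
    # E[X^j] = sum_i S(j,i) (n)_i (1/2)^i, then the binomial expansion of (XY - n^2/4)^k
    var = Fraction(n**3, 8) + Fraction(n**2, 16)
    total = Fraction(0)
    for j in range(k + 1):
        ex_j = sum(Fraction(stirling2(j, i) * perm(n, i), 2**i) for i in range(j + 1))
        total += comb(k, j) * ex_j**2 * (-Fraction(n**2, 4)) ** (k - j)
    return float(total) / float(var) ** (k / 2)

# compare with the standard normal moments: 0 for odd k, (k-1)!! for even k
n = 10**6
for k in range(1, 9):
    target = 0 if k % 2 else prod(range(1, k, 2))
    print(k, moment(n, k), target)
```

The odd moments decay like $O(1/\sqrt{n})$ while the even ones approach $(k-1)!!$, consistent with the check of the first 20 moments mentioned above.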

On BEST ANSWER

The answer is yes.

I denote $X_n$, $Y_n$ and $Z_n$ instead of $X$, $Y$ and $Z$ to highlight their dependence on $n$. One can readily show with the central limit theorem that $\frac{X_n-np}{\sqrt{np(1-p)}}$ converges in distribution to the standard normal distribution. Then by independence we have that $$ W_n:=\frac{X_n-np}{\sqrt{np(1-p)}}\times\frac{Y_n-np}{\sqrt{np(1-p)}}=\frac{X_nY_n-npX_n-npY_n+(np)^2}{np(1-p)}\overset{d}{\underset{n\to+\infty}{\longrightarrow}}G\times G', $$ where $G$ and $G'$ are two independent gaussian variables with mean $0$ and variance $1$.

It is easy to show that $\mathbb E[X_nY_n]=(np)^2$ and $\operatorname{Var}(X_nY_n)=(np)^2((1-p)^2+2np(1-p))$. Then with some rearrangement of the terms, essentially $$ X_nY_n-(np)^2=X_nY_n-npX_n-npY_n+(np)^2+np(X_n-np)+np(Y_n-np), $$ we easily find $$ \begin{align*} \tilde Z_n&=\frac{Z_n-\mathbb E[Z_n]}{\sqrt{\operatorname{Var}(Z_n)}}\\ &=\frac{W_n}{\sqrt{1+\frac{2np}{1-p}}}+\frac{1}{\sqrt{2+\frac{1-p}{np}}}\left(\frac{X_n-np}{\sqrt{np(1-p)}}+\frac{Y_n-np}{\sqrt{np(1-p)}}\right)\\ &\overset{d}{\underset{n\to+\infty}{\longrightarrow}}\frac{1}{\sqrt2}(G+G'). \end{align*} $$
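As a quick sanity check, the rearrangement above is an exact algebraic identity in $X$ and $Y$, which can be verified on a grid of integer values (the particular $n$, $p$, and grid here are arbitrary choices of mine):

```python
from fractions import Fraction

# check X*Y - (np)^2 = (X - np)(Y - np) + np(X - np) + np(Y - np) exactly,
# using rational arithmetic so there is no floating-point error
n, p = 50, Fraction(1, 3)
a = n * p
for x in range(0, n + 1, 7):
    for y in range(0, n + 1, 11):
        assert x * y - a**2 == (x - a) * (y - a) + a * (x - a) + a * (y - a)
print("identity verified on the grid")
```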

Since $G,G'$ are independent standard gaussian variables, $G+G'$ is a gaussian variable, with mean $0$ and variance $2$, so $(G+G')/\sqrt2$ is a standard gaussian variable. Therefore the answer to your question is yes.
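The convergence can also be observed empirically with a small Monte Carlo experiment (standard library only; the values of $n$, $p$, and the sample size are arbitrary choices of mine):

```python
import random
import statistics

random.seed(0)

def binom(n, p):
    # sample a Binomial(n, p) variable as a sum of independent Bernoulli trials
    return sum(random.random() < p for _ in range(n))

n, p, trials = 200, 0.3, 10000
mean_z = (n * p) ** 2
sd_z = n * p * ((1 - p) ** 2 + 2 * n * p * (1 - p)) ** 0.5  # sqrt(Var(X_n Y_n))

samples = [(binom(n, p) * binom(n, p) - mean_z) / sd_z for _ in range(trials)]

print(round(statistics.mean(samples), 3))      # should be close to 0
print(round(statistics.variance(samples), 3))  # should be close to 1
# for a standard normal, about 68.3% of the mass lies within one standard deviation
print(sum(abs(z) < 1 for z in samples) / trials)
```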

EDIT: To answer your comment, I give more details about how the last convergence is derived. It might look as though I had hidden a lot of details, but the reasoning here is very standard.

Let $\varepsilon_n=\frac{1}{\sqrt{1+\frac{2np}{1-p}}}$, $\eta_n=\frac{1}{\sqrt{2+\frac{1-p}{np}}}$, $S_n=\frac{X_n-np}{\sqrt{np(1-p)}}$ and $T_n=\frac{Y_n-np}{\sqrt{np(1-p)}}$, so that $$ \tilde Z_n=\varepsilon_nW_n+\eta_n(S_n+T_n), $$ where $\varepsilon_n\underset{n\to+\infty}{\longrightarrow}0$, $\eta_n\underset{n\to+\infty}{\longrightarrow}\frac{1}{\sqrt2}$, $W_n\overset{d}{\underset{n\to+\infty}{\longrightarrow}}G\times G'$, $S_n\overset{d}{\underset{n\to+\infty}{\longrightarrow}}G$ and $T_n\overset{d}{\underset{n\to+\infty}{\longrightarrow}}G'$.
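As a quick numerical illustration of these two deterministic limits (the choice $p = 0.3$ is arbitrary):

```python
from math import sqrt

p = 0.3
for n in (10**2, 10**4, 10**6):
    eps_n = 1 / sqrt(1 + 2 * n * p / (1 - p))  # coefficient of W_n
    eta_n = 1 / sqrt(2 + (1 - p) / (n * p))    # coefficient of S_n + T_n
    print(n, eps_n, eta_n)
# eps_n shrinks toward 0, while eta_n approaches 1/sqrt(2)
```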

Because $S_n$ and $T_n$ are independent, and $G$ and $G'$ are independent, you have that $(S_n,T_n)\overset{d}{\underset{n\to+\infty}{\longrightarrow}}(G,G')$. By continuity of the sum, $S_n+T_n\overset{d}{\underset{n\to+\infty}{\longrightarrow}}G+G'$. Since $\eta_n$ is deterministic and converges to $1/\sqrt2$, we have that $\eta_n(S_n+T_n)\overset{d}{\underset{n\to+\infty}{\longrightarrow}}\frac{1}{\sqrt2}(G+G')$.

On the other hand, $W_n\overset{d}{\underset{n\to+\infty}{\longrightarrow}}G\times G'$, and since $\varepsilon_n$ is deterministic and vanishes, we have that $\varepsilon_nW_n\overset{d}{\underset{n\to+\infty}{\longrightarrow}}0\times G\times G'=0$. Because the latter limit in distribution is deterministic, Slutsky's lemma implies that $(\varepsilon_nW_n,\eta_n(S_n+T_n))\overset{d}{\underset{n\to+\infty}{\longrightarrow}}(0,\frac{1}{\sqrt2}(G+G'))$. By continuity of the sum again, we deduce that $$ \tilde Z_n=\varepsilon_nW_n+\eta_n(S_n+T_n)\overset{d}{\underset{n\to+\infty}{\longrightarrow}}0+\frac{1}{\sqrt2}(G+G')=\frac{1}{\sqrt2}(G+G'). $$

EDIT 2: I just wanted to mention that the result could actually have been intuited from the following loose reasoning: since $\frac{X_n-np}{\sqrt{np(1-p)}}$ converges in distribution to the standard normal distribution, $X_n$ is approximately distributed according to the normal distribution with mean $np$ and variance $np(1-p)$. This can be rewritten as $X_n\simeq np+\sqrt{np(1-p)}G$, where $G$ follows the standard normal distribution. Similarly we can write $Y_n\simeq np+\sqrt{np(1-p)}G'$, where $G'$ follows the standard normal distribution and is independent of $G$. Then $$ \begin{align*} \tilde Z_n&=\frac{X_nY_n-(np)^2}{np\sqrt{(1-p)^2+2np(1-p)}}\\ &\simeq\frac{\sqrt{np(1-p)}(G+G')}{\sqrt{(1-p)^2+2np(1-p)}}+\frac{(1-p)G\times G'}{\sqrt{(1-p)^2+2np(1-p)}}\\ &\underset{n\to+\infty}{\longrightarrow}\frac{1}{\sqrt2}(G+G'). \end{align*} $$

My answer above is ultimately just a way to treat that "$\simeq$" symbol formally.

EDIT 3: I forgot to mention it in my first answer, so I have added the fact that $(G+G')/\sqrt2$ is a standard gaussian variable, and hence that the answer to your question is yes.