Mean and variance of a binomial random variable wrt $n$ when $p$ is a function of $n$


I would appreciate any suggestions regarding the following.

For a constant $c>0$, define $$f(n,p;\,c):=\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}(1-p)^{n-k}\cdot\frac{1}{k}-c$$ and let $p^{*}=p^{*}(n;\,c)$ be the unique solution to the equation $f(n,p^{*};\,c)=0$. (Assume that $c$ lies in a range for which a non-trivial solution $p^{*}\in(0,1)$ exists.)

How does $\text{var}(n)=n\cdot p^{*}\left(n\right)\cdot(1-p^{*}(n))$ vary with $n$?

What I have done so far:

First, we may simplify the expression to $$f(n,p)=\{ 1-(1-p)^{n}\} \cdot\frac{1}{n\cdot p}-c.$$ Treating $n$ as a continuous variable, a coupling argument (for example) gives $\frac{\partial f}{\partial n}(n,p)<0$ and $\frac{\partial f}{\partial p}(n,p)<0$ for all $n$ and $p\in(0,1)$. Then, by the implicit function theorem (IFT) we have $$\frac{dp^{*}}{dn}=-\frac{\frac{\partial f}{\partial n}(n,p^{*})}{\frac{\partial f}{\partial p}(n,p^{*})}<0.$$
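As a sanity check on the simplification, here is a minimal Python sketch that evaluates $f$ both as the original sum and in the closed form and prints the two side by side. The function names `f_sum` and `f_closed` and the sample values of $n$, $p$, $c$ are chosen only for illustration.

```python
from math import comb

def f_sum(n, p, c):
    """f(n, p; c) evaluated as the original sum over k = 1..n."""
    total = sum(
        comb(n - 1, k - 1) * p ** (k - 1) * (1 - p) ** (n - k) / k
        for k in range(1, n + 1)
    )
    return total - c

def f_closed(n, p, c):
    """Simplified form (1 - (1 - p)^n) / (n p) - c."""
    return (1 - (1 - p) ** n) / (n * p) - c

# Illustrative sample points; the two columns should agree up to rounding.
for n, p in [(2, 0.3), (5, 0.5), (20, 0.1)]:
    print(n, p, f_sum(n, p, 0.4), f_closed(n, p, 0.4))
```

The check only covers integer $n$, since the sum form is not defined otherwise.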

Then, substituting $p:=\frac{\overline{n}}{n}$ into $f$ we get the expression $$f\left(n,\frac{\overline{n}}{n}\right)=\left\{ 1-\left(1-\frac{\overline{n}}{n}\right)^{n}\right\}\cdot\frac{1}{\overline{n}}-c$$ and we may define $\overline{n}^{*}=n\cdot p^{*}$ as the solution to the equation $f\left(n,\frac{\overline{n}^{*}}{n}\right)=0$. Similarly, using the IFT I was able to show that $\overline{n}^{*}\left(n\right)=n\cdot p^{*}\left(n\right)$ decreases with $n$.

However, I am not sure how to understand the dependence of $\text{var}(n)=n\cdot p^{*}(n)\cdot(1-p^{*}(n))$ wrt $n$. Thanks!

There are 1 best solutions below


$\def\e{\mathrm{e}}\def\paren#1{\left(#1\right)}$Define $F(p; c, n) = (1 - p)^n + cnp - 1$ for $0 < p < 1$, $c > 0$, $n > 1$. Note that $\lim\limits_{p → 0+} F(p; c, n) = 0$ and\begin{align*} \frac{\partial F}{\partial p}(p; c, n) &= -n(1 - p)^{n - 1} + cn,\\ \frac{\partial^2 F}{\partial p^2}(p; c, n) &= n(n - 1)(1 - p)^{n - 2} > 0, \end{align*} thus $F$ is convex with respect to $p$. Therefore for any $c > 0$ and $n > 1$, the following conditions are equivalent:

  1. There exists a unique $p \in (0, 1)$ such that $F(p; c, n) = 0$.
  2. $\lim\limits_{p → 1-} F(p; c, n) = cn - 1 > 0$, $\lim\limits_{p → 0+} \dfrac{\partial F}{\partial p}(p; c, n) = (c - 1)n < 0$, i.e. $0 < c < 1$ and $n > \dfrac{1}{c}$.

Now for any $0 < c < 1$ and $n > \dfrac{1}{c}$, denote by $p(c, n)$ the unique solution to $F(p; c, n) = 0$, i.e.$$ F(p(c, n); c, n) = 0 \quad \forall\, 0 < c < 1,\ n > \frac{1}{c}. $$ Here is an interactive graph of $D(c, n) = np(c, n)(1 - p(c, n))$ in Sage. It can be seen from the graph that for $c$ not too close to $1$, $D(c, n)$ is indeed increasing with respect to $n$, but not so for $c$ close to $1$ (even if $n$ is restricted to be in $\mathbb{N}_+$).
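For readers without access to the graph, $D(c, n)$ can be tabulated numerically. The sketch below is one possible way to do it, not the Sage code behind the graph: it finds $p(c, n)$ by bisection on the equivalent equation $\frac{1 - (1 - p)^n}{np} = c$, whose left side decreases in $p$. The names `p_star` and `D`, the iteration count, and the sample values of $c$ are assumptions made for illustration.

```python
from math import expm1, log1p

def p_star(c, n, iters=200):
    """Root in (0, 1) of (1 - (1 - p)^n) / (n p) = c, found by bisection.

    Assumes 0 < c < 1 and n > 1/c, so the left side falls from 1 (as p -> 0+)
    to 1/n (at p = 1) and crosses c exactly once.
    """
    def g(p):
        # 1 - (1 - p)^n written as -expm1(n * log1p(-p)) to limit cancellation
        return -expm1(n * log1p(-p)) / (n * p) - c

    lo, hi = 1e-15, 1.0 - 1e-15
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid  # left side still above c: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def D(c, n):
    """D(c, n) = n p (1 - p) evaluated at p = p(c, n)."""
    p = p_star(c, n)
    return n * p * (1 - p)

# Illustrative values: one moderate c and one c close to 1.
for c in (0.5, 0.9):
    print(c, [round(D(c, n), 4) for n in range(3, 9)])
```

Comparing `D(0.9, 2)` with `D(0.9, 3)` gives a quick way to see the non-monotone behaviour for $c$ close to $1$.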

To be more explicit, consider the cases $n = 2$ and $n = 3$. Since\begin{align*} F(p; c, 2) &= (1 - p)^2 + 2cp - 1 = p(p - 2(1 - c)),\\ F(p; c, 3) &= (1 - p)^3 + 3cp - 1 = -p(p^2 - 3p + 3(1 - c)), \end{align*} we get\begin{align*} p(c, 2) &= 2(1 - c), & D(c, 2) &= 4(1 - c)(2c - 1),\\ p(c, 3) &= \frac{1}{2} (3 - \sqrt{12c - 3}), & D(c, 3) &= 3\sqrt{12c - 3} - 9c. \end{align*} Wolfram shows that$$ D(c, 2) < D(c, 3) \iff \frac{1}{2} < c < 0.82537575\cdots, $$ which means that $D(c, 2) > D(c, 3)$ for $c$ close enough to $1$.
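The crossover value can also be located without Wolfram. The following is a small sketch that bisects on $D(c, 3) - D(c, 2)$ using the closed forms above; the function names and the bracket $[0.6, 0.99]$ are illustrative choices, and the bisection assumes the difference changes sign exactly once on that bracket, as the equivalence above indicates.

```python
from math import sqrt

def D2(c):
    """Closed form D(c, 2) = 4 (1 - c)(2c - 1), valid for 1/2 < c < 1."""
    return 4 * (1 - c) * (2 * c - 1)

def D3(c):
    """Closed form D(c, 3) = 3 sqrt(12c - 3) - 9c, valid for 1/3 < c < 1."""
    return 3 * sqrt(12 * c - 3) - 9 * c

def crossover(lo=0.6, hi=0.99, iters=100):
    """Bisection for the c at which D(c, 3) - D(c, 2) changes sign."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if D3(mid) - D2(mid) > 0:
            lo = mid  # D(c, 3) still larger: the crossover lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

print(crossover())
```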


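Although $D(c, n)$ is not always increasing with respect to $n$, the following proposition shows that $\lim\limits_{n → ∞} D(c, n)$ exists for any $0 < c < 1$: since $np(c, n)$ converges, $p(c, n) → 0$, and hence $D(c, n) = np(c, n)(1 - p(c, n))$ has the same limit as $np(c, n)$.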

Proposition: Define $G(t; a, c) = a\e^{-t} + ct - 1$ for $t > 0$, $a > 0$, $0 < c < 1$. For any $0 < c < 1$, the equation $G(t; 1, c) = 0$ has a unique solution $t^*$, and$$ \lim_{n → ∞} np(c, n) = t^*. $$

Proof: Fix $c_0 \in (0, 1)$ and omit $c_0$ from the notation where possible. Note that\begin{gather*} \lim_{t → 0+} G(t; 1) = 0,\quad \lim_{t → +∞} G(t; 1) = +∞,\\ \lim_{t → 0+} \frac{\partial G}{\partial t}(t; 1) = c_0 - 1 < 0,\quad \frac{\partial^2 G}{\partial t^2}(t; 1) = \e^{-t} > 0, \end{gather*} thus the convexity of $G(\,·\,; 1)$ implies that there exists a unique $t^* > 0$ such that $G(t^*; 1) = 0$. Moreover, $G$ is $C^1$ in a neighborhood of $(t^*; 1)$ and$$ \frac{\partial G}{\partial t}(t^*; 1) = c_0 - \e^{-t^*} = \frac{1}{t^*} (1 - \e^{-t^*}) - \e^{-t^*} = \frac{1}{t^* \e^{t^*}} (\e^{t^*} - t^* - 1) ≠ 0, $$ where the second equality uses $G(t^*; 1) = 0$, i.e. $c_0 = \frac{1}{t^*} (1 - \e^{-t^*})$. Hence the implicit function theorem shows that there exist $δ_a, δ_t > 0$, $I_a = (1 - δ_a, 1 + δ_a)$, $I_t = (t^* - δ_t, t^* + δ_t)$, and $t(a) \in C^1(I_a)$ such that $t(1) = t^*$ and\begin{gather*} G(t; a) = 0 \iff t = t(a) \quad \forall (t; a) \in I_t × I_a. \tag{1} \end{gather*}

Now, since $(1 - p(n))^n > 0$, we have $0 = F(p(n); n) > c_0 np(n) - 1$, so $p(n) < \dfrac{1}{c_0 n}$ for any $n$. Thus $p(n) = O\paren{ \dfrac{1}{n} }$ as $n → ∞$, which implies that$$ \ln(1 - p(n)) = -p(n) + o(p(n)) = -p(n) + o\paren{ \frac{1}{n} }. $$ Note that\begin{gather*} 0 = F(p(n); n) = \exp(n \ln(1 - p(n))) + c_0 np(n) - 1\\ = a(n) \exp(-s(n)) + c_0 s(n) - 1 = G(s(n); a(n)), \end{gather*} where $a(n) = \exp(n \ln(1 - p(n)) + np(n))$, $s(n) = np(n)$. Since$$ a(n) = \exp(o(1)) = 1 + o(1), $$ (1) implies that $s(n) = t(a(n))$ for sufficiently large $n$, and$$ \lim_{n → ∞} np(n) = \lim_{n → ∞} t(a(n)) = t(1) = t^*. $$
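The limit can be illustrated numerically. The sketch below is only a check under stated assumptions, not part of the proof: it solves $\e^{-t} + ct - 1 = 0$ and $\frac{1 - (1 - p)^n}{np} = c$ by bisection and compares $np(c, n)$ with $t^*$ for growing $n$. The helper names `t_star` and `p_star`, the brackets, and the sample value $c = 0.5$ are illustrative.

```python
from math import exp, expm1, log, log1p

def t_star(c, iters=200):
    """Positive root of exp(-t) + c t - 1 = 0, assuming 0 < c < 1."""
    def G(t):
        return exp(-t) + c * t - 1

    # G is convex with its minimum (a negative value) at t = -log(c),
    # and G(1/c) = exp(-1/c) > 0, so the positive root lies in between.
    lo, hi = -log(c), 1 / c
    for _ in range(iters):
        mid = (lo + hi) / 2
        if G(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_star(c, n, iters=200):
    """Root in (0, 1) of (1 - (1 - p)^n) / (n p) = c, assuming n > 1/c."""
    def g(p):
        # 1 - (1 - p)^n written as -expm1(n * log1p(-p)) to limit cancellation
        return -expm1(n * log1p(-p)) / (n * p) - c

    lo, hi = 1e-15, 1.0 - 1e-15
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = 0.5  # illustrative value
print("t* =", t_star(c))
for n in (10, 100, 1000, 10000):
    print(n, n * p_star(c, n))
```

In line with the proposition, the printed values of $np(c, n)$ should approach $t^*$ as $n$ grows.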