Bernoulli estimation under a different parameter representation


Suppose that $Y_1,\ldots,Y_n\stackrel{\text{iid}}{\sim} \text{Ber}(p)$. I would like to estimate $c = g(p) = (1-p)/p$. Naively one can derive $$\tilde{c}_\text{mle} = g(\tilde{p}_\text{mle}) = g\left(\frac{1}{n} \sum_i Y_i \right) = \frac{n-\sum_i Y_i}{\sum_i Y_i}$$ and, in order for that to make sense, I set $$\tilde{c} = \mathbf{1}_{\{\sum_i Y_i > 0\}}\frac{n-\sum_i Y_i}{\sum_i Y_i}.$$

Now I'm interested in bounding $\mathbb{E}|c-\tilde{c}|$ from above, but tools like Hoeffding's Inequality don't seem to apply. Another idea was to write out $\mathbb{E}|c-\tilde{c}|$ as a sum, since $\tilde{c}$ can only take finitely many distinct values, but I don't see how that expression makes things easier. A further difficulty is that $g'(p)$ is unbounded, so approaches along the lines of the Delta Method won't work either.

I thought there would be more literature about such issues, but I haven't managed to find any of it. Is anyone familiar with such problems? Pointers are also appreciated!
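For concreteness, the quantity $\mathbb{E}|c-\tilde{c}|$ can be evaluated exactly for small $n$ by summing over the binomial distribution of $S_n=\sum_i Y_i$. A minimal sketch in Python (the helper names are my own, not from any library):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mean_abs_error(n, p):
    """E|c - c_tilde| computed exactly, where c = (1-p)/p and
    c_tilde = 1{S_n > 0} * (n - S_n)/S_n (the truncated plug-in MLE)."""
    c = (1 - p) / p
    total = 0.0
    for k in range(n + 1):
        c_tilde = (n - k) / k if k > 0 else 0.0  # indicator kills the k=0 term
        total += abs(c - c_tilde) * binom_pmf(n, k, p)
    return total

print(mean_abs_error(10, 0.5), mean_abs_error(100, 0.5))
```

This is only feasible for moderate $n$, but it gives a ground truth against which any proposed upper bound can be sanity-checked.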



> I'm interested in bounding $\mathbb{E}|c-\tilde{c}|$ from above

Here $\tilde{c}$ is a function of $S_n=\sum_{1\le i\le n} Y_i$, say $\tilde{c}=f(S_n)$. The following steps give an upper bound in terms of the first two moments of $\tilde{c}$:

  1. $\mathbb{E}|c-\tilde{c}|\ \le\ \sqrt{\mathbb{E}[(c-\tilde{c})^2]}$

    (because for any r.v. $X$, $0 \le \mathbb{E}[(|X|-E|X|)^2] = \mathbb{E}[|X|^2]-[\mathbb{E}|X|]^2$)

  2. $\mathbb{E}[(c-\tilde{c})^2]\ =\ \mathbb{E}[(\tilde{c}-\mathbb{E}\tilde{c})^2]\ +\ (c-\mathbb{E}\tilde{c})^2\ =\ \mathbb{E}[\tilde{c}^2]-[\mathbb{E}\tilde{c}]^2\ +\ (c-\mathbb{E}\tilde{c})^2$

    (write $(c-\tilde{c})^2=\big((c-\mathbb{E}\tilde{c})+(\mathbb{E}\tilde{c}-\tilde{c})\big)^2$, expand and take the expectation)

  3. $\mathbb{E}\tilde{c}\ =\ \sum_{0\le k\le n}f(k)\,\binom{n}{k}\,p^k\,(1-p)^{n-k}$

    $\mathbb{E}[\tilde{c}^2]\ =\ \sum_{0\le k\le n}[f(k)]^2\,\binom{n}{k}\,p^k\,(1-p)^{n-k}.$

These moments may be intractable for your choice of $f()$; however, there are other -- perhaps more reasonable -- choices for $f()$ that do give some "closed form" results (at least for the first moment). For example, setting $h=1$ in the following estimator will lead to a simple closed form for $\mathbb{E}\hat{c}$ for any $t\ge 0$: $$\hat{c}\ \ =\ \frac{1-\frac{h+S_n}{h+t+n}}{\frac{h+S_n}{h+t+n}}\ =\ \frac{n+t-S_n}{h+S_n}. $$ (This $\hat{c}$ is an intuitive estimator that corresponds to tossing a $p$-coin $n$ times in the future after already having observed $h$ Heads and $t$ Tails in a past sequence of $h+t$ trials.)
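Steps 1.-3. can be checked numerically for this $\hat{c}$: compute both moments by direct summation (step 3), assemble the bound from steps 1.-2., and compare with the exact $\mathbb{E}|c-\hat{c}|$. A Python sketch (function names and parameter values are my own):

```python
from math import comb, sqrt

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def moments(f, n, p):
    """First two moments of f(S_n) by direct summation (step 3)."""
    m1 = sum(f(k) * binom_pmf(n, k, p) for k in range(n + 1))
    m2 = sum(f(k)**2 * binom_pmf(n, k, p) for k in range(n + 1))
    return m1, m2

n, p, h, t = 40, 0.25, 1, 0
c = (1 - p) / p
f = lambda k: (n + t - k) / (h + k)          # the smoothed estimator c_hat

m1, m2 = moments(f, n, p)
# steps 1.-2.:  E|c - c_hat| <= sqrt( Var[c_hat] + (c - E[c_hat])^2 )
bound = sqrt(m2 - m1**2 + (c - m1)**2)
mae = sum(abs(c - f(k)) * binom_pmf(n, k, p) for k in range(n + 1))
print(mae, bound)
```

The inequality `mae <= bound` must hold exactly, since it is just Cauchy-Schwarz plus the bias-variance decomposition.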


UPDATE:

The 1985 paper, Bounds on Reciprocal Moments with Applications and Developments in Stein Estimation and Post-Stratification, by David A. Wooff, proves the following upper bound as a corollary to its main theorem:

If $X\ge 0$ is a random variable such that $V[X] \le E[X](E[X]+a)$ for a constant $a>0$, then for any positive integer $k$, $$V[(X+a)^{-k}]\le \left(\frac{\gamma}{a}\right)^2\,V[X]$$ where $$\gamma = \frac{E[X]}{V[X]+E[X](E[X]+a)}.$$

For any $a\ge 1$, this corollary applies to $X=S_n$ above, since $E[S_n]=np$ and $V[S_n]=np(1-p)\le np(np+a)$: the inequality $1-p\le np+a$ is equivalent to $1-a\le (n+1)p$, which holds because $1-a\le 0\le (n+1)p$.
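The corollary (with $k=1$, $a=1$) can be verified numerically against the exact variance of $(S_n+1)^{-1}$, again by direct summation. A hedged Python sketch, not taken from Wooff's paper:

```python
from math import comb

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, a = 30, 0.2, 1.0
EX, VX = n * p, n * p * (1 - p)              # moments of X = S_n
assert VX <= EX * (EX + a)                   # hypothesis of the corollary

# exact first two moments of (S_n + a)^{-1}, hence its variance
m1 = sum((k + a)**-1 * binom_pmf(n, k, p) for k in range(n + 1))
m2 = sum((k + a)**-2 * binom_pmf(n, k, p) for k in range(n + 1))
v_inv = m2 - m1**2

gamma = EX / (VX + EX * (EX + a))
print(v_inv, (gamma / a)**2 * VX)            # v_inv should not exceed the bound
```

In this example the bound is quite loose, which is consistent with the NB below about tighter Binomial-specific bounds in the paper.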

Example

Take $h=1$ and any $t\ge 0$ in our estimator $\hat{c}$: $$\hat{c} = \frac{n+t-S_n}{S_n+1} = \frac{n+t+1}{S_n+1}-1.$$ Then (omitting the derivation), exact summation gives $$E[\hat c]=(n+t+1)\,E[(S_n+1)^{-1}] - 1=\frac{n+t+1}{n+1}\cdot\frac{1-(1-p)^{n+1}}{p}-1, $$ and hence (again omitting some steps) $$c-E[\hat c] = \frac{1-p}{p}-\frac{n+t+1}{n+1}\cdot\frac{1-(1-p)^{n+1}}{p}+1 = \frac{(n+t+1)(1-p)^{n+1}-t}{(n+1)p}. $$ The corollary gives (with $a=1$) $$V[\hat c] = (n+t+1)^2\, V[(S_n+1)^{-1}] \le (n+t+1)^2\,\gamma^2\,np(1-p)\ < \frac{(n+t+1)^2}{n}\cdot\frac{1-p}{p}$$ because $\gamma = \frac{1}{(1-p)+np+1}\lt \frac{1}{np}.$
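The closed form for $E[\hat c]$ can be confirmed against the defining binomial sum. A short Python check (parameter values arbitrary):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, t = 25, 0.3, 2
# exact expectation of c_hat = (n + t - S_n)/(S_n + 1) by direct summation
exact = sum((n + t - k) / (k + 1) * binom_pmf(n, k, p) for k in range(n + 1))
# closed form stated above
closed = (n + t + 1) / (n + 1) * (1 - (1 - p)**(n + 1)) / p - 1
print(exact, closed)
```

The two agree to floating-point precision, which confirms the identity $E[(S_n+1)^{-1}]=\frac{1-(1-p)^{n+1}}{(n+1)p}$ used in the omitted derivation.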

Using these results in steps 1. and 2. above, we obtain the following, after some simplification:

$$\begin{align} E|c-\hat{c}| &\le \sqrt{ V[\hat c] + (c-E[\hat c])^2 } \\ &< \frac{n+t+1}{\sqrt{n}}\sqrt{ c + \frac{n}{(n+1)^2}\left( c\,(1-p)^n-\frac{1}{p}\frac{t}{n+t+1} \right)^2} \sim \sqrt{nc}. \end{align}$$

In the special case of $t=0$, this becomes $$E|c-\hat{c}| < \frac{n+1}{\sqrt{n}}\sqrt{c + \frac{n}{(n+1)^2}c^2(1-p)^{2n} }\sim \sqrt{nc}. $$ (Asymptotically in $n$, this is a tighter upper bound than the one obtained in the OP's answer, as that one appears to be $\sim n\sqrt{2\pi}$ -- if I made no mistakes doing the integration.)
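The $t=0$ bound can also be checked against the exact mean absolute error for several $n$; the slack of order $\sqrt{nc}$ is clearly visible. A Python sketch (helper names mine):

```python
from math import comb, sqrt

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def check(n, p):
    """Exact E|c - c_hat| for c_hat = (n - S_n)/(S_n + 1), vs. the t=0 bound."""
    c = (1 - p) / p
    mae = sum(abs(c - (n - k) / (k + 1)) * binom_pmf(n, k, p)
              for k in range(n + 1))
    bound = (n + 1) / sqrt(n) * sqrt(c + n / (n + 1)**2 * c**2 * (1 - p)**(2 * n))
    return mae, bound

for n in (10, 50, 200):
    mae, bound = check(n, 0.4)
    print(n, mae, bound)
```

The exact error shrinks with $n$ while the bound grows like $\sqrt{nc}$, so the bound holds with plenty of room.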

NB: Wooff's paper includes some tighter bounds specific to the Binomial distribution, but I haven't looked into them.


With the pointers from r.e.s. I proceeded as below. I worked with a different $\tilde{c}$ that is indeed a function of $S_n:=\sum_{i=1}^n Y_i$. For starters, I wanted $\mathbb{E}\tilde{c}$ to be close to $c=(1-p)/p$, and one way to do so is to define $$\tilde{c} = f(S_n) = \left.\binom{n}{S_n+1}\right/\binom{n}{S_n} = \frac{n-S_n}{S_n+1}.$$ In this way, $$\mathbb{E}\tilde{c} = \sum_{k=0}^n f(k) \binom{n}{k} p^k (1-p)^{n-k} = \sum_{k=0}^n \binom{n}{k+1} p^k (1-p)^{n-k}$$ $$=\sum_{k=1}^{n+1} \binom{n}{k} p^{k-1} (1-p)^{n-k+1}= \frac{1-p}{p}\sum_{k=1}^n \binom{n}{k} p^{k} (1-p)^{n-k}$$ $$=c\left(1-(1-p)^n\right).$$

Next, we derive a result similar to Hoeffding's Inequality by applying the Chernoff bound and Hoeffding's Lemma to $X:=\tilde{c}-\mathbb{E}\tilde{c} = \tilde{c}-c + c(1-p)^n$. Note that $\tilde{c}$ lies in $[0,n]$, and Hoeffding's Lemma only takes into account the length of the interval in which $X$ lies. Then, for any $s>0$, $$\mathbb{P}(\tilde{c}-c\geq t) = \mathbb{P}(\exp(s(\tilde{c}-c))\geq e^{st})$$ $$\leq e^{-s(t+c(1-p)^n)}\mathbb{E}e^{sX}\leq \exp\left(\frac{s^2n^2}{8} - s(t+c(1-p)^n)\right).$$ The quadratic in the exponent is minimized at $$s = \frac{4(t+c(1-p)^n)}{n^2},$$ which results in the upper bound $$\mathbb{P}(\tilde{c}-c\geq t) \leq\exp\left(-\frac{2}{n^2}\big(t+c(1-p)^n\big)^2\right).$$ Similarly, $$\mathbb{P}(c-\tilde{c}\geq t) \leq\exp\left(-\frac{2}{n^2}\big(t-c(1-p)^n\big)^2\right).$$

Combining both results we find $$\mathbb{E}|\tilde{c}-c|= \int_0^\infty\mathbb{P}(|\tilde{c}-c|\geq t)\,dt$$ $$\leq \int_{-c(1-p)^n}^{c(1-p)^n}e^{-2t^2/n^2}\,dt + 2\int_{c(1-p)^n}^{n-c} e^{-2t^2/n^2}\,dt.$$ This is, however, in no way a good upper bound, so I hope it can be improved in some way.
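The key identity above, $\mathbb{E}\tilde{c}=c\left(1-(1-p)^n\right)$, is easy to confirm by direct summation. A quick Python check (names and parameter values are mine):

```python
from math import comb

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 30, 0.35
c = (1 - p) / p
# exact E[c_tilde] for c_tilde = f(S_n) = (n - S_n)/(S_n + 1)
e_tilde = sum((n - k) / (k + 1) * binom_pmf(n, k, p) for k in range(n + 1))
print(e_tilde, c * (1 - (1 - p)**n))
```

The two printed values agree to floating-point precision, confirming that the bias of this $\tilde{c}$ is exactly $-c(1-p)^n$.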