Estimating the Bias of a Coin using Chebyshev's inequality


Assume we have a coin of unknown bias towards heads $p$ and our estimate of the bias is $\hat{p} = \frac{1}{n}S_n$ where $S_n$ is the number of heads observed.

To find $E(\hat{p})$ we can define $S_n$ using indicator random variables: $S_n = X_1 + \dots + X_n$, where $X_i = 1$ if the $i$-th toss is heads and $0$ otherwise. That means $E(X_i) = 1\times Pr[X_i=1] = p$.

By linearity of expectation, we can compute $E(\hat{p})$.

$E(\hat{p}) = E(\frac{1}{n}S_n) = E(\frac{1}{n}X_1+\dots+\frac{1}{n}X_n) = \frac{1}{n}\sum_{i=1}^nE(X_i) = p$

I understand this up to the $=p$ part. I understand that $p$ should be the number of observed heads over the number of total tosses. How does $\sum_{i=1}^nE(X_i)$ represent the number of observed heads? Sure, it represents $E(S_n)$, the expected number of observed heads, but that isn't a definite quantity, right? Anyway, the book goes on to compute $Var(\hat{p})$:

$Var(\hat{p}) = Var(\frac{1}{n}S_n) = \frac{1}{n^2}Var(S_n) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{\sigma^2}{n}$

This part I understand even less, where did $\frac{1}{n^2}$ come from? How was that pulled out of the original variance?
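For what it's worth, I did convince myself with a quick simulation (my own sketch, not from the book) that both $E(\hat{p}) = p$ and $Var(\hat{p}) = \frac{p(1-p)}{n}$ hold, using the fact that $Var(X_i) = p(1-p)$ for a coin of bias $p$; I just don't follow the algebra:

```python
import random

# My own sanity check (not from the book): simulate many runs of n tosses
# with a known bias p, then compare the sample mean and variance of the
# estimates p_hat = S_n / n against p and p(1-p)/n.
random.seed(0)
p, n, runs = 0.3, 50, 50000

estimates = []
for _ in range(runs):
    heads = sum(1 for _ in range(n) if random.random() < p)  # S_n
    estimates.append(heads / n)                              # p_hat = S_n / n

mean = sum(estimates) / runs
var = sum((x - mean) ** 2 for x in estimates) / runs
print(mean, p)               # the mean of the p_hat values is close to p
print(var, p * (1 - p) / n)  # their variance is close to p(1-p)/n
```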

The book then applies Chebyshev's inequality to compute the confidence.

$Pr[|\hat{p} - p| \geq \epsilon] \leq \frac{Var(\hat{p})}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$

I understand how the values are plugged into Chebyshev's inequality and how $Var(\hat{p})$ gets written in terms of $\sigma^2$, but where did the $n$ come from? And just to confirm: the equals sign is only saying that the right-hand side of the inequality equals that expression, not the whole thing, right?

The book then says that "to make this less than the desired value $\delta$, we need to set $n \geq \frac{\sigma^2}{\epsilon^2\delta}$".

Perhaps this is because I don't understand the previous steps, but I don't see how $\epsilon$ and $\delta$ are controlling the error and confidence respectively.

EDIT:

Thanks to the responses below, the doubts expressed above have been cleared up. I would like to follow up on how my book solves the problem of estimating the bias of a coin.

I understand everything up to the point

$n \geq \frac{\sigma^2}{\epsilon^2\delta}$

$\sigma^2$ can be written as $Var(X_i)$.

$Var(X_i) = E(X_i^2) - E(X_i)^2$, which comes out to be $p - p^2 = p(1-p)$.

So our expression above can be rewritten as

$n \geq \frac{p(1-p)}{\epsilon^2\delta}$

My book contains the following extra part:

Since $p(1-p)$ takes on its maximum value for $p=\frac{1}{2}$, we can conclude that it is sufficient to choose $n$ such that:

$n \geq \frac{1}{4\epsilon^2\delta}$

I understand that the last expression comes from substituting $p = \frac{1}{2}$, but I don't understand the claim they make about the maximum value.
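I did at least confirm the claim numerically (my own check, not from the book) by evaluating $p(1-p)$ on a grid of $p$ values and seeing where it is largest:

```python
# My own check (not from the book): scan p(1-p) over a grid of p values
# in [0, 1] and locate the maximum.
grid = [i / 1000 for i in range(1001)]
values = [p * (1 - p) for p in grid]
best_p = grid[values.index(max(values))]
print(best_p, max(values))  # maximum is 0.25, attained at p = 0.5
```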

BEST ANSWER
  1. How does $\sum_{i=1}^nE(X_i)$ represent the number of observed heads?

    It doesn't. It represents the expectation of $S_n$. $X_1+\dotsb+X_n$ represents the number of heads since this sum is equal to $S_n$.

  2. It is well-known that $\operatorname{Var}(cX) = c^2 \operatorname{Var}(X)$: $$\operatorname{Var}(cX) = E[(cX-E[cX])^2] = E[(cX-cE[X])^2] = E[c^2(X-E[X])^2] = c^2\operatorname{Var}(X).$$
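A quick numerical illustration of this identity (my own example, with arbitrary data):

```python
import random

# Illustration (not from the book): check Var(cX) = c^2 Var(X) empirically
# on a sample, using the population-style variance of the sample.
random.seed(1)
xs = [random.random() for _ in range(100000)]
c = 3.0

def var(data):
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

print(var([c * x for x in xs]), c ** 2 * var(xs))  # the two agree
```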

  3. It was shown that $\operatorname{Var}(\hat p ) = \frac{\sigma^2}{n}$. Then $$\frac{\operatorname{Var}(\hat p)}{\epsilon^2} = \frac{\sigma^2/n}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}.$$ Yes the equality is just for this part.

  4. The book says if we want this $$\frac{\sigma^2}{n\epsilon^2} \leq \delta,$$ for a given $\delta$, then do this $$\frac{\sigma^2}{\delta\epsilon^2} \leq n.$$ In plain words, if we want $\operatorname{Var}(\hat p)/\epsilon^2$ to be less than some value $\delta$, then $n$ should satisfy the condition above.
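Plugging in concrete numbers may help (hypothetical values of $\epsilon$ and $\delta$, my own example, using the worst case $\sigma^2 = \frac{1}{4}$ from the follow-up part of the question):

```python
import math

# Hypothetical example (not from the book): required sample size from
# n >= sigma^2 / (epsilon^2 * delta), using the worst case sigma^2 = 1/4.
epsilon = 0.05   # error tolerance: |p_hat - p| < 0.05
delta = 0.05     # failure probability: at most 5% chance of a larger error
sigma_sq = 0.25  # p(1-p) is at most 1/4, attained at p = 1/2

n = math.ceil(sigma_sq / (epsilon ** 2 * delta))
print(n)  # 2000 tosses suffice for these choices
```

Shrinking either $\epsilon$ (tighter error) or $\delta$ (higher confidence) drives $n$ up, which is exactly the sense in which they control error and confidence.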


You wrote: "I understand that $p$ should be the number of observed heads over the number of total tosses." No, this is not correct. $p$ is the expected fraction of tosses that will be heads. It is not a sample statistic. Same goes for the variance.

As for the $n$, it comes from the previous calculation of $Var(\hat{p})$. Note that $Var(aX) = a^2Var(X)$ with $a = \frac{1}{n}$, which is where the $\frac{1}{n^2}$ comes from and how it gets plugged into $Var(\hat{p})$.

Finally, yes, the $=$ is referring just to the two RHS expressions, not to the full line.