Using Chernoff's inequality

142 Views Asked by At

Let $X$ denote the number of pairs of $HH$ in a sequence of 1000 coin tosses where we also consider the pair of the last and the first coin toss (cyclic). I'm asked to use Chernoff's inequality (the one which is only applicable for bernoulli variables) so I can't do that directly since $X$ is not distributed binomially. The tip is to split $X$ into pairs of tosses $i$ and $i+1$ where one time $i$ is even and the other time $i$ is odd. Clearly, those two expirements are distributed binomially, but how can I use this fact to use Chernoff's inequality to estimate something of the form $P(X\geq (1 + \delta)E[X])$ whereby I know that $X = X_{i \text{ is even}} + X_{i \text{ is odd}}$ so $E[X] = E[X_{i \text{ is even}}] + E[X_{i \text{ is odd}}]$.

1

There are 1 best solutions below

0
On BEST ANSWER

Taking your hint. $$X=X_{even}+X_{odd}$$

with $X_{even},X_{odd}\sim\text{Binomial}(n/2, 1/4)$ (as there are n/2 trials of 1/4 chance each of HH). However they are not independent, but we can use linearity of expectation to find $E(X)=n/8+n/8=n/4$. And so $E(\bar X)=1/4$. We are interested in bounding $$\Pr\left(\bigg|\bar X-\frac 1 4\bigg|\ge t\right)$$ for some small $t$ such as $t=\frac 1 {10}$. This can be bounded by Chebyshev's Inequality but Chernoff Bounds are better for large $n$. Multiplying both sides by $n$ we get

$$\begin{split}\Pr\left(\big|X-\frac n 4\big|\ge nt\right)&=\Pr\left(X-\frac n 4\ge n t\right)+\Pr\left(-(X-\frac n4)\ge nt\right)\\ &=\Pr\left(X_{even}+X_{odd}-\frac n8-\frac n8\ge nt\right)+\Pr\left(-\left(X_{even}+X_{odd}-\frac n8-\frac n8\right)\ge nt\right)\\ &\le\Pr\left(X_{even}-\frac n8\ge \frac{nt}2\right)+\Pr\left(X_{odd}-\frac n 8\ge \frac{nt}2\right)+\Pr\left(-\left(X_{even}-\frac n 8\right)\ge \frac{nt}2\right)+\Pr\left(-\left(X_{odd}-\frac n 8\right)\ge\frac{nt}2\right)\\ &=2\left[\Pr\left(X_{even}-\frac n8\ge \frac{nt}2\right)+\Pr\left(-\left(X_{odd}-\frac n 8\right)\ge\frac{nt}2\right)\right]\end{split}$$

Let $Y=X_{even}-\frac n 8$, then the Chernoff bound says that $\Pr\left(Y\ge\frac{nt}2\right)\le\min_{s>0}\psi(s)\exp(-nts/2)$. We can find the mgf of $Y$ by using the mgf of a binomial random variable and a property of a mgf added to a constant to get $\psi(s)=\left(\frac 1 4 e^s+\frac 34\right)^{n/2}e^{-\frac n8s}$ and so we want to minimize $\left[\left(\frac 1 4e^s+\frac 3 4\right)e^{-s(1/4+t)}\right]^{n/2}$. This has a minimum at $s=\log\frac{-12t-3}{4t-3}$.

Likewise we want to find a bound for $\Pr(-Y\ge \frac {nt}2)$, in fact the mgf of $-Y$ is $\psi(-s)$ so we want to minimize $\left[\left(\frac 1 4e^{-s}+\frac 34\right)e^{s(1/4-t)}\right]^{n/2}$ and this has a minimum at $s'=\log \frac{-4t-3}{12t-3}$. So an upper bound for the probability that the proportion of consecutive heads differs from its mean of $1/4$ by more than $t$ is bounded above by $2$ times those two expressions evaluated at their respective minima.

For $t=1/10$, that is, the proportion of two consecutive heads in a sample is more than $1/10$ away from the mean, is bounded above by

  • $1.031764$ for $n=100$
  • $9.134969(10)^{-06}$ for $n=1000$
  • $3.611308(10)^{-54}$ for $n=10000$

Apparently the bound is not very good at low $n$, but fortunately you're asked for $1000$, so here's a number. If you want the actual formula as a function of $t$, you'll need to plug in the minimum value of $s$ and $s'$ into the two expressions.