Samples to converge to mean


Consider the set $\{-K,-K+1,\dots,0,1,\dots,K\}$. Consider the random variable $a$ which picks integers from the set uniformly. We expect the mean of $a$ to be $0$.

How many samples do we need for the sample mean to lie in the range $[-\delta,\delta]$ with probability at least $p$?


There are 3 best solutions below


Let us denote $$ a_n=\frac1n\sum_{k=1}^nX_k, $$ where the $X_k$ are iid random variables with the uniform distribution on $\{-K,\ldots,K\}$. Since $E[a_n]=0$, Chebyshev's inequality gives $$ P(|a_n|>\delta)\le\frac{\operatorname{Var}a_n}{\delta^2} $$ or, equivalently, $$ P(|a_n|\le\delta)\ge1-\frac{\operatorname{Var}a_n}{\delta^2}. $$ Since $$ \operatorname{Var}a_n=\frac1{n^2}\cdot n\cdot \operatorname{Var}X_1=\frac1n\cdot\frac{(2K+1)^2-1}{12}=\frac{K(K+1)}{3n}, $$ we have that $$ P(|a_n|\le\delta)\ge1-\frac1{\delta^2}\cdot\frac1n\cdot\frac{(2K+1)^2-1}{12}. $$ By choosing $n$ large enough, we can make the right side of the inequality above greater than or equal to $p$, which is the desired probability. Explicitly, it suffices to take $$ n\ge\frac{K(K+1)}{3\delta^2(1-p)}. $$

This is only a bound, but I hope that this is useful.
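As a rough illustration, the Chebyshev bound can be turned into a concrete sample size by solving $1-\frac{K(K+1)}{3n\delta^2}\ge p$ for $n$. The sketch below does just that; the function name `chebyshev_sample_size` is made up for this example, and the result is only a sufficient (usually quite conservative) number of samples.

```python
import math


def chebyshev_sample_size(K: int, delta: float, p: float) -> int:
    """Smallest n for which the Chebyshev lower bound on P(|a_n| <= delta) is >= p.

    Hypothetical helper for illustration; it solves
    1 - Var(X_1) / (n * delta^2) >= p for n.
    """
    if K < 0 or delta <= 0 or not (0 < p < 1):
        raise ValueError("need K >= 0, delta > 0 and 0 < p < 1")
    # Variance of the uniform distribution on {-K, ..., K}; equals K(K+1)/3.
    var_x = ((2 * K + 1) ** 2 - 1) / 12
    # Always report at least one sample, even in the degenerate case K = 0.
    return max(1, math.ceil(var_x / (delta ** 2 * (1 - p))))


# Example: K = 3 has variance 4, so delta = 0.5 and p = 0.75 give 4 / (0.25 * 0.25).
print(chebyshev_sample_size(3, 0.5, 0.75))  # prints 64
```

Because Chebyshev's inequality holds for any distribution with finite variance, the resulting $n$ is guaranteed to work but tends to be several times larger than what is actually needed.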


Your distribution has an expected value of $0$ and a variance of $\dfrac{K(K+1)}{3}$, so the average of $n$ independent samples (with replacement) has an expected value of $0$ and a variance of $\dfrac{K(K+1)}{3n}$.

Using a Central Limit Theorem normal approximation, this suggests that something like $n \gt \dfrac{K(K+1) \left(\Phi^{-1}\left(\frac{p+1}{2}\right)\right)^2}{3\delta^2} $ should be a reasonable estimate except in extreme cases, where $\Phi^{-1}(x)$ is the inverse of the cumulative distribution function of a standard normal.

For example, if $K=3$, $p=0.9$ and $\delta=0.1$ then $\Phi^{-1}\left(\frac{p+1}{2}\right) \approx 1.64485$ and this suggests you should look at something like $n \gt 1082.2$. It will not be precise, but will usually be close.
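The estimate above is easy to evaluate numerically. Here is a minimal sketch using only the Python standard library (`statistics.NormalDist` supplies $\Phi^{-1}$); the function name `clt_sample_size` is an assumption made for this example, and the value it returns is an approximation, not a guarantee.

```python
import math
from statistics import NormalDist


def clt_sample_size(K: int, delta: float, p: float) -> float:
    """Normal-approximation threshold for n (hypothetical helper).

    Returns K(K+1) * z^2 / (3 * delta^2) with z = Phi^{-1}((p + 1) / 2).
    """
    # Two-sided quantile of the standard normal for coverage p.
    z = NormalDist().inv_cdf((p + 1) / 2)
    # Variance of a single draw from {-K, ..., K}.
    var_x = K * (K + 1) / 3
    return var_x * z * z / delta ** 2


n_est = clt_sample_size(3, 0.1, 0.9)
# Reproduces the worked example: roughly 1082.2, so take n = 1083.
print(round(n_est, 1), math.ceil(n_est))
```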


You may want to apply the sampling distribution of the mean:

$$\bar X_n=\frac1n\sum_{k=1}^nX_k \text{ where } X_k \sim a$$

For large $n$ we can approximate $\bar X_n$ by a normal distribution:

$$\bar X_n \stackrel{\text{approx}}{\sim} N\left(\mu_a,\frac{\sigma_a^2}{n}\right)$$

where

$$\mu_a = 0$$

and

$$\sigma_a^2 = \sum_{k=-K}^K k^2\cdot \frac{1}{2K+1}=\frac{2}{2K+1}\sum_{k=1}^K k^2 = \frac{2}{2K+1}\cdot\frac{K(K+1)(2K+1)}{6} = \frac{K(K+1)}{3}$$

Now you normalize $\bar X_n$, which gives (approximately) a standard normal distribution:

$$\frac{\bar X_n\sqrt{n}}{\sigma_a}\stackrel{\text{approx}}{\sim}N(0,1)$$

So:

$$P(|\bar X_n| \leq \delta) = P\left(\frac{|\bar X_n|\sqrt{n}}{\sigma_a} \leq \frac{\delta\sqrt{n}}{\sigma_a}\right) \approx P\left(|Z| \leq \frac{\delta\sqrt{n}}{\sigma_a}\right) \text{ where } Z \sim N(0,1)$$

With $\Phi(z) = P(Z\leq z)$, you get the following equation giving a relation between your interval $[-\delta,\delta]$, the probability $p$ and the sample size $n$:

$$2\Phi(\frac{\delta\sqrt{n}}{\sigma_a}) - 1 = p$$
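One way to sanity-check this relation is to compare the left-hand side with a small Monte Carlo simulation. The sketch below is only an illustration under assumptions: the helper names `normal_coverage` and `simulated_coverage`, the number of trials and the seed are all choices made here, and the simulated value will fluctuate slightly with them.

```python
import math
import random
from statistics import NormalDist


def normal_coverage(K: int, delta: float, n: int) -> float:
    """Left-hand side 2 * Phi(delta * sqrt(n) / sigma_a) - 1 of the relation."""
    sigma_a = math.sqrt(K * (K + 1) / 3)
    return 2 * NormalDist().cdf(delta * math.sqrt(n) / sigma_a) - 1


def simulated_coverage(K: int, delta: float, n: int,
                       trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated sample means that fall inside [-delta, delta]."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    support = range(-K, K + 1)         # the set {-K, ..., K}
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.choices(support, k=n)) / n
        hits += abs(sample_mean) <= delta
    return hits / trials


# With K = 3, delta = 0.1 and n = 1083, both numbers should be close to 0.9.
print(normal_coverage(3, 0.1, 1083))
print(simulated_coverage(3, 0.1, 1083))
```

If the two numbers disagree noticeably, that usually means $n$ is too small for the normal approximation to be trusted.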

I hope this helps.