Calculate a probability using the central limit theorem

$X$ is a Bernoulli random variable, $X \sim b(p)$, where $p\in(0,1)$. We also have a sequence of independent and identically distributed variables $Y_n$ whose conditional distribution given $X = x$ is uniform: $ Y_{n|X=x} \sim U([-p^x(1-p)^{1-x},p^x(1-p)^{1-x}]) $

Calculate $P(|\overline{Y_n}| \le \frac{1}{\sqrt{n}})$ where $\overline{Y_n} =\frac{1}{n} \sum_{k=1}^{n} Y_k $

Using the central limit theorem, the variable $\overline{Y_n}$ can be approximated by a normal variable $ W \sim \mathcal{N}(\mu,\sigma^2) $ where:

$ \mu = E(\overline{Y_n})= E(\frac{1}{n} \sum_{k=1}^{n} Y_k) = 0$

Because:

$E(Y_k) = EE(Y_{k|X=x}) = E( \frac{-p^x(1-p)^{1-x} + p^x(1-p)^{1-x}}{2}) =0$

$\sigma^2 = var(\overline{Y_n}) = var\left(\frac{1}{n} \sum_{k=1}^{n} Y_k\right) = \frac{1}{n^2} \sum_{k=1}^{n} var(Y_k) = \frac{1}{n^2}\, n\, \frac{(p^x(1-p)^{1-x} + p^x(1-p)^{1-x})^2}{12} = \frac{1}{n} \frac{(p^x(1-p)^{1-x})^2}{3}$

The final result is:

$P(|\overline{Y_n}| \le \frac{1}{\sqrt{n}}) = 2\Phi( \frac{\frac{1}{\sqrt{n}} - \mu}{\sigma})-1 = 2\Phi( \frac{ \frac{1}{\sqrt{n}} }{\frac{p^x(1-p)^{1-x}} {\sqrt{3n}}})-1 = 2\Phi( \frac{ \sqrt{3} }{p^x(1-p)^{1-x}})-1 $

I would like to know whether this is solved correctly.


It seems to me that you assume $\bar Y|X = x$ and $\bar Y$ are the same thing. And it is not clear to me from your notation in the statement of the problem whether that was intended.

If you can determine $x$ once at the beginning of the process, and go on from there, what you have is OK. The value of $x$ is either 0 or 1 throughout, and accordingly $p^x(1-p)^{1-x}$ is either $p$ or $1-p$ throughout.

Otherwise, unless $p = 1/2$, the variance you call $\sigma^2$ is a random variable.
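
To make the randomness explicit, here is a short sketch. It assumes that one $X$ is shared by all of $Y_1,\dots,Y_n$ and that the $Y_k$ are conditionally independent given $X$ (one reading of the problem statement). The symbol $A = p^X(1-p)^{1-X}$ for the half-width of the uniform support is introduced here for convenience and is not in the original post.

$ var(\overline{Y_n} \mid X) = \frac{A^2}{3n} = \begin{cases} \dfrac{p^2}{3n} & \text{with probability } p \ (X = 1), \\ \dfrac{(1-p)^2}{3n} & \text{with probability } 1-p \ (X = 0). \end{cases} $

The two values coincide only when $p = 1/2$. Because $E(\overline{Y_n} \mid X) = 0$, the law of total variance then gives the unconditional variance $var(\overline{Y_n}) = \frac{E(A^2)}{3n} = \frac{p^3 + (1-p)^3}{3n}$.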

I think you can figure out how to use what you have done to solve the problem. It's just that you now have two different solutions masquerading as one.
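
As a hedged sketch of how the two pieces might be combined under the second reading (a fresh $X$ for each sample mean, shared by all $n$ of the $Y_k$ in that mean, with the $Y_k$ conditionally independent given $X$), the law of total probability applied to your conditional normal approximation gives:

$ P(|\overline{Y_n}| \le \tfrac{1}{\sqrt{n}}) = p\, P(|\overline{Y_n}| \le \tfrac{1}{\sqrt{n}} \mid X = 1) + (1-p)\, P(|\overline{Y_n}| \le \tfrac{1}{\sqrt{n}} \mid X = 0) \approx p\left[2\Phi\left(\frac{\sqrt{3}}{p}\right) - 1\right] + (1-p)\left[2\Phi\left(\frac{\sqrt{3}}{1-p}\right) - 1\right] $

For $p = 0.7$ this is roughly $0.7(0.9867) + 0.3(1.0000) \approx 0.991$, which is in line with the simulated value for the alternative scheme.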

I don't know whether you are familiar with simulation, but I have simulated the problem your way first and the alternative way second. Your way, the distribution of $\bar Y$ is ambiguous, because it depends on which value of $x$ happens to be drawn. I used $n = 20$ and $p = 0.7$, with $m = 10^5$ replications, so results should be accurate to about two decimal places.

    # Your way. On THIS run it happened that x = 1, so a = .7
    m = 10^5;  n = 20;  p = .7;  x.bar = numeric(m)
    x = rbinom(1, 1, p);  a = p^x * (1-p)^(1-x)  # one draw of x, used throughout
    for (i in 1:m) {
      x.bar[i] = mean(runif(n, -a, a))           # mean of n uniforms on [-a, a]
    }
    mean(abs(x.bar) <= n^(-.5))                  # simulated P(|Y.bar| <= 1/sqrt(n))
    ## 0.98763              # 1.00 if x = 0
    2*pnorm(sqrt(3)/a) - 1  # normal approximation; 1.00 if x = 0
    ## 0.9866524
    # diagnostics
    mean(x.bar);  sd(x.bar);  mean(x.bar < .1)
    ## 0.0003531966
    ## 0.09022035           # .039 if x = 0
    ## 0.86354              # .996 if x = 0

    # Alternative: x is redrawn for every sample mean
    m = 10^5;  n = 20;  p = .7;  x.bar = a = numeric(m)
    for (i in 1:m) {
      x = rbinom(1, 1, p);  aa = a[i] = p^x * (1-p)^(1-x)
      x.bar[i] = mean(runif(n, -aa, aa))
    }
    mean(abs(x.bar) <= n^(-.5))
    ## 0.99057       # your approximation not feasible as is
    # diagnostics
    mean(x.bar);  sd(x.bar);  mean(x.bar < .1)
    ## 0.0003049393
    ## 0.07864183
    ## 0.90343
    mean(a); sd(a)   # support of UNIF is random
    ## 0.580396
    ## 0.1831306
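
As a rough cross-check on the alternative simulation, the sketch below evaluates the normal approximation separately for $x = 1$ and $x = 0$ and weights the two results by $p$ and $1-p$. It assumes, as in the alternative loop, that one $x$ is drawn per sample mean. The names `half.width`, `weight`, `cond.prob`, `mix.prob`, `mean.a`, `sd.a`, and `sd.ybar` are illustrative and do not appear in the original code.

```r
p <- 0.7
n <- 20

# Half-width of the uniform support for x = 1 and x = 0, and P(X = x).
# (Assumption: one x per sample mean, as in the alternative loop.)
half.width <- c(p, 1 - p)
weight     <- c(p, 1 - p)

# Conditional normal approximation 2*Phi(sqrt(3)/a) - 1 for each x
cond.prob <- 2 * pnorm(sqrt(3) / half.width) - 1

# Weight by P(X = x): approximation for the alternative scheme
mix.prob <- sum(weight * cond.prob)
mix.prob

# Theoretical counterparts of the diagnostics mean(a), sd(a), sd(x.bar)
mean.a  <- sum(weight * half.width)
sd.a    <- sqrt(sum(weight * half.width^2) - mean.a^2)
sd.ybar <- sqrt(sum(weight * half.width^2) / (3 * n))
c(mean.a, sd.a, sd.ybar)
```

With $p = 0.7$ the weighted value comes out near 0.991, close to the simulated 0.99057, and the three theoretical diagnostics are about 0.58, 0.183, and 0.0785, which agree with the simulated values to roughly two or three decimal places.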