Why is this standardization of a normal distribution only using the estimated p for the variance?


I was working through some old statistics exams for practice when I found this task, crudely translated:

We have $$X \sim Bin(n,p)$$

"The random sample of size $n$ is large enough to approximate the distribution of the random variable $X$ with a normal distribution. We can therefore assume that: $$\frac{X-np}{\sqrt{n\hat{p}(1-\hat{p})}}$$

is approximately standard normally distributed."

That is: $$\frac{X-np}{\sqrt{n\hat{p}(1-\hat{p})}} \overset{\text{approx.}}{\sim} N(0,1) $$ The task then asks me to calculate a confidence interval, but that is not my issue.

First, a clarification: $\hat{p}$ is the estimate of $p$ computed from the given $X$ and $n$, that is, $\hat{p}=\frac{X}{n}$.

I have seen similar formulas several times when a binomial variable is standardized to an approximately standard normal one, but without $\hat{p}$. I cannot understand why $\hat{p}$ replaces only the $p$ in the denominator, that is, why only the variance calculation uses $\hat{p}$.

Thank you for the help


There are 2 best solutions below

2
On BEST ANSWER

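This follows from Slutsky's theorem. Write the statistic as $$\frac{X-np}{\sqrt{n\hat{p}(1-\hat{p})}} = \frac{X-np}{\sqrt{np(1-p)}}\cdot\sqrt{\frac{p(1-p)}{\hat{p}(1-\hat{p})}}.$$ By the central limit theorem, the first factor converges in distribution to $N(0,1)$. Since $\hat p$ converges in probability to the constant $p$ (with $0<p<1$), the ratio $\hat p(1-\hat p)/(p(1-p))$ converges in probability to $1$, and so does the second factor. Slutsky's theorem then says the product also converges in distribution to $N(0,1)$.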
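A quick simulation can make this plausible numerically. The sketch below is only an illustration, not part of the exam task: the function name `studentized_draws` and the values `n=400`, `p=0.3`, `reps=5000` are arbitrary assumptions. It draws $X \sim Bin(n,p)$ repeatedly, computes the statistic with $\hat{p}$ in the denominator, and reports the sample mean, the standard deviation, and the share of values inside $\pm 1.96$, which should be close to $0$, $1$ and $0.95$ if the approximation holds.

```python
import math
import random
import statistics


def studentized_draws(n, p, reps, seed=0):
    """Simulate (X - n*p) / sqrt(n * p_hat * (1 - p_hat)) for X ~ Bin(n, p).

    Hypothetical helper for illustration; the true p is used in the
    numerator and the estimate p_hat = X / n only in the denominator.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        # One binomial draw as a sum of n Bernoulli(p) trials.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        if p_hat in (0.0, 1.0):
            # Estimated variance is zero, so the statistic is undefined.
            continue
        draws.append((x - n * p) / math.sqrt(n * p_hat * (1 - p_hat)))
    return draws


# Assumed illustration values, chosen only so the run is fast.
draws = studentized_draws(n=400, p=0.3, reps=5000)
mean = statistics.fmean(draws)
sd = statistics.pstdev(draws)
coverage = sum(abs(z) < 1.96 for z in draws) / len(draws)
print(f"mean={mean:.3f}, sd={sd:.3f}, P(|Z|<1.96)={coverage:.3f}")
```

For small $n$, or $p$ close to $0$ or $1$, the agreement is noticeably worse, which is why the task stresses that the sample is "large enough".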

0
On

Using $\frac{X-np}{\sqrt{n\hat{p}(1-\hat{p})}}$ as an approximate pivotal quantity allows obtaining a CI for the unknown $p$ by rearranging an expression like $P\left[a<\frac{X-np}{\sqrt{n\hat{p}(1-\hat{p})}}<b\right]$ into one like this: $$P[g(X,\hat{p})<p<h(X,\hat{p})],$$ where the left- and right-hand limits depend only on $X$ and known quantities. If the unknown $p$ remained in the denominator, this simple rearrangement would not work, because the inequality would no longer be linear in $p$; one would have to solve a quadratic inequality in $p$ instead.
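As a sketch of that rearrangement, assume a symmetric choice $a=-z$, $b=z$ with $z$ the standard normal quantile for the desired level. Solving $-z<\frac{X-np}{\sqrt{n\hat{p}(1-\hat{p})}}<z$ for $p$ gives $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$. The function name `wald_interval` and the numbers `x=120`, `n=400` below are assumptions for illustration, not values from the exam.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Confidence interval for p from the pivot with p_hat in the denominator.

    Hypothetical helper: assumes symmetric limits a = -z, b = z.
    """
    if not 0 < x < n:
        # p_hat would be 0 or 1, so the estimated variance is zero.
        raise ValueError("need 0 < x < n for a non-degenerate interval")
    p_hat = x / n
    # Two-sided standard normal quantile, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed example data: 120 successes in 400 trials.
lower, upper = wald_interval(x=120, n=400)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

Both limits are functions of $X$, $n$ and the chosen level only, which is exactly the property the rearrangement needs.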