I got a probability/expectation problem and solved it with two methods (both based on Bernoulli trials), but the two methods give different results. I can't figure out where the difference comes from.
Problem)
There are 5 families with 6 kids each. How many of the 5 families are expected to have 4 or more daughters? (The probability of each child being a son or a daughter is 1/2.)
1st method)
This is the method given in the book.
$P(D \geqq 4)$ = probability that a given family has 4 or more daughters (i.e. 4, 5, or 6 daughters)
$$ P(D \geqq 4 ) = P(D = 4) + P(D=5) + P(D=6) $$ $$ = \binom{6}{4} \left(\frac{1}{2}\right)^6 + \binom{6}{5} \left(\frac{1}{2}\right)^6 + \binom{6}{6} \left(\frac{1}{2}\right)^6 = \frac{15 + 6 + 1}{64} = \frac{11}{32} $$
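As a quick sanity check of the sum above, here is a minimal Python sketch (my own illustration, not from the book) that adds the three binomial terms using exact fractions; `binom_pmf` is a hypothetical helper name chosen for this example.

```python
from fractions import Fraction
from math import comb

# D ~ B(6, 1/2): number of daughters in one family.
n = 6
p = Fraction(1, 2)

def binom_pmf(k, n, p):
    """Exact binomial probability P(D = k) as a Fraction."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(D >= 4) = P(D = 4) + P(D = 5) + P(D = 6)
p_at_least_4 = sum(binom_pmf(k, n, p) for k in range(4, n + 1))
print(p_at_least_4)  # 11/32
```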
The number of families that have 4 or more daughters (call it $X$) then follows the binomial distribution $B(5,\frac{11}{32})$, so $$ E(X) = 5\times\frac{11}{32} =\frac{55}{32} \approx 1.72 $$
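To confirm that the shortcut $E(X) = 5 \times \frac{11}{32}$ agrees with the definition of expectation, the following sketch (an illustration, assuming $X \sim B(5, \frac{11}{32})$ as stated above) sums $k \cdot P(X = k)$ exactly over $k = 0, \dots, 5$.

```python
from fractions import Fraction
from math import comb

# X = number of the 5 families with 4+ daughters, X ~ B(5, 11/32).
m = 5
q = Fraction(11, 32)

# E(X) from the definition: sum over k of k * P(X = k).
expected_by_definition = sum(
    k * comb(m, k) * q**k * (1 - q)**(m - k) for k in range(m + 1)
)
# Shortcut for a binomial distribution: E(X) = m * q.
expected_by_formula = m * q

print(expected_by_definition, expected_by_formula)  # 55/32 55/32
```

Because the arithmetic uses `Fraction`, the two values match exactly rather than up to floating-point error.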
2nd method)
This is what I came up with.
$D$ = the number of daughters in one family.
$D$ follows $B(6, \frac{1}{2})$, because it's like repeating a trial with a 50% success probability 6 times.
$E(D) = 6\times\frac{1}{2} = 3 $
$V(D) = 6\times\frac{1}{2}\times\frac{1}{2} = \frac{3}{2} $
$\sigma(D) = \sqrt\frac{3}{2} $
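As a cross-check, this small Python sketch recomputes $E(D)$, $V(D)$ and $\sigma(D)$ directly from the probabilities of $D = 0, \dots, 6$ instead of from the $np$ and $np(1-p)$ shortcuts. The variable names are my own.

```python
from fractions import Fraction
from math import comb, sqrt

# D ~ B(6, 1/2): exact probability of each possible number of daughters.
n = 6
p = Fraction(1, 2)
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

# Mean and variance straight from the definitions.
mean = sum(k * pk for k, pk in pmf.items())
variance = sum((k - mean) ** 2 * pk for k, pk in pmf.items())
sigma = sqrt(float(variance))  # standard deviation as a float

print(mean, variance, round(sigma, 4))  # 3 3/2 1.2247
```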
Approximating this binomial distribution by a normal distribution (standard normal $N$, with the horizontal axis called $z$), let me find the area of the $D\geqq 4$ part.
$$ P(D\geqq4) \approx N\left(z \geqq \frac{4 - 3}{\sqrt{3/2}}\right) = N(z \geqq 0.8165) \approx 0.207 $$
So a family with 6 kids has about a 0.207 probability of having 4 or more daughters, and with 5 families the expected number of families having 4 or more daughters is $$ 5 \times 0.207 = 1.035 $$
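The normal-approximation step above can be reproduced with Python's standard `math.erf`. The helper `normal_upper_tail` is a name made up for this sketch, and, matching the calculation as written, no continuity correction is applied.

```python
from math import erf, sqrt

def normal_upper_tail(x, mu, sigma):
    """P(Y >= x) for Y ~ N(mu, sigma^2), via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))

mu, sigma = 3, sqrt(3 / 2)       # E(D) and sigma(D) from above
z = (4 - mu) / sigma             # standardized cut-off for D >= 4
tail = normal_upper_tail(4, mu, sigma)

print(round(z, 4), round(tail, 4), round(5 * tail, 4))  # 0.8165 0.2071 1.0355
```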
However, this is quite different from $\frac{55}{32}$, the expected value from method 1. Why does this difference occur? And which method is correct?
Thank you genius!
First off, method 1 is exact. The normal approximation is just that: an approximation, so you won't get exactly the same result.
That being said, you should get a much better result than you did. The mistake in your approach 2 is in how you translate $D\geq 4$ into a statement about a normal variable. $D$ only takes integer values, so if $Y$ is normally distributed with mean $3$ and standard deviation $\sqrt{3/2}$, the range of $Y$ that corresponds to $D\geq 4$ is not $Y\geq 4$ but $Y\geq 3.5$ (the case $D = 4$ corresponds to $3.5\leq Y\leq 4.5$). This adjustment is known as the continuity correction.
So you actually want $$ N\left(z\geq \frac{3.5-3}{\sqrt{3/2}}\right) \approx 0.341546 $$ which gives $5 \times 0.341546 \approx 1.708$, a lot closer to $\frac{55}{32} \approx 1.719$.
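To see the effect of the continuity correction side by side, here is a minimal Python sketch comparing the exact binomial tail with the normal approximation evaluated at 4 (no correction) and at 3.5 (corrected). The helper name `normal_upper_tail` is hypothetical; only the standard `math` and `fractions` modules are used.

```python
from fractions import Fraction
from math import comb, erf, sqrt

def normal_upper_tail(x, mu, sigma):
    """P(Y >= x) for Y ~ N(mu, sigma^2)."""
    return 0.5 * (1 - erf((x - mu) / (sigma * sqrt(2))))

n, p = 6, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact binomial tail P(D >= 4) = 11/32.
exact = float(sum(Fraction(comb(n, k), 2**n) for k in range(4, n + 1)))
# Normal approximation without and with the continuity correction.
naive = normal_upper_tail(4, mu, sigma)
corrected = normal_upper_tail(3.5, mu, sigma)

for label, prob in [("exact", exact), ("naive", naive), ("corrected", corrected)]:
    print(f"{label:>9}: P = {prob:.6f}, expected families = {5 * prob:.4f}")
```

The exact tail is $0.34375$ (about $1.719$ families); the corrected approximation gives about $0.3415$ (about $1.708$), while the uncorrected one gives only about $0.2071$ (about $1.036$).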