Bias of coin toss samples


An AP prep book uses an example to explain the sampling distribution of a statistic. It states: "Suppose a student is interested in estimating $p$, the probability of getting heads when a penny is tossed. Suppose she tosses a penny 50 times and gets 24 heads. This sample of 50 tosses gives her an estimated probability $\hat{p}$ of $24/50=0.48$. So the student decides to see what happens if she repeats the experiment. The results of 100 such experiments, each of 50 tosses:"

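| Estimated P(heads) | Number of samples giving this estimate |
|---|---|
| 0.32 | 1 |
| 0.34 | 2 |
| 0.36 | 3 |
| 0.38 | 3 |
| 0.40 | 5 |
| 0.42 | 6 |
| 0.44 | 6 |
| 0.46 | 6 |
| 0.48 | 11 |
| 0.50 | 14 |
| 0.52 | 10 |
| 0.54 | 16 |
| 0.56 | 5 |
| 0.58 | 5 |
| 0.60 | 2 |
| 0.62 | 3 |
| 0.64 | 2 |
| 0.66 | 0 |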

The author then goes on to state: $$\operatorname{Bias}(\hat{p})=E(\hat{p}) - p=0.493-0.48=0.013$$
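As a check on the book's arithmetic, the sketch below recomputes the mean of the 100 tabulated estimates and then subtracts 0.48, exactly as the book does. It assumes that a row printed as 0.53 is meant to be 0.52 (with 50 tosses, $\hat p$ can only be a multiple of $1/50 = 0.02$); the variable names are illustrative.

```python
# Tabulated results: estimate of P(heads) -> number of the 100 samples giving it.
# Assumption: a row printed as 0.53 is read as 0.52, because 50 tosses can only
# produce estimates that are multiples of 1/50 = 0.02.
counts = {
    0.32: 1, 0.34: 2, 0.36: 3, 0.38: 3, 0.40: 5, 0.42: 6,
    0.44: 6, 0.46: 6, 0.48: 11, 0.50: 14, 0.52: 10, 0.54: 16,
    0.56: 5, 0.58: 5, 0.60: 2, 0.62: 3, 0.64: 2, 0.66: 0,
}

n_samples = sum(counts.values())  # number of replications (should be 100)
mean_p_hat = sum(est * k for est, k in counts.items()) / n_samples

# The book's "bias": the first experiment's 0.48 is plugged in as if it were p.
assumed_p = 0.48
book_bias = mean_p_hat - assumed_p

print(f"samples    = {n_samples}")
print(f"mean p_hat = {mean_p_hat:.3f}")
print(f"book bias  = {book_bias:.3f}")
```

Running it prints a mean of 0.493 and a "bias" of 0.013, matching the book; reading the row as 0.53 instead would give a mean of 0.494.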

How can 0.48 be used as the population parameter $p$ in this example? My understanding is that the population parameter $p$ in this example is unknown (although theoretically it is 0.5), so the bias cannot be calculated.


You are correct. The bias of an estimator cannot be calculated unless we know the true value of the parameter, in this case, the true probability of obtaining heads on any given coin flip.

To be clear, there is the "per trial" random variable $$X_i \sim \operatorname{Bernoulli}(p)$$ that describes the number of heads (0 or 1) obtained on the $i^{\rm th}$ coin flip. Then there is the "per sample" or "per experiment" random variable $$B = \sum_{i=1}^{50} X_i \sim \operatorname{Binomial}(n = 50, p)$$ that describes the number of heads obtained over the course of the $50$ coin flips comprising a single sample or experimental run. That is to say, the sum of the $X_i$ is binomially distributed, and we call the distribution of $B$ the sampling distribution of the sample total.
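To make the two levels concrete, here is a small Python sketch: the hypothetical helper `bernoulli_trial` plays the role of one $X_i$, `sample_total` produces one value of $B$, and `binomial_pmf` is the $\operatorname{Binomial}(50, p)$ probability mass function. The value $p = 0.5$ and all helper names are assumptions made for illustration; the book never states $p$.

```python
import math
import random


def bernoulli_trial(p, rng):
    """One coin flip X_i: returns 1 (heads) with probability p, else 0."""
    return 1 if rng.random() < p else 0


def sample_total(n, p, rng):
    """One experiment: B = X_1 + ... + X_n, the number of heads in n flips."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))


def binomial_pmf(k, n, p):
    """P(B = k) for B ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 50, 0.5              # assumed true p, for illustration only
rng = random.Random(2024)   # fixed seed so the run is reproducible

b = sample_total(n, p, rng)
print("heads in one experiment of 50 flips:", b)

# The pmf over k = 0..n sums to 1 and has mean n * p.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total_prob = sum(pmf)
mean_b = sum(k * q for k, q in enumerate(pmf))
print(f"total probability = {total_prob:.6f}")
print(f"E(B) from the pmf = {mean_b:.6f}  (n * p = {n * p})")
```

Each call to `sample_total` is one draw from the sampling distribution of the sample total; the pmf describes that distribution exactly.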

The estimator $\hat p = B/n$, on the other hand, is not binomial, because it takes on the values $\{0, 1/n, 2/n, \ldots, (n-1)/n, 1\}$ rather than $\{0, 1, \ldots, n\}$. It is simply a rescaled binomial variable, since $\hat p = k/n$ exactly when $B = k$. We call the distribution of $\hat p$ the sampling distribution of the sample proportion. Some people might say "distribution of the sample proportion", which I think is acceptable, but "sampling distribution" by itself is imprecise because, as the examples above show, any number of sampling distributions can be constructed from the sample $(X_1, X_2, \ldots, X_n)$. For instance, what is the sampling distribution of the sample variance?
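In general the bias of an estimator depends on the unknown parameter, but for the sample proportion it can be derived analytically. Using only $E(X_i) = p$ and linearity of expectation (here $n = 50$):

$$E(\hat p) = E\left(\frac{B}{n}\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot np = p, \qquad \operatorname{Bias}(\hat p) = E(\hat p) - p = 0.$$

So $\hat p$ is unbiased whatever the true $p$ is. What cannot be obtained from the data alone is a numerical value like the book's $0.013$, which depends on plugging in a guessed value for $p$.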

Back to the question at hand: the estimator $\hat p$ was calculated $100$ times. Whether the first experiment with $\hat p = 0.48$ is counted among those $100$ tabulated experiments is not made clear. What is clear is that $p$ itself is never stated. We could say that, under the assumption that $p = 0.48$ (taken from the first experiment), we can compute the sample bias of the estimator over $m = 100$ replications ("replication" here meaning a repetition of the whole 50-toss experiment). But this tells us nothing about the true bias, because the value $0.013$ is itself a statistic, subject to the randomness arising from the $100$ replications.
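The point that $0.013$ is itself a statistic can be illustrated by simulation. The sketch below assumes a fair coin ($p = 0.5$, chosen only for illustration), repeats the whole "100 replications of 50 tosses" study five times, and reports $\overline{\hat p} - p$ for each batch. The function name `replicate_bias` is hypothetical.

```python
import math
import random


def replicate_bias(p, n=50, m=100, rng=None):
    """Run m experiments of n tosses each and return mean(p_hat) - p.

    This is the "sample bias" from one batch of m replications; it is
    itself a random quantity that changes from batch to batch.
    """
    if rng is None:
        rng = random.Random()
    p_hats = []
    for _ in range(m):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(heads / n)
    return sum(p_hats) / m - p


p = 0.5                  # assumed true value, for illustration only
rng = random.Random(7)   # fixed seed for reproducibility
biases = [replicate_bias(p, rng=rng) for _ in range(5)]
for b in biases:
    print(f"sample bias from one batch of 100 replications: {b:+.4f}")

# Standard error of the mean of p_hat over m = 100 replications of n = 50 tosses.
se = math.sqrt(p * (1 - p) / (50 * 100))
print(f"theoretical standard error: {se:.4f}")
```

Each batch gives a different value scattered around 0, typically within a couple of multiples of the standard error $\sqrt{p(1-p)/(nm)} \approx 0.007$. By the same yardstick, and still assuming $p = 0.5$, the tabulated mean of 0.493 sits about one standard error below 0.5.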