Interpreting estimated percentage as a distribution


Imagine I am trying to determine the percentage $p$ of people in the US who voted for the Democrats (or Republicans, if you prefer). I can estimate it by the following process:

  1. Randomly select $n$ people
  2. Ask them if they voted for the Democrats. $m$ people say yes.
  3. Estimate the percentage by $\hat{p}=\frac{m}{n}$
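The three steps above can be sketched with a quick simulation (the true proportion and sample size here are hypothetical; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.52  # hypothetical true proportion (unknown in practice)
n = 1000       # number of people surveyed

# Steps 1-2: each of the n respondents says "yes" with probability p_true
answers = rng.random(n) < p_true
m = int(answers.sum())

# Step 3: estimate the proportion
p_hat = m / n
print(f"m = {m}, p_hat = {p_hat:.3f}")
```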

If $n$ is large, then by the Central Limit Theorem, I know that approximately, $\hat{p} \sim \mathcal{N}\left(p, \frac{p (1-p)}{n}\right)$.
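The normal approximation can be checked empirically by repeating the survey many times; a sketch with hypothetical values for $p$ and $n$:

```python
import numpy as np

rng = np.random.default_rng(1)

p, n, reps = 0.52, 1000, 20000  # hypothetical values

# Each survey of n people yields m ~ Binomial(n, p) "yes" answers
p_hats = rng.binomial(n, p, size=reps) / n

# The empirical mean and standard deviation of p_hat should match
# the CLT approximation N(p, p*(1-p)/n)
print(p_hats.mean())
print(p_hats.std())
print(np.sqrt(p * (1 - p) / n))
```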

Now, I can turn this around, saying that after learning my estimate $\hat{p}$, I get a distribution for $p$ with $p \sim \mathcal{N}\left(\hat{p}, \frac{\hat{p} (1-\hat{p})}{n}\right)$ (plugging in $\hat{p}$ for the unknown $p$ in the variance). While this seems like a natural thing to do, I cannot wrap my head around what it means. In particular, $p$ is not actually a distribution, but an unknown constant.

Is there an interpretation of this process that allows me to view $p$ as a distribution? What terminology should I use when speaking about the distribution of $p$? For example, could I say "my belief on $p$ follows distribution D"?


There are 2 best solutions below

BEST ANSWER

While this seems like a natural thing to do, I cannot wrap my head around what it means. In particular, $p$ is not actually a distribution, but an unknown constant.

Your thought process is exactly correct and leads to discussing Bayesian statistics.

Note that $p$, even though you think of it as having its own distribution, is actually conditional on the data $X_1, \dots, X_n$, where $X_1, \dots, X_n$ are Bernoulli distributed with probability $p$.

Thus, by Bayes' Theorem, we obtain (and I'm going to use $p_0$ for a value in the support of $p$): $$f_{p \mid X_1, \dots, X_n}(p_0 \mid x_1, \dots, x_n)=\dfrac{f_{X_1, \dots, X_n \mid p}(x_1, \dots, x_n \mid p_0) \cdot f_{p}(p_0)}{f_{X_1, \dots, X_n}(x_1, \dots, x_n)}$$ Assuming that $X_1, \dots, X_n$, when conditioned on $p$, are independent, we may write $$\begin{align} f_{p \mid X_1, \dots, X_n}(p_0 \mid x_1, \dots, x_n)&=\dfrac{f_{X_1\mid p}(x_1 \mid p_0) f_{X_2 \mid p}(x_2 \mid p_0) \cdots f_{X_n \mid p}(x_n \mid p_0) \cdot f_{p}(p_0)}{f_{X_1, \dots, X_n}(x_1, \dots, x_n)} \\ &= \dfrac{p_0^{t}(1-p_0)^{n-t}f_p(p_0)}{c} \\ &\propto p_0^{t}(1-p_0)^{n-t}f_p(p_0) \end{align}$$ where $c$ is a constant independent of $p_0$, and $t$ is the number of random variables, out of $X_1, \dots, X_n$, which result in a "success" (here, a "yes" answer) with probability $p$. I use $\propto$ to mean "proportional to;" we don't need to worry about constants with respect to $p_0$ for now.

The constant $c$ isn't really that important. However, what remains is the important problem of assigning $f_p$: one popular model is to assume the Beta distribution for $p$ (known as a "prior" for $p$), for which $$f_p(p_0) \propto p_0^{\alpha - 1}(1-p_0)^{\beta - 1}$$

so thus $$f_{p \mid X_1, \dots, X_n}(p_0 \mid x_1, \dots, x_n) \propto p_0^{t+\alpha - 1}(1-p_0)^{n-t+\beta - 1}\text{.}$$ Since we know that $p_0 \in (0, 1)$, this is proportional to a Beta distribution! Thus, the result is as follows:

Suppose $X_1, \dots, X_n$ are Bernoulli distributed with success probability $p$, and are independent given $p$. Suppose also that $p$ follows a Beta distribution with parameters $\alpha, \beta$, known as the prior of $p$. Let $t \le n$ be the number of $X_1, \dots, X_n$ which result in a success. Then $$p \mid X_1, \dots, X_n \sim \text{Beta}(t+\alpha, n-t+\beta)\text{.}$$
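As a sketch of this result (SciPy assumed; the survey numbers and the uniform $\text{Beta}(1, 1)$ prior are illustrative choices):

```python
from scipy.stats import beta

n, t = 1000, 520                    # illustrative survey: t "yes" answers out of n
alpha_prior, beta_prior = 1.0, 1.0  # Beta(1, 1) = uniform prior, an illustrative choice

# Posterior: Beta(t + alpha, n - t + beta)
posterior = beta(t + alpha_prior, n - t + beta_prior)

print(posterior.mean())          # posterior mean of p
print(posterior.interval(0.95))  # central 95% credible interval for p
```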

This is known as the posterior distribution of $p$ given $X_1, \dots, X_n$, assuming a Bernoulli likelihood for $X_1, \dots, X_n$ and a Beta prior for $p$. In particular, because the posterior is again a Beta distribution (that is, the prior and the posterior take the same form when ignoring constants with respect to $p$), the Beta distribution is said to be a conjugate prior for the Bernoulli likelihood; this is the Beta-Bernoulli model. I strongly recommend you read more into this interesting subject.

ANOTHER ANSWER

If you consider the central limit theorem, it essentially states that the distribution of sample means is approximately normal. The sample mean is a random variable that can take many values, so its distribution describes the probability that the sample mean falls in a particular range. While any observed sample mean is a constant, the sample mean itself is a random variable with a distribution over the possible observed values.

Your particular sample proportion, computed from a particular data set, is just one of the many values that the sample-proportion random variable $\hat{p}$ can take: $\hat{p}$ is not a constant, but a random variable with a distribution of possible values. This is like how the mean of a particular sample depends on that data set, while the distribution of all possible sample means shows how those values are distributed across all samples.

I think, in this specific case, the confusion arises from the notation you have used. Commonly $\hat{p}$ is the random variable and $p$ is the true population proportion. To estimate a population proportion $p$, we take a random sample; the distribution of the random variable $\hat{p}$, the sample proportion, determines the accuracy of the estimate.

So you are right to say that an observed value is a constant, but not that the random variable itself is a constant.
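A tiny simulation may make the distinction concrete: $p$ stays fixed, while each new sample yields a different observed value of $\hat{p}$ (hypothetical numbers; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

p, n = 0.52, 500  # p is a fixed constant; values are hypothetical

# Five independent surveys: p never changes, but the observed p_hat varies
p_hats = [rng.binomial(n, p) / n for _ in range(5)]
print(p_hats)
```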