Imagine I am trying to determine the percentage $p$ of people in the US who voted for the Democrats (or Republicans, if you prefer). I can determine this by the following process:
- Randomly select $n$ people
- Ask them if they voted for the Democrats. $m$ people say yes.
- Estimate the percentage by $\hat{p}=\frac{m}{n}$
If $n$ is large, then by the Central Limit Theorem, I know that approximately, $\hat{p} \sim \mathcal{N}\left(p, \frac{p (1-p)}{n}\right)$.
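To see this in practice, here is a minimal simulation sketch (the true proportion $p = 0.52$ and poll size $n = 1000$ are hypothetical choices for illustration): the spread of $\hat{p}$ across many repeated polls should closely match the CLT standard deviation $\sqrt{p(1-p)/n}$.

```python
import random
import statistics

# Simulate many polls of n people when the true proportion is p, and
# compare the spread of p-hat to the CLT prediction sqrt(p(1-p)/n).
def poll(p, n, rng):
    # Each respondent independently says "yes" with probability p.
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(0)
p, n = 0.52, 1000
estimates = [poll(p, n, rng) for _ in range(5000)]

clt_sd = (p * (1 - p) / n) ** 0.5
print(statistics.mean(estimates))   # close to p = 0.52
print(statistics.stdev(estimates))  # close to clt_sd ~ 0.0158
```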
Now, I would like to turn this around, saying that after observing my estimate $\hat{p}$, I get a distribution for $p$ itself: $p \sim \mathcal{N}\left(\hat{p}, \frac{\hat{p} (1-\hat{p})}{n}\right)$ (using $\hat{p}$ in the variance, since $p$ is unknown). While this seems like a natural thing to do, I cannot wrap my head around what it means. In particular, $p$ does not actually have a distribution; it is an unknown constant.
Is there an interpretation of this process that allows me to view $p$ as a distribution? What terminology should I use when speaking about the distribution of $p$? For example, could I say "my belief on $p$ follows distribution D"?
Your thought process is exactly correct and leads directly to Bayesian statistics. And yes, "my belief about $p$ follows distribution D" is precisely the language Bayesians use: the distributions below are called the prior and posterior beliefs about $p$.
Note that the distribution you assign to $p$, even though you think of $p$ as having its own distribution, is actually conditional on the data $X_1, \dots, X_n$, where the $X_i$ are Bernoulli distributed with success probability $p$.
Thus, by Bayes' Theorem, we obtain (and I'm going to use $p_0$ for a value in the support of $p$): $$f_{p \mid X_1, \dots, X_n}(p_0 \mid x_1, \dots, x_n)=\dfrac{f_{X_1, \dots, X_n \mid p}(x_1, \dots, x_n \mid p_0) \cdot f_{p}(p_0)}{f_{X_1, \dots, X_n}(x_1, \dots, x_n)}$$ Assuming that $X_1, \dots, X_n$, when conditioned on $p$, are independent, we may write $$\begin{align} f_{p \mid X_1, \dots, X_n}(p_0 \mid x_1, \dots, x_n)&=\dfrac{f_{X_1\mid p}(x_1 \mid p_0) f_{X_2 \mid p}(x_2 \mid p_0) \cdots f_{X_n \mid p}(x_n \mid p_0) \cdot f_{p}(p_0)}{f_{X_1, \dots, X_n}(x_1, \dots, x_n)} \\ &= \dfrac{p_0^{t}(1-p_0)^{n-t}f_p(p_0)}{c} \\ &\propto p_0^{t}(1-p_0)^{n-t}f_p(p_0) \end{align}$$ where $c$ is a constant independent of $p_0$, and $t$ is the number of random variables among $X_1, \dots, X_n$ that result in a "success" (i.e., $x_i = 1$). I use $\propto$ to mean "proportional to"; we don't need to worry about constants with respect to $p_0$ for now.
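As a numeric sanity check, here is a sketch that evaluates the unnormalized posterior on a grid and normalizes it numerically. The poll numbers ($n = 100$, $t = 62$) are hypothetical, and for simplicity it assumes a flat prior $f_p(p_0) = 1$ on $(0, 1)$; the posterior mean then comes out to $(t+1)/(n+2)$, the mean of a $\text{Beta}(t+1,\, n-t+1)$ distribution.

```python
# Evaluate the unnormalized posterior p0^t (1-p0)^(n-t) f_p(p0) on a grid,
# then normalize numerically. Illustration only: hypothetical poll with
# t = 62 successes out of n = 100, and a flat prior f_p(p0) = 1.
n, t = 100, 62
h = 0.001                                       # grid spacing
grid = [i * h for i in range(1, 1000)]
unnorm = [p0**t * (1 - p0)**(n - t) for p0 in grid]  # flat prior
z = sum(unnorm) * h                              # Riemann-sum normalizer
posterior = [u / z for u in unnorm]

# Posterior mean under a flat prior should match (t + 1) / (n + 2).
post_mean = sum(p0 * f for p0, f in zip(grid, posterior)) * h
print(post_mean)  # close to 63 / 102 ~ 0.6176
```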
The constant $c$ isn't really that important. However, what remains is the important problem of assigning $f_p$: one popular model is to assume the Beta distribution for $p$ (known as a "prior" for $p$), for which $$f_p(p_0) \propto p_0^{\alpha - 1}(1-p_0)^{\beta - 1}$$
so thus $$f_{p \mid X_1, \dots, X_n}(p_0 \mid x_1, \dots, x_n) \propto p_0^{t+\alpha - 1}(1-p_0)^{n-t+\beta - 1}\text{.}$$ Since we know that $p_0 \in (0, 1)$, this is proportional to a Beta density! Thus, the result is $$p \mid X_1, \dots, X_n \sim \text{Beta}(t + \alpha,\, n - t + \beta)\text{.}$$
This is known as the posterior distribution of $p$ given $X_1, \dots, X_n$, assuming a Bernoulli likelihood for $X_1, \dots, X_n$ and a Beta prior for $p$. In particular, because the likelihood and the prior take the same functional form in $p$ (up to constants), the posterior is again a Beta distribution: the Beta distribution is said to be a conjugate prior for the Bernoulli likelihood, and this setup is known as the Beta-Bernoulli model. I strongly recommend you read more into this interesting subject.
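The payoff of conjugacy is that updating a belief requires no integration at all: with a $\text{Beta}(\alpha, \beta)$ prior and $t$ successes in $n$ trials, the posterior is $\text{Beta}(\alpha + t,\, \beta + n - t)$. A minimal sketch (the poll numbers are again hypothetical):

```python
# Conjugate Beta-Bernoulli update: with a Beta(alpha, beta) prior and
# t successes in n Bernoulli trials, the posterior is
# Beta(alpha + t, beta + n - t) -- no integration needed.
def beta_bernoulli_update(alpha, beta, t, n):
    return alpha + t, beta + (n - t)

# Hypothetical poll: start from a uniform prior Beta(1, 1),
# observe 62 "yes" answers out of 100.
a_post, b_post = beta_bernoulli_update(1, 1, 62, 100)
print(a_post, b_post)              # 63 39
print(a_post / (a_post + b_post))  # posterior mean = 63/102 ~ 0.6176

# Updates compose: seeing the data in two batches gives the same posterior.
a1, b1 = beta_bernoulli_update(1, 1, 30, 50)
a2, b2 = beta_bernoulli_update(a1, b1, 32, 50)
print((a2, b2) == (a_post, b_post))  # True
```

Note how yesterday's posterior simply becomes today's prior: this is the sense in which "my belief about $p$" is a distribution that sharpens as data accumulates.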