In Bayesian Statistic how do you usually find out what is the distribution of the unknown?


To estimate the posterior we have

$$p(\theta\mid x) = \frac{p(\theta)\,p(x\mid\theta)}{\sum_{\theta'} p(\theta')\,p(x\mid\theta')}$$

$x$ is usually the experimentally sampled data and $\theta$ is the model, but both $p(x|\theta)$ and $p(\theta)$ are unknown. How do you usually determine these two quantities?


These quantities are known as part of the model. $p(\theta)$ is the prior, which you choose (a classic example is the Beta distribution), and $p(x|\theta)$ is the density function of $X\mid\theta$; for example, the model might specify $X\mid\theta\sim\mathcal{N}(\theta,1)$.


The prior distribution is based on previous experience, data, intuition, whatever. If none of these are available, then a flat or non-informative prior is used.

Presumably, you know the distributional form of $p(x|\theta)$ based on your experiment.

If possible, it is desirable to pick a prior distribution $p(\theta)$ that is "conjugate" to the likelihood (data) distribution $p(x|\theta)$. That is, "mathematically compatible". In that case one can deduce the posterior distribution $p(\theta|x)$ without having to compute the denominator.

Simple example: Trying to predict the outcome of an election, a campaign strategist chooses the prior distribution $Beta(\alpha_0 = 330, \beta_0 = 270)$ on the probability $\theta$ that the candidate is going to win. This is based on a 'hunch' that $\theta$ is 'near' 0.55 and is most likely to lie somewhere in $(0.51, 0.59)$. [You can check that this beta distribution has about the right properties.]
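To check that the prior "has about the right properties," one can compute the mean of $Beta(330, 270)$ and a rough 95% central interval. The sketch below uses a normal approximation to the beta quantiles rather than exact beta percentiles, which is adequate here because the distribution is tight and nearly symmetric:

```python
import math

# Check the strategist's prior Beta(330, 270): its mean should be near 0.55
# and most of its mass should lie in roughly (0.51, 0.59).
alpha0, beta0 = 330, 270
mean = alpha0 / (alpha0 + beta0)
# Standard deviation of a Beta(a, b): sqrt(mean*(1-mean)/(a+b+1))
sd = math.sqrt(mean * (1 - mean) / (alpha0 + beta0 + 1))
lo, hi = mean - 1.96 * sd, mean + 1.96 * sd   # normal approximation
print(round(mean, 2), (round(lo, 2), round(hi, 2)))   # prints: 0.55 (0.51, 0.59)
```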

A subsequent reliable poll shows $x = 620$ of $n = 1000$ favoring the candidate. This gives a binomial likelihood. Ignoring the constants that make the beta and binomial densities integrate (or sum) to 1, we have $$p(\theta|x) \propto p(\theta)p(x|\theta) \propto \theta^{\alpha_0 -1} (1-\theta)^{\beta_0 -1} \times \theta^x (1-\theta)^{n-x}\\ = \theta^{\alpha_0 + x - 1}(1-\theta)^{\beta_0 + n - x -1} = \theta^{\alpha_n -1}(1-\theta)^{\beta_n - 1},$$ where we recognize that the posterior is proportional to ($\propto$) the density of the distribution $Beta(\alpha_n, \beta_n),$ with $\alpha_n = \alpha_0 + x = 950$ and $\beta_n = \beta_0 + n - x = 650.$
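The conjugate update above reduces to pure counting: the posterior parameters are the prior pseudo-counts plus the observed counts. A minimal sketch with the numbers from the example:

```python
# Beta-binomial conjugate update: posterior parameters are the prior
# pseudo-counts plus the observed successes and failures.
alpha0, beta0 = 330, 270   # prior Beta(330, 270)
x, n = 620, 1000           # poll: 620 of 1000 in favor

alpha_n = alpha0 + x        # 330 + 620 = 950
beta_n = beta0 + (n - x)    # 270 + 380 = 650
print(alpha_n, beta_n)      # prints: 950 650
```

No normalizing constant ever needs to be computed, which is the practical payoff of conjugacy.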

Then one can cut 2.5% of the area from each tail of the posterior distribution to find the 95% posterior probability interval $(0.57,0.62).$ This is different from the frequentist 95% CI one would have obtained from the data alone. One can say that the Bayesian approach has accomplished an appropriate melding of prior opinion and data to produce a probability interval estimate for $\theta.$

In practice, one would typically try several different prior distributions in order to assess the influence of each on the result. (Very roughly speaking, our strategist's prior distribution carries about the same weight as a poll of 600 people with 330 in favor of our candidate.)

If the consultant is from Mars and has no knowledge of human elections, then the non-informative prior distribution might have been something like $Beta(1,1)$ or $Beta(.5,.5)$ and the endpoints of the posterior probability interval would closely approximate the endpoints of the frequentist CI based on the data alone. (The philosophical interpretations of the Bayesian and frequentist intervals are rather different, but that is for another discussion.)
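One can verify this numerically. The sketch below uses the flat $Beta(1,1)$ prior, giving posterior $Beta(621, 381)$, and compares a normal-approximation 95% posterior interval against the usual frequentist Wald CI $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$; both round to the same endpoints:

```python
import math

x, n = 620, 1000
# Flat prior Beta(1, 1) => posterior Beta(1 + x, 1 + n - x) = Beta(621, 381)
a, b = 1 + x, 1 + (n - x)
m = a / (a + b)
s = math.sqrt(m * (1 - m) / (a + b + 1))
bayes = (m - 1.96 * s, m + 1.96 * s)        # normal approx. to beta quantiles

# Frequentist Wald 95% CI from the data alone
p = x / n
se = math.sqrt(p * (1 - p) / n)
freq = (p - 1.96 * se, p + 1.96 * se)

print([round(v, 2) for v in bayes])   # prints: [0.59, 0.65]
print([round(v, 2) for v in freq])    # prints: [0.59, 0.65]
```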

Another Answer suggests the prior might be beta and the likelihood normal. Computationally speaking, that would be a bit messier because beta and normal are not conjugate distributions and we would need to compute the denominator of Bayes' theorem. Beta is conjugate to binomial; normal is conjugate to normal (but not quite obviously so). It is worth noting that graphs of the densities of $Norm(\mu=.55,\sigma=.02)$ and $Beta(330, 270)$ are difficult to distinguish [except, of course, that the normal is not constrained to $(0,1)$].
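In the non-conjugate case, the denominator can always be approximated numerically. The sketch below (a hypothetical setup, not from the original example) pairs the $Beta(330, 270)$ prior with a $Norm(\theta, 0.02)$ likelihood for an observed proportion of 0.62 and normalizes on a grid:

```python
import math

# Grid approximation of the posterior when prior and likelihood are not
# conjugate: Beta(330, 270) prior, Normal(theta, 0.02) likelihood for an
# observed value 0.62 (hypothetical numbers for illustration).
grid = [i / 10000 for i in range(1, 10000)]      # theta values in (0, 1)

def prior(t):
    # Beta(330, 270) density, up to its normalizing constant
    return t ** 329 * (1 - t) ** 269

def likelihood(t):
    # Normal(t, 0.02) density at 0.62, up to its normalizing constant
    return math.exp(-0.5 * ((0.62 - t) / 0.02) ** 2)

unnorm = [prior(t) * likelihood(t) for t in grid]
z = sum(unnorm)                                  # discrete stand-in for the denominator
posterior = [u / z for u in unnorm]              # sums to 1 over the grid
mean = sum(t * p for t, p in zip(grid, posterior))
```

The posterior mean lands between the prior center 0.55 and the observation 0.62, weighted by their precisions, just as the conjugate algebra would predict.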

Acknowledgment: This example is similar to one in Chapter 8 of Suess and Trumbo (2010), Springer.