How can I interpret the image for prior probability?


The image included below is from a lecture on Bayesian statistics. The lecturer drew the probability distribution of the prior as a uniform distribution, but I am having difficulty interpreting the X-axis and Y-axis. Can somebody explain what the image shows?

[Figure: the prior density plotted as a horizontal line at height 1 over the interval [0, 1]]


2 Answers

BEST ANSWER

The figure drawn is a pdf, probability density function, of random variable $\theta$ with uniform distribution on $[0,1]$. It is given by $$f(\theta)=\begin{cases}1&\text{ if }\theta\in[0,1]\\0&\text{ otherwise}\end{cases}$$

Loosely speaking, uniform distribution on interval $[a,b]$, $\mathcal{U}[a,b]$, is a distribution of random variable where each value from $[a,b]$ is equally likely.
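To make the axes concrete, here is a minimal Python sketch (my own illustration, not from the lecture) of this density: the X-axis is the value of $\theta$, the Y-axis is the density $f(\theta)$, which is constant at height 1 on $[0,1]$, i.e. the flat line in the figure.

```python
def uniform_pdf(theta):
    """pdf of Unif(0,1): height 1 on [0,1], 0 elsewhere."""
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0

# Every value of theta inside [0,1] has the same density:
print(uniform_pdf(0.1), uniform_pdf(0.9))  # 1.0 1.0
print(uniform_pdf(1.5))                    # 0.0 (outside the support)
```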


Bayesian framework. In doing Bayesian inference on the success probability $\theta \in (0,1)$ of $n$ Bernoulli trials, one has the likelihood function $p(x | \theta) \propto \theta^x (1 - \theta)^{n-x},$ where $x$ is the number of observed successes in $n$ trials. For observed $x$, the likelihood function is considered as a function of $\theta.$ (The proportionality symbol $\propto$ is used instead of $=$ because it is often unnecessary to carry along the constant ${n \choose x}.$)
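As a small illustration (my own sketch, not part of the original answer), the likelihood can be coded directly; for fixed data $x, n$ it is largest near $\theta = x/n$:

```python
from math import comb

def binom_likelihood(theta, x, n):
    # p(x | theta): probability of x successes in n Bernoulli(theta) trials,
    # viewed as a function of theta for fixed data (x, n)
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# With x = 7 successes in n = 10 trials, theta = 0.7 is more plausible than 0.3:
print(binom_likelihood(0.7, 7, 10) > binom_likelihood(0.3, 7, 10))  # True
```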

Prior distribution. A Bayesian inferential framework requires a prior distribution. For inference about Binomial $\theta$ it is common to select a member of the Beta family of distributions because they have support $(0, 1).$

In practice, sometimes one has no useful prior knowledge about $\theta.$ In that case, one chooses a 'flat' or 'non-informative' prior distribution. A common choice is $\mathsf{Beta}(\alpha_0=1,\beta_0=1) \equiv \mathsf{Unif}(0,1).$ This must be the choice made in the lecture you mention.

Posterior distribution. Then, according to the continuous version of Bayes' Rule, often written as $$\mathrm{POSTERIOR} \propto \mathrm{PRIOR} \times \mathrm{LIKELIHOOD},$$

one has, as the posterior distribution of $\theta,$ $$p(\theta | x) \propto p(\theta)\times p(x|\theta) \propto \theta^{\alpha_0 - 1}(1-\theta)^{\beta_0 - 1} \times \theta^x(1-\theta)^{n-x}\\ = \theta^{\alpha_0+x-1}(1-\theta)^{\beta_0 + n - x - 1} = \theta^{\alpha_n - 1}(1 - \theta)^{\beta_n - 1},$$ where $\alpha_n = \alpha_0 + x$ and $\beta_n = \beta_0 + n - x.$ In the final expression in the display one sees the kernel of $\mathsf{Beta}(\alpha_n,\beta_n).$
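The conjugate update is just parameter bookkeeping: add the successes to $\alpha_0$ and the failures to $\beta_0$. A short Python sketch (the function name is my own):

```python
def beta_binomial_update(alpha0, beta0, x, n):
    # Beta(alpha0, beta0) prior + x successes in n Bernoulli trials
    # -> Beta(alpha0 + x, beta0 + n - x) posterior
    return alpha0 + x, beta0 + n - x

# Flat prior Beta(1,1) with 735 successes in 1000 trials:
print(beta_binomial_update(1, 1, 735, 1000))  # (736, 266)
```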

Illustration. For example, if a trustworthy new poll shows $x = 735$ in favor of a Candidate out of $n = 1000$ subjects interviewed, and if we use the flat prior distribution $\mathsf{Beta}(1,1),$ then the posterior distribution is $\mathsf{Beta}(\alpha_n=736, \beta_n = 266).$ Cutting 2.5% of the probability from each tail of the posterior distribution, one would have a 95% Bayesian probability interval estimate $(0.707, 0.761)$ for the population proportion in favor of the Candidate (computed in R statistical software below).

qbeta(c(.025,.975),736,266)
## 0.7067679 0.7614072

Note: By comparison, with these polling data a traditional frequentist 95% confidence interval of the form $\hat \theta \pm 1.96\sqrt{\hat \theta(1 - \hat\theta)/n},$ where $\hat \theta = x/n = 0.735,$ would be $(0.708,0.762).$
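The Wald interval in the Note can be reproduced in a few lines of Python (a sketch paralleling the R computation above):

```python
from math import sqrt

def wald_ci(x, n, z=1.96):
    # traditional frequentist interval: theta_hat +/- z*sqrt(theta_hat(1-theta_hat)/n)
    theta_hat = x / n
    half = z * sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - half, theta_hat + half

lo, hi = wald_ci(735, 1000)
print(round(lo, 3), round(hi, 3))  # 0.708 0.762
```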

Informative prior. By contrast, previous polls and prior experience might lead someone to choose the prior $\mathsf{Beta}(800,200).$ That would be roughly equivalent to believing (in advance of seeing the new poll) that the proportion in favor is very likely to be in the interval $0.80 \pm 0.03.$ Then, with the same new polling data as above, a 95% posterior probability interval would be $(0.748, 0.785).$ This posterior distribution melds prior opinion and new polling data to give a probability interval that is somewhat higher (centered near 0.77 instead of 0.74) and narrower (width about 0.037 instead of 0.054).
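The same conjugate arithmetic applies to the informative prior. The sketch below (my own; it uses a normal approximation to the Beta posterior rather than exact quantiles like R's qbeta) recovers the posterior parameters and, approximately, the stated interval:

```python
from math import sqrt

# Beta(800, 200) prior combined with x = 735 successes in n = 1000 trials
alpha_n, beta_n = 800 + 735, 200 + (1000 - 735)
mean = alpha_n / (alpha_n + beta_n)
sd = sqrt(mean * (1 - mean) / (alpha_n + beta_n + 1))  # Beta standard deviation

# Normal approximation to the 95% posterior probability interval:
lo, hi = mean - 1.96 * sd, mean + 1.96 * sd
print(alpha_n, beta_n)             # 1535 465
print(round(mean, 4))              # 0.7675
print(round(lo, 3), round(hi, 3))  # close to the exact (0.748, 0.785)
```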