Questions about Bayesian inference


From Wikipedia

  1. The prior distribution is the distribution of the parameter(s) before any data is observed, i.e. $p(\theta \mid \alpha )$. ...

    The sampling distribution is the distribution of the observed data conditional on its parameters, i.e. $p(\mathbf {X}\mid \theta )$ . This is also termed the likelihood,...

    The marginal likelihood (sometimes also termed the evidence) is the distribution of the observed data marginalized over the parameter(s), i.e. $$p(\mathbf {X}\mid \alpha )=\int_{\theta }p(\mathbf {X}\mid \theta )p(\theta \mid \alpha )\operatorname {d}\!\theta .$$

    The posterior distribution is the distribution of the parameter(s) after taking into account the observed data. This is determined by Bayes' rule, which forms the heart of Bayesian inference: $$ p(\theta \mid \mathbf {X},\alpha )={\frac {p(\mathbf {X}\mid\theta )p(\theta \mid \alpha )}{p(\mathbf {X} \mid \alpha )}}\propto p(\mathbf {X} \mid \theta )p(\theta \mid \alpha ) $$

    In the calculation of the marginal likelihood and posterior distribution, I wonder why the likelihood is written $p(\mathbf {X}\mid \theta )$ and not $p(\mathbf {X} \mid \theta, \alpha )$?

  2. The posterior predictive distribution is the distribution of a new data point, marginalized over the posterior: $$ p(\tilde {x} \mid \mathbf {X},\alpha )=\int_{\theta}p(\tilde {x} \mid \theta )p(\theta \mid \mathbf {X},\alpha )\operatorname {d}\!\theta $$

    Why is $p(\tilde{x} \mid \theta )$ not $p(\tilde {x} \mid \theta, X, \alpha )$ instead?

Thanks!


There are 3 answers below.

---

The $\alpha$ are not random variables but fixed parameters of the assumed prior (hyperparameters). They are not modeled as uncertain quantities, so they carry no information about $\mathbf{X}$ beyond their role in the prior; this is why the likelihood of $\mathbf{X}$ is written $p(\mathbf{X}\mid\theta)$ without $\alpha$.

The same holds for $\mathbf{X}$ in the predictive distribution: given $\theta$, the data are fixed values. The posterior $p(\theta \mid {\mathbf {X}},\alpha )$ already incorporates the information the data carry about the probable values of $\theta$, so conditioning $p(\tilde{x}\mid\theta)$ on $\mathbf{X}$ as well would be redundant; the data are treated like fixed parameters in the predictive setting.
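As a numerical illustration (a minimal sketch with made-up data, not part of the original answer): in a beta-binomial model the hyperparameters enter Bayes' rule only through the prior factor, and a simple grid approximation of $p(\theta\mid\mathbf{X},\alpha)\propto p(\mathbf{X}\mid\theta)\,p(\theta\mid\alpha)$ recovers the known conjugate posterior mean $(a+k)/(a+b+n)$:

```python
import math

# Hypothetical data: k successes in n Bernoulli trials (numbers are made up)
n, k = 10, 7
a, b = 2.0, 2.0          # hyperparameters alpha = (a, b) of the Beta prior

def likelihood(theta):
    # p(X | theta): a function of theta and the data only; (a, b) never appears
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

def prior(theta):
    # p(theta | alpha): Beta(a, b) density up to its normalizing constant
    return theta**(a - 1) * (1 - theta)**(b - 1)

# Grid approximation of the posterior p(theta | X, alpha)
grid = [(i + 0.5) / 1000 for i in range(1000)]
unnorm = [likelihood(t) * prior(t) for t in grid]
z = sum(unnorm)                      # proportional to the marginal likelihood p(X | alpha)
post = [u / z for u in unnorm]

post_mean = sum(t * p for t, p in zip(grid, post))
exact_mean = (a + k) / (a + b + n)   # mean of the conjugate Beta(a+k, b+n-k) posterior
print(round(post_mean, 4), round(exact_mean, 4))
```

The two printed values agree: $\alpha$ shapes the posterior only through the prior factor, never through the likelihood.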

---

For your first question

In the calculation of the marginal likelihood and posterior distribution, I wonder what is the reason that $p({\mathbf {X}}|\theta )$ is not $p({\mathbf {X}}|\theta, \alpha )$ instead?

$\alpha$ is a parameter of the probability density of $\theta$; i.e., it is a parameter of the prior. The likelihood $p({\mathbf{X}}|\theta )$ takes $\theta$ as a parameter, not $\alpha$.

A simple example should help. Consider a beta prior with parameters $(a,b)$ for a binomial success probability $\rho$. In this case, for a single observation, $p({\mathbf{X}}|\theta )$ is of the form $\binom{\cdot}{\cdot}\rho^\cdot(1-\rho)^\cdot$ and the prior is proportional to $\rho^{a-1}(1-\rho)^{b-1}$. Here $\alpha = (a,b)$ and $\theta = \rho$: the hyperparameters appear only in the prior, never in the likelihood.
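To make the separation concrete, here is a minimal sketch (the numbers are made up) showing that the likelihood is a function of $\rho$ and the data alone, while the hyperparameters $(a,b)$ move only the prior:

```python
import math

# Single Bernoulli observation x with success probability rho (so theta = rho)
def likelihood(x, rho):
    # p(x | rho) = rho^x (1 - rho)^(1 - x): no (a, b) anywhere
    return rho**x * (1 - rho)**(1 - x)

def beta_pdf(rho, a, b):
    # p(rho | alpha) with alpha = (a, b): the Beta(a, b) prior density
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * rho**(a - 1) * (1 - rho)**(b - 1)

rho = 0.3
# Changing the hyperparameters changes the prior density at rho ...
print(beta_pdf(rho, 1, 1), beta_pdf(rho, 5, 2))
# ... but the likelihood of an observation does not depend on them at all:
print(likelihood(1, rho))
```

Since $\alpha$ influences $\mathbf{X}$ only through $\theta$, conditioning on $\alpha$ in addition to $\theta$ changes nothing: $p(\mathbf{X}\mid\theta,\alpha)=p(\mathbf{X}\mid\theta)$.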

On the second question

Why is $p({\tilde {x}}|\theta )$ not $p({\tilde {x}}|\theta, X, \alpha )$ instead?

The details are in Eupraxis1981's answer. I would simply say that, given $\theta$, the data and the hyperparameters carry no additional information about $\tilde{x}$, so conditioning on them is redundant: they do not appear in $p({\tilde {x}}|\theta )$. A similar example could be constructed in this case also.

---

It is assumed that $\mathbf{X}$ is conditionally independent of $\alpha$ given $\theta$, and likewise that the new data point $\tilde{x}$ is conditionally independent of $\mathbf{X}$ (and of $\alpha$) given $\theta$. These assumptions are exactly what let $p(\mathbf{X}\mid\theta,\alpha)$ simplify to $p(\mathbf{X}\mid\theta)$ and $p(\tilde{x}\mid\theta,\mathbf{X},\alpha)$ to $p(\tilde{x}\mid\theta)$.
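These conditional-independence assumptions can be checked numerically (a sketch reusing a beta-binomial setup with made-up numbers): the factor inside the predictive integral is just $p(\tilde{x}\mid\theta)$, with no $\mathbf{X}$ or $\alpha$ argument, and the integral still reproduces the closed-form predictive probability $P(\tilde{x}=1\mid\mathbf{X},\alpha)=(a+k)/(a+b+n)$:

```python
# Hypothetical data: k successes in n trials; Beta(a, b) prior on theta
n, k = 10, 7
a, b = 2.0, 2.0

def post_density(theta):
    # Conjugate posterior p(theta | X, alpha) = Beta(a + k, b + n - k), unnormalized
    return theta**(a + k - 1) * (1 - theta)**(b + n - k - 1)

grid = [(i + 0.5) / 2000 for i in range(2000)]
w = [post_density(t) for t in grid]
z = sum(w)

# Posterior predictive: integrate p(x_tilde = 1 | theta) * p(theta | X, alpha).
# Note p(x_tilde = 1 | theta) = theta -- it conditions on theta alone.
pred = sum(t * wi for t, wi in zip(grid, w)) / z

closed_form = (a + k) / (a + b + n)   # known predictive probability for this model
print(round(pred, 4), round(closed_form, 4))
```

Given $\theta$, the new point $\tilde{x}$ needs no further conditioning on $\mathbf{X}$ or $\alpha$: all of their influence has already been absorbed into the posterior over $\theta$.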