How to understand the posterior hyperparameters for a Bernoulli likelihood with a Beta conjugate prior?


From here: https://en.wikipedia.org/wiki/Conjugate_prior#When_the_likelihood_function_is_a_discrete_distribution

I know $\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}}$, but how does that formula yield the posterior hyperparameters $\alpha +\sum _{i=1}^{n}x_{i},\ \beta +n-\sum _{i=1}^{n}x_{i}$, and what does the sum of the $x_{i}$ mean? The Bernoulli distribution is:

$$ p(x \mid \mu) = \mu^x(1-\mu)^{1-x}, \quad x \in \{0,1\}. $$

Can someone give me some intuition about the posterior hyperparameters? E.g. explain how those terms fall out of $\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}}$.


There are 2 answers below.

BEST ANSWER

As the link explains and I quote:

Let $n$ denote the number of observations. In all cases below, the data is assumed to consist of $n$ points $x_{1}, \dots, x_{n}$

One important assumption the link leaves implicit is that the $x_{i}$ are independent observations, although, as far as I have read, this is the norm anyway.

To be explicit, each $x_{i}$ is an observation of a Bernoulli experiment with parameter $\mu$; $$p(x_{i} \mid \mu) = \mu^{x_{i}}(1-\mu)^{1-x_{i}}.$$

Let me denote $\mathcal{D} = \{x_{1}, \dots, x_{n}\}$. To get the hyperparameters for the posterior it is easiest to use the unnormalized Bayes theorem:

$$p(\mu\mid \mathcal{D})\propto p(\mathcal{D}\mid\mu)\,p(\mu).$$ Since the data in $\mathcal{D}$ are (implicitly) assumed to be independent: $$p(\mathcal{D}\mid \mu) = \prod_{i=1}^{n}p(x_{i}\mid \mu) = \prod_{i=1}^{n} \mu^{x_{i}}(1-\mu)^{1-x_{i}} = \mu^{\sum_{i=1}^{n}x_{i}}(1-\mu)^{n - \sum_{i=1}^{n}x_{i}}.$$ Since $\mu\sim\text{Beta}(\alpha,\beta)$ and hence $p(\mu) \propto \mu^{\alpha-1}(1-\mu)^{\beta-1}$, this implies that the posterior is:
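A quick numerical sanity check of this factorization, using a made-up value of $\mu$ and a small made-up data set:

```python
# Check that the product of Bernoulli pmfs equals mu^s * (1 - mu)^(n - s),
# where s = sum of the x_i. Values of mu and data are arbitrary.
mu = 0.3
data = [1, 0, 1, 1, 0]

prod = 1.0
for x in data:
    prod *= mu**x * (1 - mu)**(1 - x)   # p(x_i | mu), term by term

s, n = sum(data), len(data)             # s counts the ones among the x_i
closed_form = mu**s * (1 - mu)**(n - s)

print(abs(prod - closed_form) < 1e-12)  # True
```

This makes concrete why only $\sum_i x_i$ (the number of ones) and $n$ survive into the posterior: the product collapses to a function of those two numbers alone.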

$$p(\mu\mid \mathcal{D})\propto \mu^{\sum_{i=1}^{n}x_{i} + \alpha - 1}(1-\mu)^{n - \sum_{i=1}^{n}x_{i} + \beta - 1}.$$

To find the normalizing constant you could integrate, but it is simpler to note, by comparing exponents, that $p(\mu\mid \mathcal{D})$ is proportional to the density of a $\text{Beta}(\alpha + \sum_{i=1}^{n} x_{i},\ \beta + n - \sum_{i=1}^{n} x_{i})$ distribution, and hence the constant is the one from that Beta distribution.
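In code, the whole conjugate update reduces to two additions. A minimal sketch, with made-up prior hyperparameters and data:

```python
# Beta-Bernoulli conjugate update (hypothetical prior and data).
alpha, beta = 2.0, 2.0           # prior: Beta(2, 2)
data = [1, 0, 1, 1, 0, 1, 1]     # n = 7 Bernoulli observations x_i

n = len(data)
s = sum(data)                    # number of successes, sum of x_i

alpha_post = alpha + s           # alpha + sum x_i
beta_post = beta + n - s         # beta + n - sum x_i

print(alpha_post, beta_post)     # 7.0 4.0
# Posterior mean of mu under Beta(a, b) is a / (a + b):
print(alpha_post / (alpha_post + beta_post))  # 7/11, about 0.636
```

Intuitively, $\alpha$ acts like a count of previously seen ones and $\beta$ like a count of previously seen zeros; the data just adds its own counts to each.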

ANOTHER ANSWER

Your "evidence" in the denominator is simply a scaling factor so the whole expression integrates over $\mu$ to $1$, as it needs to for a probability. We are interested in the shape of the distribution for $\mu$.

The prior for $\mu$ is a Beta distribution with density proportional to $\mu^{\alpha-1} (1-\mu)^{\beta-1}$ for $0 \le \mu \le 1$.

The likelihood from the observations $x_1,x_2,\ldots,x_n$, each in $\{0,1\}$, is proportional to $\mu^{x_1}(1-\mu)^{1-{x_1}}\mu^{x_2}(1-\mu)^{1-{x_2}}\cdots \mu^{x_n}(1-\mu)^{1-{x_n}} = \mu^{\sum x_i}(1-\mu)^{n-\sum x_i}$.

So the product of the prior for $\mu$ and the likelihood is proportional to $\mu^{\alpha+\sum x_i-1}(1-\mu)^{\beta+n-\sum x_i-1}$. I.e. the posterior for $\mu$ is again a Beta distribution, now with parameters $\alpha+\sum x_i$ and $\beta+n-\sum x_i$.
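To see numerically that prior times likelihood really has a Beta shape, one can normalize the product on a grid and compare it to the $\text{Beta}(\alpha+\sum x_i,\ \beta+n-\sum x_i)$ density. A sketch with hypothetical hyperparameters and data, using `math.gamma` for the Beta normalizing constant:

```python
import math

# Hypothetical prior hyperparameters and data.
alpha, beta = 3.0, 2.0
data = [1, 1, 0, 1]
s, n = sum(data), len(data)

def beta_pdf(mu, a, b):
    """Beta(a, b) density, normalized via the Gamma function."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * mu**(a - 1) * (1 - mu)**(b - 1)

# Unnormalized posterior kernel: prior * likelihood, on a grid over (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
kernel = [m**(alpha - 1 + s) * (1 - m)**(beta - 1 + n - s) for m in grid]
area = sum(kernel) / 1000                  # crude Riemann normalization
posterior = [k / area for k in kernel]

target = [beta_pdf(m, alpha + s, beta + n - s) for m in grid]
print(max(abs(p - t) for p, t in zip(posterior, target)) < 1e-2)  # True
```

The grid normalization plays the role of the "evidence" denominator: it only rescales the curve, which is why the shape, and hence the hyperparameters, can be read off from the unnormalized product.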