$D = (x_i)_{i=1:n}$ is the training data, where each $x_i$ follows a Poisson distribution with parameter $\lambda$. The likelihood is $p(D \mid \lambda) = \prod_{i=1}^n e^{-\lambda} \frac{\lambda^{x_i}}{x_i!}$.
We assume that the prior distribution of $\lambda$ is $p(\lambda) = Ga(\lambda \mid a,b)$ (a gamma distribution).
The goal is to show that the posterior also follows a gamma distribution. The derivation I found in a textbook is the following: $$ \begin{align} p(\lambda \mid D) &\propto p(\lambda)\, p(D \mid \lambda) \\ &\propto e^{-\lambda(b+n)} \, \lambda^{a-1 + \sum x_i} \end{align} $$
and then to conclude that $p(\lambda \mid D) = Ga(\lambda \mid a + \sum x_i,\, b+n)$.
I don't understand how one can so easily ignore the normalizing constant of the gamma distribution (by that I mean $\beta^\alpha / \Gamma(\alpha)$). Why is it enough to show that the posterior is proportional to these two factors to conclude that it's a gamma distribution?
Bayes' theorem/rule basically says
$$f(\lambda \mid \boldsymbol x) = \frac{f(\boldsymbol x \mid \lambda)f(\lambda)}{f(\boldsymbol x)}, \tag{1}$$ where $\boldsymbol x = (x_1, \ldots, x_n)$ is the observed sample, and $\lambda$ is the parameter. The left-hand side describes the posterior distribution of the parameter, given the observed sample. The right-hand side is the product of the joint conditional distribution of the sample given the parameter, times the prior distribution on the parameter, divided by the marginal/unconditional joint distribution of the sample.
The key insight here is that the left-hand side is regarded as a function of $\lambda$, but the marginal distribution in the denominator on the right-hand side does not depend on $\lambda$. Everything we know about the posterior likelihood of $\lambda$ comes from the numerator on the right-hand side.
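To see this concretely, here is a toy discrete version of $(1)$ (all numbers are made up for illustration): the posterior is just the numerator rescaled so that it sums to one, and the denominator $f(\boldsymbol x)$ is exactly that rescaling constant, carrying no information about $\lambda$.

```python
# Toy discrete illustration of Bayes' rule (1); all numbers are hypothetical.
prior = {0.5: 0.6, 2.0: 0.4}           # f(lambda) on a two-point grid
likelihood = {0.5: 0.30, 2.0: 0.05}    # f(x | lambda) for one fixed observed x

# Numerator of Bayes' rule, evaluated at each candidate lambda.
numerator = {lam: likelihood[lam] * prior[lam] for lam in prior}

# The denominator f(x) is the same constant for every lambda:
marginal = sum(numerator.values())

posterior = {lam: numerator[lam] / marginal for lam in numerator}
assert abs(sum(posterior.values()) - 1.0) < 1e-12
assert abs(posterior[0.5] - 0.9) < 1e-9 and abs(posterior[2.0] - 0.1) < 1e-9
```

Multiplying the numerator by any constant would change `marginal` by the same constant and leave `posterior` untouched, which is why constants can be dropped.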
For example: suppose we have the classic binomial model with beta conjugate prior: $$X_i \mid \theta \sim \operatorname{Bernoulli}(\theta), \\ \theta \sim \operatorname{Beta}(a,b).$$ Then for a sample $\boldsymbol x = (x_1, \ldots, x_n)$, a sufficient statistic for $\theta$ is the sample total $T = \sum_i X_i$, which, given $\theta$, is binomial. So the numerator of Bayes' rule $(1)$ is $$f_T(t \mid \theta) f(\theta) = \binom{n}{t} \theta^t (1-\theta)^{n-t} \cdot \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \theta^{a-1} (1-\theta)^{b-1}. \tag{2}$$ All I have done here is choose to work with the more convenient sufficient statistic $T$ rather than the joint Bernoulli distribution on $\boldsymbol x$, and copy over the relevant binomial and beta distributions.
At this point, I remind you: the left-hand side of $(1)$ is a function of the parameter. Every multiplicative factor in Equation $(2)$ that does not depend on $\theta$ is simply a constant of proportionality with respect to the posterior distribution, and carries no information about how the likelihood varies over $\theta$. So if I omit all of these and write $$f_T(t \mid \theta) f(\theta) \propto \theta^{t+a-1} (1-\theta)^{n-t+b-1},$$ this is still a valid (unnormalized) posterior likelihood for $\theta$. The only thing left to recognize is that if there were a constant multiplicative factor of $$\frac{\Gamma(n+a+b)}{\Gamma(t+a)\Gamma(n-t+b)}$$ in front, this would make our likelihood a proper beta posterior density with posterior hyperparameters $t+a$ and $n-t+b$.
If you are not convinced, you can explicitly calculate the integral for the marginal distribution of $T$: $$f_T(t) = \int_{\theta = 0}^1 f_T(t \mid \theta) f(\theta) \, d\theta = \binom{n}{t} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_{\theta = 0}^1 \theta^{t+a-1} (1-\theta)^{n-t+b-1} \, d\theta.$$ Note that because the variable of integration is $\theta$, those multiplicative factors I removed earlier simply factor out of the integral and cancel with the numerator anyway. This is why we dropped them. The remaining integral evaluates to $$\frac{\Gamma(t+a)\Gamma(n-t+b)}{\Gamma(n+a+b)},$$ precisely the reciprocal of the multiplicative factor that we needed to make the right-hand side a proper density. But if we recognize the functional form--that is to say, the kernel of the beta distribution--we don't need to compute the integral.
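If you'd rather check this numerically than evaluate the integral by hand, here is a small sketch (the values of $n$, $t$, $a$, $b$ are hypothetical) verifying that the numerator of Bayes' rule is a constant multiple of the $\operatorname{Beta}(t+a,\, n-t+b)$ density, with that constant equal to the marginal $f_T(t)$:

```python
import numpy as np
from scipy import stats
from scipy.special import comb, gamma as Gamma

n, t = 10, 7          # hypothetical sample size and sample total
a, b = 2.0, 3.0       # hypothetical prior hyperparameters

theta = np.linspace(0.01, 0.99, 50)

# Numerator of Bayes' rule (1): binomial pmf times beta prior density.
numerator = stats.binom.pmf(t, n, theta) * stats.beta.pdf(theta, a, b)

# Candidate posterior: Beta(t + a, n - t + b).
posterior = stats.beta.pdf(theta, t + a, n - t + b)

# Their ratio should be the same constant f_T(t) at every theta...
ratio = numerator / posterior
assert np.allclose(ratio, ratio[0])

# ...and should match the closed-form marginal computed in the answer.
closed_form = (comb(n, t) * Gamma(a + b) / (Gamma(a) * Gamma(b))
               * Gamma(t + a) * Gamma(n - t + b) / Gamma(n + a + b))
assert np.isclose(ratio[0], closed_form)
```

The constant ratio across the whole grid is exactly the statement that the two functions of $\theta$ have the same kernel.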
In your case, the inference of interest is on $\lambda$, not the hyperparameters. So if we look at the Poisson/gamma conjugate pair for a single observation (for simplicity), we have $$e^{-\lambda} \frac{\lambda^x}{x!} \cdot \frac{b^a \lambda^{a-1} e^{-b \lambda}}{\Gamma(a)}.$$ And as in the binomial/beta model above, the only factors that are functions of $\lambda$ are $$e^{-\lambda} \lambda^x \lambda^{a-1} e^{-b \lambda} = \lambda^{x+a-1} e^{-(b+1)\lambda}. \tag{3}$$ And when we put these together, it becomes obvious that this is the kernel of a gamma distribution with posterior hyperparameters $x+a$ and $b+1$. The same thing applies for multiple observations: the sufficient statistic is the sum of Poisson observations, so instead of $(x_1, \ldots, x_n)$, we can just take the total, which is Poisson with rate $n\lambda$; consequently $x$ is replaced in $(3)$ with the sample total $\sum_i x_i$, and $1$ is replaced with $n$.
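The same numerical sanity check works for your Poisson/gamma case with multiple observations (the sample and hyperparameter values below are hypothetical): the likelihood times the prior is a constant multiple of the $Ga(a + \sum_i x_i,\, b + n)$ density.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 2.0, 1.5                      # hypothetical prior hyperparameters
x = rng.poisson(lam=4.0, size=20)    # hypothetical observed sample
n, total = x.size, x.sum()

lam = np.linspace(1.0, 8.0, 40)      # grid of candidate lambda values

# Numerator of Bayes' rule: joint Poisson likelihood times gamma prior.
# Note: scipy's gamma uses a scale parameter, i.e. scale = 1 / rate.
likelihood = np.array([stats.poisson.pmf(x, mu=l).prod() for l in lam])
prior = stats.gamma.pdf(lam, a, scale=1.0 / b)          # Ga(lambda | a, b)

# Candidate posterior: Ga(lambda | a + sum x_i, b + n).
posterior = stats.gamma.pdf(lam, a + total, scale=1.0 / (b + n))

# Constant ratio over the whole grid => same distribution up to a constant.
ratio = likelihood * prior / posterior
assert np.allclose(ratio / ratio[0], 1.0)
```

Watch the parameterization: your $Ga(\lambda \mid a, b)$ uses $b$ as a rate, while scipy expects a scale, hence the `1.0 / b` above.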