What does the phrase conjugate family of distribution mean?


I'm doing some self study where I encountered the phrase "conjugate family of distribution".

I tried to google it and looked through online posts and Wikipedia, but I'm still confused.

  1. What is a family of distributions? What does "family" mean?

  2. What's a conjugate family? Why call it conjugate?

BEST ANSWER

Family: A family of distributions is a collection of distributions that share a common formula for the PDF (or the PMF, in the discrete case); different choices of the constant parameter values specify the different members of the family.

Three important examples of continuous families are $\mathsf{Norm}(\text{mean}=\mu,\, \text{SD}=\sigma),\,$ $\mathsf{Gamma}(\text{shape}=\alpha,\, \text{rate}=\lambda),$ and $\mathsf{Beta}(\text{shape}=\alpha, \text{shape}=\beta).$ Two important examples of discrete families are $\mathsf{Binom}(n,p)$ and $\mathsf{Poisson}(\lambda).$
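As a rough illustration of "same formula, different parameter values," the following R sketch evaluates two members of the normal family and two members of the binomial family. The particular parameter values and variable names are arbitrary choices made for this illustration.

```r
# Two members of the normal family: same PDF formula, different (mean, SD).
x <- seq(-4, 8, by = 0.01)
std_normal   <- dnorm(x, mean = 0, sd = 1)
shifted_norm <- dnorm(x, mean = 2, sd = 0.5)

# Each member is a proper density, so it integrates to 1.
area_std     <- integrate(dnorm, -Inf, Inf, mean = 0, sd = 1)$value
area_shifted <- integrate(dnorm, -Inf, Inf, mean = 2, sd = 0.5)$value

# Two members of the binomial family: same PMF formula, different (n, p).
pmf_a <- dbinom(0:10, size = 10, prob = 0.5)
pmf_b <- dbinom(0:30, size = 30, prob = 0.2)

# Each PMF sums to 1 over its support.
total_a <- sum(pmf_a)
total_b <- sum(pmf_b)
```

Plotting `std_normal` and `shifted_norm` against `x` shows two differently placed and scaled bell curves that nevertheless come from one formula.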

Conjugacy: This terminology is mainly used in Bayesian statistics, where it means 'mathematically compatible' in a way that makes certain relationships simple to show. More precisely, a family of prior distributions is conjugate to a likelihood when the resulting posterior distribution belongs to the same family as the prior. For example, a beta prior distribution is said to be 'conjugate' to a binomial likelihood, because the posterior distribution (found by multiplying the prior by the likelihood) is easily seen to be a beta distribution. (Similarly, we say that a gamma prior is conjugate to a Poisson likelihood function.)
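The gamma-Poisson case mentioned in passing can be checked numerically. The sketch below assumes a hypothetical prior $\mathsf{Gamma}(\text{shape}=3,\, \text{rate}=2)$ and made-up Poisson counts `y`; neither comes from the question. It multiplies prior by likelihood on a grid, normalizes by brute force, and compares the result with the standard conjugate update $\mathsf{Gamma}(\alpha + \sum y_i,\, \lambda + n)$.

```r
# Hypothetical prior Gamma(shape = a, rate = b) and made-up Poisson counts.
a <- 3
b <- 2
y <- c(4, 1, 3, 2, 5)
n <- length(y)

# Grid of candidate values for the Poisson mean.
step   <- 0.01
lambda <- seq(step, 10, by = step)

# Unnormalized posterior: prior density times Poisson likelihood.
lik    <- sapply(lambda, function(l) prod(dpois(y, l)))
unnorm <- dgamma(lambda, shape = a, rate = b) * lik

# Normalize numerically (Riemann sum) instead of using conjugacy.
grid_post <- unnorm / (sum(unnorm) * step)

# Conjugate shortcut: posterior is Gamma(a + sum(y), b + n).
conj_post <- dgamma(lambda, shape = a + sum(y), rate = b + n)

# The two curves should agree up to small numerical error.
max_diff <- max(abs(grid_post - conj_post))
```

The point of conjugacy is that the last `dgamma` call gives the answer directly, with no grid and no numerical normalization.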

Example: Consider the prior distribution $\mathsf{Beta}(2,3)$ and a binomial likelihood function based on observing $x$ successes in $n$ trials. The 'kernel' of the beta posterior has the form $$\theta^{x + 2-1}(1 - \theta)^{n - x + 3-1} \propto \theta^{2-1}(1 - \theta)^{3-1} \times \theta^x(1-\theta)^{n-x}.$$ Here the success probability is modeled as the random variable $\theta$ and the symbol $\propto$ is read "proportional to." The kernel of a density or likelihood function omits the normalizing constant that makes a density integrate to $1.$

In this example the mathematical compatibility of the beta and binomial distributions allows us to recognize that the kernel of the posterior is that of the distribution $\mathsf{Beta}(x+2, n-x+3).$ This 'conjugacy' makes it possible to identify the posterior distribution without having to integrate the denominator in the general form of Bayes' Theorem.
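To see what conjugacy saves, one can carry out the integration in the denominator anyway and confirm that it gives the same density. The sketch below does this in R for $x = 10$ successes in $n = 30$ trials. The helper `unnorm_post` and the evaluation points in `theta` are names and values chosen for this illustration.

```r
x <- 10   # observed successes
n <- 30   # number of trials

# Prior density times binomial likelihood, as a function of theta.
unnorm_post <- function(theta) {
  dbeta(theta, 2, 3) * dbinom(x, size = n, prob = theta)
}

# Denominator of Bayes' Theorem, obtained by numerical integration.
denom <- integrate(unnorm_post, lower = 0, upper = 1, rel.tol = 1e-10)$value

# Posterior density at a few points, computed the long way ...
theta        <- c(0.2, 0.35, 0.5)
post_numeric <- unnorm_post(theta) / denom

# ... and via conjugacy: Beta(x + 2, n - x + 3) = Beta(12, 23).
post_conj <- dbeta(theta, x + 2, n - x + 3)

max_diff <- max(abs(post_numeric - post_conj))
```

Both routes give the same values up to numerical error, but the conjugate route needs no integration at all.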

In particular, if the prior distribution is $\theta \sim \mathsf{Beta}(2,3)$ and we observe $x=10$ successes in $n=30$ trials, the posterior distribution of $\theta$ is $\mathsf{Beta}(12, 23)$ and a 95% posterior interval estimate for $\theta$ is $(0.1975, 0.5053)$, as computed using R.

qbeta(c(.025,.975), 12, 23)
## 0.1974586 0.5052653
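As a small follow-up check, the posterior parameters can be built from the prior and the data, and the interval can be confirmed to hold 95% posterior probability. The variable names below are illustrative only.

```r
# Prior Beta(2, 3) updated with x = 10 successes in n = 30 trials.
a_post <- 2 + 10          # prior shape 1 + successes
b_post <- 3 + (30 - 10)   # prior shape 2 + failures

# Posterior mean of theta: a / (a + b) = 12/35.
post_mean <- a_post / (a_post + b_post)

# Equal-tailed 95% posterior interval (same call as above).
ci <- qbeta(c(0.025, 0.975), a_post, b_post)

# Posterior probability inside the interval; should be 0.95.
coverage <- diff(pbeta(ci, a_post, b_post))
```

The posterior mean $12/35$ sits close to the sample proportion $10/30$, pulled only slightly toward the prior mean $2/5$.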