Why can conjugate distributions be identified only up to a constant?


A family $\mathcal{F}$ of sampling densities and a family $\mathcal{P}$ of prior distributions are said to be conjugate if, for all $f \in \mathcal{F}$ and all $\pi(\theta) \in \mathcal{P}$, the posterior satisfies $\pi(\theta\mid x) \in \mathcal{P}$.

In practice, to determine whether two families are conjugate, books just check whether $f \pi \propto \Pi$ with $\Pi \in \mathcal{P}$. This happens, for instance, for the Gamma and Poisson distributions.

Why is it enough to check it up to a constant?
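For concreteness, the Gamma–Poisson case can be checked numerically. The following is a rough sketch (assuming NumPy/SciPy are available; the sample and the hyperparameters $\alpha,\beta$ are made-up illustrative values) that normalizes likelihood $\times$ prior on a grid and compares the result with the claimed conjugate posterior:

```python
import numpy as np
from scipy.stats import gamma, poisson

# Illustrative (made-up) data and hyperparameters
rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.5            # theta ~ Gamma(alpha, scale=beta) prior
x = rng.poisson(3.0, size=20)     # observed Poisson(theta) sample
n, s = len(x), x.sum()            # s = n * xbar

# Unnormalized posterior on a grid: likelihood(theta) * prior(theta)
theta = np.linspace(1e-6, 15, 20000)
dtheta = theta[1] - theta[0]
unnorm = poisson.pmf(x[:, None], theta).prod(axis=0) * gamma.pdf(theta, a=alpha, scale=beta)
post_grid = unnorm / (unnorm.sum() * dtheta)   # normalize numerically

# Claimed conjugate posterior: Gamma(alpha + n*xbar, scale = (n + 1/beta)^(-1))
post_exact = gamma.pdf(theta, a=alpha + s, scale=1.0 / (n + 1.0 / beta))

print(np.max(np.abs(post_grid - post_exact)))  # maximum pointwise discrepancy
```

The numerical normalization never uses the closed-form constant, yet the two curves coincide, which is exactly the point of the proportionality check.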

Edit

Perhaps I should have given the whole explicit example before.

If $X_1,\dots,X_n$ are i.i.d. $\mathcal{P}(\theta)$ and $\theta \sim Gamma(\alpha,\beta)$, one can show that

$$h(x,\theta) = f(x|\theta) \pi(\theta) = \frac{e^{-\theta (n+1/ \beta)}\theta^{(n\overline{x}+\alpha -1)}I_{(0,\infty)}(\theta)}{\Gamma(\alpha)\beta^\alpha \prod x_i!},$$ which, as a function of $\theta$, looks like a $Gamma(n\overline{x}+\alpha,(n+1/\beta)^{-1})$ density, but only up to a constant, because the factor $\Gamma(\alpha)\beta^\alpha \prod x_i!$ does not appear in the Gamma density.

Then I think one can say that $m(x) = h(x,\theta)/\pi(\theta|x)$, so that the marginal $m(x)$ supplies exactly the constant needed for $\pi(\theta|x)$ to be a Gamma density. However, doesn't this constrain the form of the Gamma?

Solution

This is sillier than I thought. The situation (for instance, in the example above) is that I get $f(x|\theta)\pi(\theta) = B \cdot Gamma$, where $Gamma$ denotes a normalized Gamma density. Then I ask why the marginal normalizes this, and the answer is straightforward: $$\int f(x|\theta)\pi(\theta)\,d\theta = \int B \cdot Gamma\,d\theta = B \int Gamma\,d\theta = B.$$
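This identity can be checked numerically as well. In the sketch below (assuming SciPy; the data and hyperparameters are hypothetical), $\log B$ is computed in closed form, $B = \Gamma(n\overline{x}+\alpha)\,(n+1/\beta)^{-(n\overline{x}+\alpha)} / \big(\Gamma(\alpha)\beta^\alpha \prod x_i!\big)$, and compared against a quadrature of $h(x,\theta)$:

```python
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

# Hypothetical data and hyperparameters
alpha, beta = 2.0, 1.5
x = np.array([2, 4, 3, 1, 5])
n, s = len(x), x.sum()            # s = n * xbar

# log B, where f(x|theta) * pi(theta) = B * Gamma(s+alpha, (n+1/beta)^(-1)) density
logB = (gammaln(s + alpha) - (s + alpha) * np.log(n + 1.0 / beta)
        - gammaln(alpha) - alpha * np.log(beta) - gammaln(x + 1).sum())

# m(x) = integral of f(x|theta) * pi(theta) over theta, computed by quadrature
def h(theta):
    logh = (-theta * (n + 1.0 / beta) + (s + alpha - 1) * np.log(theta)
            - gammaln(alpha) - alpha * np.log(beta) - gammaln(x + 1).sum())
    return np.exp(logh)

m, _ = quad(h, 0, np.inf)
print(m, np.exp(logB))            # the marginal equals the constant B
```

The quadrature never sees the Gamma normalizer separately; it simply recovers $B$, as the argument above says it must.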

Accepted answer

Remember how posterior distributions are found: multiply the prior by the likelihood, and then normalize. If you show that the posterior after normalization will be in the proposed family, then there is no need to carry out the actual normalization if that is all you are trying to show.

Second answer

Every density or probability mass function is determined up to a constant, since that constant is in fact uniquely determined by the conditions $$\int_{-\infty}^{\infty} f(x) \, dx=1$$ for density functions and $$\sum_{x\in R_X} p(x) =1$$ for probability mass functions ($R_X$ being the range of the discrete r.v. $X$).

For instance, if I know that $X$ is a continuous r.v. with support in $[0,1]$ and its density is proportional to $x^2$ over the support, then we have $$f_X(x)=kx^2 I_{[0,1]}(x),$$ but as its integral over the reals has to equal one, that is $$\int_{-\infty}^{\infty} f_X(x) \, dx=k\int_0^1 x^2 \, dx=\frac k3=1$$ the only option is $k=3$.
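The same normalization can be reproduced numerically; a minimal sketch assuming SciPy:

```python
from scipy.integrate import quad

# Density proportional to x^2 on [0, 1]: f(x) = k * x^2
total, _ = quad(lambda t: t**2, 0.0, 1.0)   # integral of the unnormalized part = 1/3
k = 1.0 / total                             # normalizing constant, approximately 3
print(k)
```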

So in particular, when you multiply two densities or probability functions to obtain another one, as in $$f_{\Theta|\vec{X}=\vec{x}}(\theta)\propto f_{\vec{X}|\Theta=\theta}(\vec{x})\cdot \pi(\theta)$$ (whether the families are conjugate or not), you can use any multiple of the prior and of the likelihood (treating everything but $\theta$ as constant), and you will obtain a multiple of the posterior density; this happens even if you use the actual $f_{\vec{X}}$ and the actual $\pi$. So the posterior is determined up to a constant, as is any other density or probability function.

Two more comments:

  • Sometimes the formula clearly shows the distribution even if a constant is undetermined. For instance, if $f_Z(z)\propto z^3 e^{-2z}$ for $z>0$ and $0$ elsewhere, it is clear that $Z\sim \Gamma(\alpha=4,\lambda=2)$ (at least if you are familiar enough with the Gamma distribution).
  • Some important algorithms for simulation, used in estimation and other procedures, need as input only a multiple of certain involved distributions, not the exact formula; this happens, for instance, when a density $f$ appears in the algorithm only in an expression such as $\frac {f(x_1)}{f(x_0)}$, in which case any constant multiplying both $f(x_1)$ and $f(x_0)$ cancels out.
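Both points can be illustrated together. The sketch below (assuming NumPy; the proposal scale and sample sizes are arbitrary illustrative choices) runs a toy random-walk Metropolis sampler, the classic instance of the ratio-only situation in the second bullet: it targets only the unnormalized kernel $z^3 e^{-2z}$ from the first bullet, the constant cancels in the acceptance ratio, and the samples nevertheless behave like $\Gamma(\alpha=4,\lambda=2)$ draws (mean $\alpha/\lambda = 2$).

```python
import numpy as np

# Target known only up to a constant: the kernel z^3 * exp(-2z), z > 0
def log_kernel(z):
    return 3 * np.log(z) - 2 * z if z > 0 else -np.inf

# Random-walk Metropolis: the acceptance ratio kernel(prop) / kernel(z)
# makes any multiplicative constant in the density cancel out.
rng = np.random.default_rng(1)
z, samples = 1.0, []
for _ in range(50_000):
    prop = z + rng.normal(0.0, 1.0)
    if np.log(rng.uniform()) < log_kernel(prop) - log_kernel(z):
        z = prop
    samples.append(z)

post_burnin = samples[5_000:]
print(np.mean(post_burnin))   # Gamma(4, rate 2) has mean 4/2 = 2
```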