Bayesian Parameter Estimation - Parameters and Data Jointly Continuous?


This is a follow up to my previous question regarding viewing parameters as random variables in a Bayesian framework.

If we apply Bayes' theorem to model parameters $\mathbf{\Theta} \in \mathbb{R}^n$ and data $\mathbf{Y} \in \mathbb{R}^m$ we get $$ g(\boldsymbol\theta|\mathbf{y}) = \frac{h(\mathbf{y}|\boldsymbol\theta) g(\boldsymbol\theta)}{h(\mathbf{y})}, $$ where $g$ is the marginal pdf of $\mathbf{\Theta}$, $h$ is the marginal pdf of $\mathbf{Y}$ and $g(\cdot|\mathbf{y})$, $h(\cdot|\boldsymbol\theta)$ are the respective conditional pdfs. I call $g$ and $h$ the marginal pdfs because I am assuming we are viewing the random vectors $\mathbf{\Theta}$ and $\mathbf{Y}$ as jointly continuous, and hence have a joint pdf $f(\boldsymbol\theta, \mathbf{y})$.

Is this necessary to assume? I.e., when using Bayes' rule above, it seems like I only need to assume that the collection $\{\Theta_i\}_{i=1}^n$ is jointly continuous and that the collection $\{Y_j\}_{j=1}^m$ is jointly continuous, but not necessarily both collections together. In this case $g$ would just be considered the joint pdf of $\{\Theta_i\}_{i=1}^n$ and $h$ the joint pdf of $\{Y_j\}_{j=1}^m$, rather than the marginals of the individual random vectors. Is this correct?

Best answer:

Your formulation of independence seems to ignore the crucial conditional relationships. Also your interpretation concentrates on continuous distributions, whereas some applications use a mixture of discrete and continuous distributions. Furthermore, your statement of Bayes' theorem ignores the usual practice of specifying the likelihood function only up to a constant.

Bayes' Theorem applied to inference. Perhaps an applied example will be useful. Suppose our data is the number of counts recorded by a Geiger counter during a certain period of time. Typically, such counts are modeled as a Poisson random variable $Y \sim Pois(\theta),$ where we wish to find an interval estimate for the unknown mean $\theta$ of the Poisson distribution. One might write

$$h(y|\theta) = \frac{e^{-\theta}\theta^y}{y!} \propto \theta^ye^{-\theta},$$

using $\propto$ to indicate that the factor $1/y!$ is suppressed.

Likelihood function. Here we are viewing $h(y|\theta)$ as a likelihood function of $\theta$ for known data $y,$ and the suppressed factor is not relevant to our inference.
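As a quick sketch (pure Python; the function name is my own, not from the answer), the unnormalized likelihood $\theta^y e^{-\theta}$ for $y = 8$ is maximized at $\theta = y$:

```python
import math

def unnorm_lik(theta, y=8):
    # Poisson likelihood of theta for fixed y, with the 1/y! factor suppressed
    return theta**y * math.exp(-theta)

# Evaluate on a coarse grid; the maximum sits at theta = y = 8
grid = [i / 10 for i in range(1, 301)]
mle = max(grid, key=unnorm_lik)
```

This confirms that dropping the $1/y!$ factor changes nothing about where the likelihood is largest, which is why the $\propto$ notation is harmless for inference.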

Prior distribution. A Bayesian approach to this estimation problem is to begin with a prior distribution $\theta \sim Gamma(\alpha, \kappa),$ where $\alpha$ is the shape parameter and $\kappa$ is the rate parameter. One might write

$$g(\theta) \propto \theta^{\alpha-1}e^{-\kappa\theta},$$

again using $\propto$ instead of $=$ to signal suppression of an unnecessary factor, the constant of integration. If expert opinion or prior information suggests that $\theta \approx 12$ and $\theta > 25$ is unlikely, then we might choose $\alpha = 4$ and $\kappa = 1/3$ because $Gamma(4, 1/3)$ has mean $\mu = 12$ and puts only about 3% of its probability beyond 25. (If we had little prior information, we might pick smaller values for $\alpha$ and $\kappa.$)
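The claim that $Gamma(4, 1/3)$ has mean 12 and puts about 3% of its probability beyond 25 can be checked in closed form, since an integer shape makes the gamma an Erlang distribution. A pure-Python sketch (the helper name is mine):

```python
import math

def erlang_sf(x, k, rate):
    # P(X > x) for Gamma(shape=k, rate) with integer shape k (Erlang):
    # exp(-rate*x) * sum_{i<k} (rate*x)^i / i!
    lam = rate * x
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k))

p_beyond_25 = erlang_sf(25, 4, 1/3)  # roughly 0.03, i.e. about 3%
prior_mean = 4 / (1/3)               # Gamma mean = shape / rate = 12
```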

Posterior distribution. Suppose our observation turns out to be $y = 8$ counts. Multiplying the prior by the likelihood we have the posterior distribution given by

$$g(\theta|y) \propto g(\theta)h(y|\theta) = \theta^{\alpha - 1}e^{-\kappa\theta} \times \theta^ye^{-\theta} = \theta^{(4+8)-1}e^{-(1/3+1)\theta},$$ which we recognize as the 'kernel' of $Gamma(12, 4/3);$ that is, the density function without its constant of integration.

Because the gamma and Poisson distributions are 'conjugate' (mathematically compatible in a convenient way), we did not need to evaluate the denominator $h(y) = \int h(y|\theta)g(\theta)\,d\theta$ of your formula for Bayes' Theorem.
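The conjugate gamma–Poisson update is just parameter arithmetic: add the observed count to the shape and 1 to the rate. A minimal sketch (variable names are mine):

```python
alpha, kappa = 4, 1/3  # prior Gamma(shape, rate)
y = 8                  # observed count

# Posterior for one Poisson observation: shape += y, rate += 1
post_shape = alpha + y               # 12
post_rate = kappa + 1                # 4/3
post_mean = post_shape / post_rate   # 9
```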

Estimation: Posterior mean and probability interval. The posterior mean is $12/(4/3) = 9$ and a 95% posterior interval estimate of $\theta$ is $(4.65, 14.76)$ [obtained in R using qgamma(c(.025, .975), 12, 4/3)].
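Without R at hand, the same interval can be sanity-checked by Monte Carlo using Python's standard library (a sketch under my own setup; note that random.gammavariate takes shape and scale, where scale = 1/rate):

```python
import random

random.seed(1)
# Draw from the posterior Gamma(shape=12, rate=4/3), i.e. scale = 3/4
draws = sorted(random.gammavariate(12, 1 / (4/3)) for _ in range(100_000))
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
# lo and hi should land near the qgamma values (4.65, 14.76)
```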

Note: A frequentist approximate 95% CI using the formula $Y+2 \pm 1.96\sqrt{Y+1}$ is $(4.12, 15.88)$. Thus our mildly informative prior has not had a huge influence on the numerical values of the endpoints of the interval estimate. To the extent that the information contained in the prior distribution is applicable, the Bayesian estimate is superior.