Understanding statistical hierarchical models


I was given the following question with solution:

[image: the question and its solution, not reproduced here]

I however do not understand how they produced their answer.

First, what general formula was used to produce the value $p(y)=\int...d\lambda$?

Second, why is the Gamma-distribution section of the proposed formula evaluated at $\lambda$ rather than $y$?

In other words why is the Gamma section of the formula not: $$\frac{y^{\alpha-1}e^{-y/\beta}}{\beta^\alpha\Gamma(\alpha)}$$


There is 1 answer below.

On BEST ANSWER

Let's work with a simpler example to help you understand what is going on.

Suppose we have the following distribution:

$$\begin{align*} \Pr[X = 1 \mid \theta] &= \theta \\ \Pr[X = 0 \mid \theta] &= 1-\theta. \end{align*}$$

That is to say, $X \mid \theta \sim \operatorname{Bernoulli}(\theta)$, or $X$ conditioned on $\theta$ is a Bernoulli random variable with parameter $\theta$. Now, suppose $\theta$ is itself a random variable, such that

$$\begin{align*}\Pr[\theta = 1/4] &= 1/6 \\ \Pr[\theta = 1/2] &= 1/3 \\ \Pr[\theta = 3/4] &= 1/2. \end{align*}$$

Then, what is the unconditional or marginal distribution of $X$? If we think carefully for a moment, we can reason that $X$ is also Bernoulli: after all, it still can only take on the values $0$ or $1$. But we should not expect that the parameter is $\theta$; after all, it should be a specific fixed number. It certainly cannot be a function of $\theta$. In fact, we can compute as follows, using the law of total probability: $$\begin{align*} \Pr[X = 1] &= \Pr[X = 1 \mid \theta = 1/4]\Pr[\theta = 1/4] + \Pr[X = 1 \mid \theta = 1/2]\Pr[\theta = 1/2] \\ & \quad + \Pr[X = 1 \mid \theta = 3/4]\Pr[\theta = 3/4] \\ &= \frac{1}{4} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{3} + \frac{3}{4} \cdot \frac{1}{2} \\ &= \frac{7}{12}.\end{align*}$$ I leave it as an exercise for you to show that $\Pr[X = 0] = 5/12$, thus completely characterizing the unconditional distribution of $X$ as Bernoulli.
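The law-of-total-probability sum above can be checked mechanically. This is an illustrative sketch (not part of the original answer), using exact rational arithmetic so no rounding hides the result:

```python
from fractions import Fraction

# Prior on theta, exactly as given above: Pr[theta = t]
prior = {Fraction(1, 4): Fraction(1, 6),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 2)}

# Law of total probability, with Pr[X = 1 | theta = t] = t:
p1 = sum(t * p for t, p in prior.items())
p0 = sum((1 - t) * p for t, p in prior.items())

print(p1)  # 7/12
print(p0)  # 5/12
```

The two probabilities sum to $1$, confirming that the marginal distribution of $X$ is a genuine Bernoulli distribution with a fixed parameter $7/12$.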

What did we do here? On the one hand, we simply applied the law of total probability. But what we were really doing, in a sense, was computing a marginal probability from the joint distribution of $X$ and $\theta$; namely, $$\Pr[X = x] = \sum_{t \in S} \Pr[(X = x) \cap (\theta = t)],$$ where $S$ is the support of the random variable $\theta$, and in the process we rewrote the joint mass function as a conditional probability times a marginal probability, i.e., $\Pr[A \cap B] = \Pr[A \mid B]\Pr[B]$. In the continuous case, the sum becomes an integral and the mass functions become densities: $$f_X(x) = \int_{t \in S} f_{X \mid \theta}(x \mid t) \, f_{\theta}(t) \, dt.$$

Note how the sum becomes an integral if $\theta$ is continuous, but the result can remain a mass function if $X \mid \theta$ is discrete: for example, if $X \mid \theta \sim \operatorname{Bernoulli}(\theta)$ as above, but now $\theta \sim \operatorname{Beta}(a,b)$, then you have to compute $$\Pr[X = 1] = \int_{t = 0}^1 \Pr[X = 1 \mid \theta = t] f_\theta(t) \, dt,$$ where $$f_\theta(t) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} t^{a-1} (1-t)^{b-1}, \quad 0 < t < 1$$ is the beta density. I leave it to you to compute this result.
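If you want to verify your answer to that exercise numerically, here is a sketch (my own check, not from the original answer) that approximates the integral with a midpoint rule; the exact value of $\int_0^1 t\, f_\theta(t)\, dt$ is the mean of the Beta distribution, $a/(a+b)$:

```python
from math import gamma

def beta_pdf(t, a, b):
    # Beta(a, b) density on (0, 1)
    return gamma(a + b) / (gamma(a) * gamma(b)) * t**(a - 1) * (1 - t)**(b - 1)

def pr_x1(a, b, n=100_000):
    # Midpoint-rule approximation of Pr[X = 1] = integral of t * f_theta(t) dt over (0, 1)
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        total += t * beta_pdf(t, a, b) / n
    return total

# e.g. a = 2, b = 3: the exact answer is a/(a+b) = 2/5
print(pr_x1(2, 3))
```

This is the classic beta-binomial (here beta-Bernoulli) construction: a discrete marginal pmf obtained by integrating out a continuous parameter.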

So, this is what is happening in your hierarchical model: it is the application of the formula $$\Pr[Y = y] = \int_{\lambda = 0}^\infty \Pr[Y = y \mid \Lambda = \lambda] f_\Lambda(\lambda) \, d\lambda,$$ where $$Y \mid \Lambda \sim \operatorname{Poisson}(\Lambda), \quad \Lambda \sim \operatorname{Gamma}(\alpha,\beta).$$ I have modified the notation slightly so as to make it clearer when we are using $\lambda$ as a variable of integration, versus $\Lambda$ as a random variable.