Bayesian hyperpriors: central limit theorem analogy


Let's say I have some distribution F, and I put a prior on a parameter of that distribution using some prior distribution P1. Then let's say I put a hyperprior on the parameter of P1 using a distribution P2. To be concrete, think of a beta-binomial model.

But then let's say we keep putting priors on the parameters of those priors, and so on, in the limit of infinitely many levels of priors. Are there any theorems or interesting results related to doing this many times? In particular, is there any analogue of the central limit theorem, such that if I repeat this toward infinitely many levels, putting hyper-hyper-...-hyperpriors on things, I would end up with some kind of limiting distribution for F?

Answer:
Practically speaking, putting deeper and deeper layers in the hierarchy doesn't do much good, because the data tend to have little to say about the deeper levels.

Adding even a few more levels of hyperpriors can quickly change the amount of data needed to get good estimates from hundreds to tens of thousands.

Nevertheless, putting a hyperprior on a parameter just places a mixture distribution on the distribution below it. For example, in the setting:

\begin{equation} \begin{split} Y | \mu_1 & \sim N(\mu_1,\sigma^2_1)\\ \mu_1 & \sim N(\mu_2,\sigma^2_2) \end{split} \end{equation}

you can see that $\int_{-\infty}^\infty P[Y|\mu_1]P[\mu_1]d\mu_1 = P[Y]$ is still just a distribution on $Y$: one with some of the responsibility for specifying $\mu_1$ removed, but a sampling distribution on the data nonetheless.
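A quick Monte Carlo sketch makes this concrete. The specific values of $\mu_2$, $\sigma_1$, and $\sigma_2$ below are arbitrary choices for illustration; the point is that sampling $\mu_1$ from its prior and then $Y$ given $\mu_1$ yields draws from the marginal $N(\mu_2, \sigma_1^2 + \sigma_2^2)$:

```python
import numpy as np

# Illustrative (hypothetical) hyperparameter values.
mu2, sigma1, sigma2 = 3.0, 1.0, 2.0
rng = np.random.default_rng(0)

n = 1_000_000
mu1 = rng.normal(mu2, sigma2, size=n)  # mu_1 ~ N(mu_2, sigma_2^2)
y = rng.normal(mu1, sigma1)            # Y | mu_1 ~ N(mu_1, sigma_1^2)

# Marginally, Y ~ N(mu_2, sigma_1^2 + sigma_2^2) = N(3.0, 5.0).
print(y.mean())  # close to 3.0
print(y.var())   # close to 1.0 + 4.0 = 5.0
```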

One thing to note is that in this setting the exact distribution for $Y$ after integrating out $\mu_1$ is $N(\mu_2,\sigma_1^2+\sigma_2^2)$. So, for example, if you specified a hierarchical model like:

\begin{equation} \begin{split} Y | \mu_1 & \sim N(\mu_1,\sigma^2_1)\\ \mu_1 | \mu_2 & \sim N(\mu_2,\sigma^2_2)\\ \mu_2 | \mu_3 & \sim N(\mu_3,\sigma^2_3)\\ & \vdots\\ \mu_{n-1} | \mu_n & \sim N(\mu_n,\sigma^2_n)\\ \end{split} \end{equation}

then the scenario you describe is exactly what happens, and the marginal distribution of $Y$ is: \begin{equation} Y \sim N(\mu_n, \sum_{i=1}^n \sigma^2_i) \end{equation}

So in essence, placing hyperpriors in deeper and deeper sequence just mixes your base distribution into an increasingly diffuse one.
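The $n$-level collapse above can also be checked by simulation. In this sketch (the top-level mean and the per-level standard deviations are made-up values), we sample the hierarchy top-down and compare the empirical variance of $Y$ against $\sum_i \sigma_i^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 1_000_000

# Hypothetical hierarchy: top-level mean mu_n, one sigma per level.
mu_n = 0.5
sigmas = [1.0, 0.8, 0.6, 0.4]  # sigma_1, ..., sigma_n

# Sample top-down: mu_{n-1} ~ N(mu_n, sigma_n^2), ...,
# finally Y ~ N(mu_1, sigma_1^2).
loc = np.full(n_draws, mu_n)
for s in reversed(sigmas):
    loc = rng.normal(loc, s)
y = loc

# Theory: Y ~ N(mu_n, sum of sigma_i^2).
print(y.mean())                             # close to 0.5
print(y.var(), sum(s**2 for s in sigmas))   # both close to 2.16
```

Each loop iteration marginalizes one more level into the running mean, so the variances simply accumulate, matching the closed-form result.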