When is the posterior distribution "continuous" in the prior?


If we consider Bayes' rule in the continuous case, are there conditions that guarantee that a small change in the prior distribution will not change the posterior distribution too much?

If so, when can we characterize the rate of convergence?


This is related to the field of sensitivity analysis. For example, suppose we had a normal likelihood:

\begin{equation} X_i \overset{iid}{\sim}\textrm{N}(\mu,\sigma^2) \end{equation}

with $\sigma$ known, and a normal prior on $\mu$:

\begin{equation} \mu \sim \textrm{N}(\eta,\tau^2) \end{equation}

Then the posterior distribution is given by:

\begin{equation} \mu \mid \{X_i\}_{i=1}^n \sim \textrm{N}\left(\left(\frac{1}{\tau^2}+\frac{n}{\sigma^2}\right)^{-1}\left(\frac{\eta}{\tau^2}+\frac{\sum_{i=1}^n X_i}{\sigma^2}\right),\ \left(\frac{1}{\tau^2}+\frac{n}{\sigma^2}\right)^{-1}\right) \end{equation}

Then, since $\tau^2$ represents our prior uncertainty about $\mu$, we see that $\left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right)^{-1}$ represents our posterior uncertainty about $\mu \mid \{X_i\}_{i=1}^n$.
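As a concrete illustration, the conjugate update above can be computed directly. This is a sketch with hypothetical data and hyperparameter values (none of them come from the original answer):

```python
import numpy as np

# Sketch: normal-normal conjugate update. The data and hyperparameters
# below are hypothetical, chosen only to illustrate the formulas.
rng = np.random.default_rng(0)

sigma = 2.0           # known likelihood standard deviation
eta, tau = 1.0, 3.0   # prior mean and standard deviation for mu
n = 50
x = rng.normal(loc=0.5, scale=sigma, size=n)  # simulated observations

# Posterior precision = prior precision + data precision.
post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
post_mean = post_var * (eta / tau**2 + x.sum() / sigma**2)

# The posterior mean is a precision-weighted average of the prior mean
# and the sample mean, so it always lies between the two.
```

Note that `post_var` is necessarily smaller than both the prior variance $\tau^2$ and the sampling variance $\sigma^2/n$, since the posterior precision is the sum of the two precisions.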

Taking the derivative of $\left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right)^{-1}$ with respect to $\tau^2$ gives us:

\begin{equation} \frac{\partial}{\partial\tau^2}(\frac{1}{\tau^2} + \frac{n}{\sigma^2})^{-1} = (\frac{1}{\tau^2} + \frac{n}{\sigma^2})^{-2}\frac{1}{\tau^4} \end{equation}

This means that increasing the prior variance increases the posterior variance, but at a rate that decreases as $\tau \rightarrow \infty$. Additionally, because $\frac{1}{\tau^2}\rightarrow 0$ as $\tau \rightarrow \infty$, for large $\tau$ the posterior variance is driven more by $n$ than by $\tau$.

One other point is that because:

\begin{equation} \frac{\partial}{\partial\sigma^2}(\frac{1}{\tau^2} + \frac{n}{\sigma^2})^{-1} = (\frac{1}{\tau^2} + \frac{n}{\sigma^2})^{-2}\frac{n}{\sigma^4} \end{equation}

our uncertainty about $\mu$ increases as our uncertainty in the original observations increases, though at a rate that diminishes with increasing sample size.
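Both sensitivity formulas are easy to sanity-check with finite differences. This sketch (my addition, with arbitrary parameter values) compares them against the closed forms:

```python
# Finite-difference check of both posterior-variance derivatives.
# tau2 and sigma2 store tau^2 and sigma^2; all values are hypothetical.
n = 25
tau2, sigma2, h = 9.0, 4.0, 1e-6

def post_var(t2, s2):
    return 1.0 / (1.0 / t2 + n / s2)

# d/d(tau^2): closed form (1/tau^2 + n/sigma^2)^(-2) * 1/tau^4.
num_tau = (post_var(tau2 + h, sigma2) - post_var(tau2 - h, sigma2)) / (2 * h)
ana_tau = post_var(tau2, sigma2) ** 2 / tau2**2

# d/d(sigma^2): closed form (1/tau^2 + n/sigma^2)^(-2) * n/sigma^4.
num_sig = (post_var(tau2, sigma2 + h) - post_var(tau2, sigma2 - h)) / (2 * h)
ana_sig = post_var(tau2, sigma2) ** 2 * n / sigma2**2

assert abs(num_tau - ana_tau) < 1e-8
assert abs(num_sig - ana_sig) < 1e-8
```

Both derivatives come out positive, matching the claim that extra variance in either the prior or the observations inflates the posterior variance.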

You could also do this with discrete observations, so long as the posterior and prior are continuous.

An obvious example would be a Binomial-Beta model:

We have $\{X_i\}_{i=1}^n\overset{iid}{\sim}\textrm{Binomial}(m,p)$ with $p\sim \textrm{Beta}(\omega,\kappa)$, where $\textrm{Beta}(\omega,\kappa)$ uses the mode-concentration parameterization of the Beta distribution: in terms of the standard parameters, $\alpha = \omega(\kappa-2)+1$ and $\beta = (1-\omega)(\kappa-2)+1$.

Then we get:

\begin{equation} p\mid\{X_i\} \sim \textrm{Beta}\left(\omega(\kappa-2)+1+\sum_{i=1}^nX_i,\ (1-\omega)(\kappa-2)+1+nm-\sum_{i=1}^nX_i\right) \end{equation}

Here the Beta in the expression uses the standard parameterization. The new concentration (the sum of the two standard parameters) is $\kappa_{p|\{X_i\}} = nm + \kappa$, which means that:

\begin{equation} \frac{\partial}{\partial \kappa}\kappa_{p|\{X_i\}} = 1 \end{equation}

So increasing the prior concentration increases the posterior concentration one-to-one: the data contribute a fixed $nm$ to the concentration, independent of the prior.
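A short sketch of this update (the conversion $\alpha = \omega(\kappa-2)+1$, $\beta = (1-\omega)(\kappa-2)+1$ between parameterizations is as above; the simulated data and hyperparameters are my own):

```python
import numpy as np

# Beta-Binomial conjugate update in the mode/concentration parameterization.
rng = np.random.default_rng(1)

m, n = 10, 40                      # trials per observation, sample size
x = rng.binomial(m, 0.3, size=n)   # simulated counts with true p = 0.3

omega, kappa = 0.5, 6.0            # prior mode and concentration
alpha = omega * (kappa - 2) + 1    # convert to standard parameterization
beta = (1 - omega) * (kappa - 2) + 1

alpha_post = alpha + x.sum()       # conjugate update
beta_post = beta + n * m - x.sum()

kappa_post = alpha_post + beta_post               # posterior concentration
omega_post = (alpha_post - 1) / (kappa_post - 2)  # posterior mode

# The data contribute exactly n*m to the concentration:
assert kappa_post == kappa + n * m
```

With $nm = 400$ trials against a prior concentration of only $6$, the posterior mode lands close to the data's empirical success rate.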

The posterior mode is given by:

\begin{equation} \omega_{p|\{X_i\}} = \frac{\omega(\kappa-2)+\sum_{i=1}^nX_i}{\kappa+nm-2} \end{equation}

Then:

\begin{equation} \frac{\partial}{\partial\omega}\omega_{p|\{X_i\}} = \frac{\kappa-2}{\kappa+nm-2} \end{equation}

So we can see that for a highly concentrated prior ($\kappa\rightarrow \infty$) we get $\frac{\partial}{\partial\omega}\omega_{p|\{X_i\}} \approx 1$: a change in the prior mode shifts the posterior mode at roughly a one-to-one rate. Conversely, as $n\rightarrow\infty$ the derivative goes to $0$, so for large samples the posterior mode is roughly insensitive to the prior mode.
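This sensitivity can also be verified numerically by differentiating the mode of the standard-parameterization posterior directly, so the mode formula is derived rather than assumed. A sketch with hypothetical values:

```python
# Finite-difference check of how the posterior mode responds to the
# prior mode, computed from the standard-parameterization posterior.
# The values of n, m, and total successes sx below are hypothetical.
n, m, sx = 40, 10, 120
kappa = 6.0

def posterior_mode(omega):
    alpha = omega * (kappa - 2) + 1 + sx                 # posterior alpha
    beta = (1 - omega) * (kappa - 2) + 1 + n * m - sx    # posterior beta
    return (alpha - 1) / (alpha + beta - 2)              # mode of Beta(alpha, beta)

h = 1e-6
deriv = (posterior_mode(0.5 + h) - posterior_mode(0.5 - h)) / (2 * h)

# With n*m = 400 dwarfing kappa - 2 = 4, the derivative is tiny:
# the prior mode barely moves the posterior mode.
assert deriv < 0.05
```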