Sampling from Bayesian posterior without a tractable prior


Let's say that we have a Bayesian network that looks like this:

A -> B -> C

This network is fully specified by its conditional and unconditional probability density/mass functions:

P(A), P(B|A), P(C|B)

Now, let's say that we have observed that C takes on some value c. In light of this evidence, we want to obtain samples from the posterior distribution of B conditioned on C = c.

To do this in a Metropolis-style way, we need the prior probability of B when we calculate the acceptance probability, because the target satisfies $P(B|C=c) \propto P(C=c|B)\,P(B)$. Evaluating that prior requires the integral: $$P(B) = \int P(B|A)P(A)\,dA$$
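
For concreteness, a Metropolis-Hastings sampler that targets $P(B|C=c)$ directly, with a proposal density $q(b'|b)$ (a symbol introduced here only for illustration), accepts a move from $b$ to $b'$ with probability
$$ \alpha(b \to b') = \min\left(1,\ \frac{P(C=c|b')\,P(b')\,q(b|b')}{P(C=c|b)\,P(b)\,q(b'|b)}\right), \qquad P(b) = \int P(b|a)\,P(a)\,da, $$
so the marginal prior has to be evaluated at both $b$ and $b'$ at every step, while the evidence $P(C=c)$ cancels from the ratio.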

But what if this integral is intractable? Then we can't evaluate the prior probability of B. We could probably estimate it with more Monte Carlo (since we can generate samples from P(A,B) using P(A) and P(B|A)), but that seems inefficient.
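
A minimal sketch of the Monte Carlo estimate the question has in mind, $p(b) \approx \frac{1}{M}\sum_{m} p(b|a_m)$ with $a_m \sim P(A)$. It assumes a hypothetical linear-Gaussian chain, A ~ N(0,1) and B|A ~ N(A,1), chosen only because the exact marginal B ~ N(0,2) is known and can be compared against; the function names are illustrative, not from any library.

```python
import numpy as np

# Hypothetical linear-Gaussian chain, assumed only for illustration:
#   A ~ N(0, 1),  B | A ~ N(A, 1)   =>  exact marginal B ~ N(0, 2).
def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x (vectorised over numpy arrays)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def prior_b_mc(b, n_samples=10_000, rng=None):
    """Monte Carlo estimate of p(b) = E_{A ~ P(A)}[p(b | A)]."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.normal(0.0, 1.0, size=n_samples)   # a_m ~ P(A)
    return normal_pdf(b, a, 1.0).mean()        # (1/M) * sum_m p(b | a_m)

rng = np.random.default_rng(0)
estimate = prior_b_mc(0.7, n_samples=200_000, rng=rng)
exact = normal_pdf(0.7, 0.0, np.sqrt(2.0))
print(estimate, exact)
```

Every evaluation of $p(b)$ costs `n_samples` draws and returns a noisy value, and a sampler targeting B alone would need such an evaluation for every proposed $b'$, which is the inefficiency the question is worried about.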

How can we sample from P(B|C)?

Best answer

You don't want to sample from the prior $P(A, B)$ but from the posterior $P(A, B | C)$. The joint posterior factorizes as
$$ P(A, B | C ) \propto P(C | B) P(B | A) P(A), $$
and every factor on the right can be evaluated pointwise, so Metropolis-Hastings (or some other approach) can target $P(A, B | C)$ without ever computing $\int P(B|A)P(A)\,dA$. Assuming you can set up such a Markov chain, you will have samples $(A_i, B_i) \sim P(A, B | C)$, and you can then estimate any functional you like of the variable $B$ by simply ignoring the samples of $A$. Sampling the extra variable does seem somewhat inefficient, but that inefficiency is the price you pay for avoiding the intractable integral, and it is often a price worth paying!
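
As a minimal sketch (not the answerer's code), here is random-walk Metropolis on the pair (A, B). It assumes a hypothetical linear-Gaussian chain A ~ N(0,1), B|A ~ N(A,1), C|B ~ N(B,1), chosen only because its posterior B | C=c ~ N(2c/3, 2/3) is known in closed form, so the output can be checked. The step size, burn-in length, seed and function names are arbitrary choices. Only the unnormalized log of $P(C|B)P(B|A)P(A)$ is ever evaluated; the integral over A never appears.

```python
import numpy as np

# Hypothetical linear-Gaussian chain, chosen only so the answer can be checked:
#   A ~ N(0, 1),  B | A ~ N(A, 1),  C | B ~ N(B, 1)
def log_normal_pdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))

def log_joint_unnormalised(a, b, c):
    """log P(C=c | b) + log P(b | a) + log P(a); P(C=c) is never needed."""
    return (log_normal_pdf(c, b, 1.0)
            + log_normal_pdf(b, a, 1.0)
            + log_normal_pdf(a, 0.0, 1.0))

def rw_metropolis_joint(c, n_steps=50_000, step=0.8, seed=0):
    """Random-walk Metropolis on (A, B) targeting P(A, B | C = c)."""
    rng = np.random.default_rng(seed)
    a, b = 0.0, 0.0
    logp = log_joint_unnormalised(a, b, c)
    samples = np.empty((n_steps, 2))
    for i in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        a_new = a + step * rng.normal()
        b_new = b + step * rng.normal()
        logp_new = log_joint_unnormalised(a_new, b_new, c)
        if np.log(rng.uniform()) < logp_new - logp:
            a, b, logp = a_new, b_new, logp_new
        samples[i] = a, b
    return samples

samples = rw_metropolis_joint(c=1.5)
b_draws = samples[5_000:, 1]   # drop burn-in, then simply ignore the A column
print(b_draws.mean(), b_draws.var())
```

With c = 1.5 the mean and variance of `b_draws` should come out close to 1.0 and 2/3. The A column is sampled and then thrown away, which is the inefficiency the answer accepts as the price of avoiding the integral.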

To see why this works: if you are interested in some statistic $f(B)$, you can regard it as a function of both variables, $f(a, b) = f(b)$, set up the Markov chain, and use your samples to estimate
$$ \begin{align} \frac{1}{N}\sum_{i=1}^{N} f(A_i,B_i) \rightarrow &\int_\mathcal{B} \int_{\mathcal{A}} f(a, b)\, p(a,b|c)\,da\,db \\ &=\int_\mathcal{B} f(b) \left( \int_{\mathcal{A}} p(a,b|c)\, da \right) db \\ &= \int_{\mathcal{B}} f(b) \, p(b|c)\,db = \mathbb{E}_{B \sim P(B|C)}\left[ f(B) \right], \end{align} $$
where the limit is taken as $N \to \infty$ and the left-hand side is just $\frac{1}{N}\sum_{i} f(B_i)$. The inner integral over $\mathcal{A}$ in the second line is exactly the marginalization you wanted to avoid computing; the chain performs it implicitly.

Finally, it is worth mentioning that the attraction of sampling methods is how easy they are to set up and get some output from; assessing whether that output has converged can be more challenging, and there are alternatives to sampling methods with their own particular pros and cons.
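
The identity above can be checked numerically. This sketch assumes the same hypothetical linear-Gaussian chain (A ~ N(0,1), B|A ~ N(A,1), C|B ~ N(B,1)), for which P(A, B | C=c) is Gaussian with mean (c/3, 2c/3) and covariance [[2/3, 1/3], [1/3, 2/3]]. Exact i.i.d. draws from that joint stand in for MCMC output so the check isolates the marginalization step; the helper name `posterior_expectation_of_b` is hypothetical.

```python
import numpy as np

def posterior_expectation_of_b(f, joint_samples):
    """Estimate E[f(B) | C=c] from draws (A_i, B_i) ~ P(A, B | C=c).

    Treating f as f(a, b) = f(b) means the A column is simply never read.
    """
    return np.mean(f(joint_samples[:, 1]))

# Hypothetical Gaussian chain: closed-form joint posterior of (A, B) given C=c.
c = 1.5
rng = np.random.default_rng(1)
mean = np.array([c / 3.0, 2.0 * c / 3.0])
cov = np.array([[2.0 / 3.0, 1.0 / 3.0],
                [1.0 / 3.0, 2.0 / 3.0]])
joint = rng.multivariate_normal(mean, cov, size=200_000)  # stand-in for MCMC output

est = posterior_expectation_of_b(lambda b: b ** 2, joint)
exact = 2.0 / 3.0 + (2.0 * c / 3.0) ** 2   # Var[B | c] + E[B | c]^2
print(est, exact)
```

Here $f(b) = b^2$, so the exact value is $2/3 + 1 = 5/3$; averaging $f$ over the B column alone recovers it without ever forming $p(b|c)$ explicitly.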