Why is a conditional distribution harder to evaluate than a joint distribution in variational Bayes?


I'm hoping that someone can supply me with some intuition that seems to be escaping me. I'm trying to get a handle on the mathematics of the free energy principle (FEP), which is a variational Bayesian model of how an organism models its environment. My main guide is this article, which walks through the mathematics. My problem lies in a distinction made between conditional and joint probability distributions.

The FEP claims that it's usually intractable for an organism to calculate the divergence between a 'true' probability distribution, P, over states of the environment given sensory data, s, and its model distribution, Q:

$$D_{KL}(Q||P) = \sum_{v_i\in V} q(v_i)\log\frac{q(v_i)}{p(v_i|s)}$$

I understand this well enough and it needs no explaining. Where I get puzzled is when this expression is re-arranged to generate the equivalent expressions below:

$$D_{KL}(Q||P) = \sum_{v_i\in V} q(v_i)\log\frac{q(v_i)}{p(v_i,s)} + \log p(s)$$

It's claimed that the joint distribution $p(v_i,s)$ is easier to evaluate than the conditional distribution $p(v_i|s)$. However, I don't understand why this should be any easier. After all, the organism still needs access to a distribution that covers all possible states of the environment in the joint case. So either I'm missing some mathematical fact here, or there are non-mathematical reasons why the joint distribution is easier to estimate.
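For what it's worth, I can verify the rearrangement numerically. Here's a small sketch with a toy discrete model (all the numbers are made up by me, not from the article); it at least makes explicit that the conditional $p(v_i|s)$ requires the normalising sum $p(s)$, whereas the joint $p(v_i,s)$ is just prior times likelihood:

```python
import numpy as np

# Toy discrete model: three hidden states v, one observed sensory datum s.
p_v = np.array([0.5, 0.3, 0.2])          # prior p(v) -- made-up numbers
p_s_given_v = np.array([0.9, 0.4, 0.1])  # likelihood p(s|v) for the observed s

p_vs = p_s_given_v * p_v   # joint p(v, s): no normalisation needed
p_s = p_vs.sum()           # evidence p(s): requires summing over all states
p_v_given_s = p_vs / p_s   # posterior p(v|s): needs p(s)

q = np.array([0.6, 0.3, 0.1])  # some approximate distribution Q

# KL divergence computed two ways: directly against the posterior,
# and via the joint plus log-evidence, as in the rearranged expression.
kl_direct = np.sum(q * np.log(q / p_v_given_s))
kl_via_joint = np.sum(q * np.log(q / p_vs)) + np.log(p_s)

print(np.allclose(kl_direct, kl_via_joint))  # True: the two forms agree
```

So the algebra checks out for me; my question is why the first term of the rearranged form should be any more tractable in practice.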

If the reason is mathematical, can anyone give me some guidance on what I'm missing? Thanks!