I'm trying to work through the free energy principle (FEP) for biological organisms. This involves the claim that organisms minimise surprisal (the encountering of low-frequency sensory states) either by changing their models of the environment or by acting on the environment to change the sensory data. As free energy is an upper bound on surprisal, minimising free energy forces down surprisal, too. The Wikipedia entry is here.
I'm OK with most of this, until I get to the following formalism, which suggests that free energy can be understood as complexity minus accuracy. Here, s is a sensory state, µ is a 'guess' at a hidden environmental state, ψ, causing s, and m is a generative model that supplies an organism with a 'map' of the states it's likely to be in and their causes (typically, its body plan).
$$F(s,\mu) = D_{KL}[q(\psi|\mu)\,\|\,p(\psi|m)] - \mathbb{E}_q[\log p(s|\psi,m)]$$
The Kullback-Leibler term, $D_{KL}[q(\psi|\mu)\,\|\,p(\psi|m)]$, is taken to represent complexity, and the expected value, $\mathbb{E}_q[\log p(s|\psi,m)]$, to give accuracy. However, while I can just about make intuitive sense of $D_{KL}$ as a complexity measure (complex models make greater use of variables, hence a bigger $D_{KL}$), the accuracy expression has me stumped. My questions are:
- What does it mean to define an expected value for distribution p under q (i.e. $E_q$)?
- The expression $\log(p(s|\psi,m))$ seems to be an expression of surprisal, which might make sense if entropy (the long-run average of surprisal) were being calculated. But wouldn't entropy be a better measure of complexity than accuracy? And if so, why isn't there a negative sign?
While I understand the principle at work (a complex model can only pay its way with high accuracy), the mathematical detail bothers me as it may mean my intuitions are off target. And I may be entirely wrong in my reading. Any help appreciated!
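(For concreteness, my intuition about $D_{KL}$ as a complexity cost can be checked numerically. This is a minimal sketch with made-up discrete distributions: a $q$ that stays close to the prior $p(\psi|m)$ incurs a small KL penalty, while one that commits strongly to a single state incurs a large one.)

```python
import numpy as np

# Hypothetical discrete prior p(psi | m) over 3 hidden states (made-up numbers).
p = np.array([0.5, 0.3, 0.2])

def kl(q, p):
    """D_KL[q || p] = sum_psi q(psi) * log(q(psi) / p(psi))."""
    return np.sum(q * np.log(q / p))

# A q close to the prior costs almost nothing in complexity...
kl_near = kl(np.array([0.45, 0.35, 0.2]), p)

# ...while a q that commits strongly to one state pays a large KL cost.
kl_far = kl(np.array([0.02, 0.96, 0.02]), p)
```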
Q1. This is not an expected value for $p$: the expectation is over $\psi$! That's the only thing $q$ is a distribution over, right? The $\log p(\cdot|\psi, \cdot)$ is 'just' a function of $\psi$.
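A toy computation makes this concrete. Take a hypothetical discrete $\psi$ with three values (all numbers made up): $q(\psi|\mu)$ is a distribution over $\psi$, while $p(s|\psi, m)$ is just a function of $\psi$, the likelihood of the single observed $s$ at each $\psi$; its values need not sum to 1.

```python
import numpy as np

# q(psi | mu): the recognition density, a distribution over psi (sums to 1).
q = np.array([0.2, 0.5, 0.3])

# p(s | psi, m) evaluated at the one observed s, for each psi.
# These are likelihood values, NOT a distribution over psi.
p_s_given_psi = np.array([0.1, 0.8, 0.4])

# E_q[log p(s|psi, m)]: the function log p(s|psi, m) of psi,
# averaged with weights q(psi).
accuracy = np.sum(q * np.log(p_s_given_psi))
```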
Q2. The question we need to begin with is ''When would a model be considered accurate?'' Typically this is taken to mean that the model's predictions align with 'reality'. So, what is the real thing we should look at? Well, the only observation we have of 'reality' is the input $s$. So if a model is accurate, it must say that $s$ is pretty likely; if instead it said $s$ was quite unlikely, then the model isn't explaining our observation. (I'm not saying anything special here; indeed, the whole idea behind variational Bayes is to find a sufficiently simple model under which the observed data have high likelihood.)
Now we can look at the quantification offered in the expression above. The generative model $p(s|\psi, m)$ is treated as fixed. Then the expression $$\mathbb{E}_q[ \log p(s|\psi, m)] = \int q(\psi |\mu) \log p(s|\psi,m) \,\mathrm{d}\psi$$ is big when $q$ puts a lot of mass on the $\psi$ for which $p(s|\psi, m)$ is big. And it is small (or even negative) if $q$ puts a lot of mass on $\psi$ for which $p(s|\psi, m)$ is very small. So this fits well with the intuition offered above.
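Putting the two terms together, a small numeric sketch (made-up discrete distributions again) shows the decomposition in action, and also illustrates the bound mentioned in the question: $F \geq -\log p(s|m)$, with equality exactly when $q$ is the true posterior $p(\psi|s,m)$.

```python
import numpy as np

# Discrete sketch: 3 hidden states psi, one observed s (all numbers made up).
p_psi = np.array([0.5, 0.3, 0.2])   # prior p(psi | m)
lik   = np.array([0.1, 0.8, 0.4])   # likelihood p(s | psi, m)
q     = np.array([0.2, 0.5, 0.3])   # recognition density q(psi | mu)

complexity = np.sum(q * np.log(q / p_psi))   # D_KL[q || p(psi|m)]
accuracy   = np.sum(q * np.log(lik))         # E_q[log p(s|psi, m)]
F = complexity - accuracy                    # free energy

# F upper-bounds the surprisal -log p(s|m), where p(s|m) = sum_psi p(psi) p(s|psi).
surprisal = -np.log(np.sum(p_psi * lik))

# With q set to the exact posterior p(psi|s, m), the bound is tight: F = surprisal.
post = p_psi * lik / np.sum(p_psi * lik)
F_tight = np.sum(post * np.log(post / p_psi)) - np.sum(post * np.log(lik))
```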