Deriving variational inference for a simple example


My goal is to understand how variational inference works and to be able to derive it in a simple way. To do so, I need help deriving variational inference for the following simple example.

Given a set of labels $L$ and a directed graph $G = \langle V, E \rangle$ with a non-empty set of vertices $V$ and a set of edges $E \subset V^2$, let us introduce a probability space $(\Omega, \mathcal{F}, P)$ and a random vector $\xi \colon \Omega \rightarrow L^V$. This means that a realisation of $\xi$ is a mapping $\xi(\omega)\colon V \rightarrow L$, called a labelling. Let us write $\xi_v = \ell$ for the event that the labelling $\xi(\omega)$ takes the value $\ell \in L$ at the vertex $v \in V$. This random vector is assumed to have the Markov property $$ P(\xi_v = \ell \mid \xi_{v'}, \forall v' \in V \setminus \{v\}) = P(\xi_v = \ell \mid \xi_{v'}, \forall v' \colon (v, v') \in E), \quad \forall v \in V, \forall \ell \in L, $$ and these probabilities are known. In other words, the probability of a label $\ell$ at a vertex $v$ is fully determined by the labels of the neighbours of $v$ in the graph $G$.

Suppose we have a simple model with $L = \{a, b\}$, $V = \{0, 1\}$ and $E = \{(0, 1)\}$ with probabilities $$ P(\xi_0 = a \mid \xi_1 = a) = 0.1, \\ P(\xi_0 = b \mid \xi_1 = a) = 0.9, \\ P(\xi_0 = a \mid \xi_1 = b) = 0.2, \\ P(\xi_0 = b \mid \xi_1 = b) = 0.8, \\ P(\xi_1 = a \mid \xi_0 = a) = 0.5, \\ P(\xi_1 = b \mid \xi_0 = a) = 0.5, \\ P(\xi_1 = a \mid \xi_0 = b) = 0.5, \\ P(\xi_1 = b \mid \xi_0 = b) = 0.5. $$

To run a Gibbs sampler, we visit the vertices in an arbitrary order, say $v = 0$ and then $v = 1$. Let us sample the value $x^1(0)$. Given the fixed label $x^0(1) = a$, the probability that vertex $0$ has label $a$ is $0.1$, and the probability that it has label $b$ is $0.9$. Using a uniform random number generator, we assign the label $a$ if it produces a number less than $0.1$, and $b$ otherwise. My generator gave $0.5$; since it is not less than $0.1$, I take the value $b$. Continuing like this, the empirical distribution of the samples converges to the desired result.
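To make sure I understand the sampling procedure above, here is my implementation of the Gibbs sampler for this two-vertex model (the dictionaries `p0` and `p1` just encode the conditional tables above; the function names are my own):

```python
import random

# Conditional probability tables from the model above.
# p0[x1][x0] = P(xi_0 = x0 | xi_1 = x1)
p0 = {"a": {"a": 0.1, "b": 0.9},
      "b": {"a": 0.2, "b": 0.8}}
# p1[x0][x1] = P(xi_1 = x1 | xi_0 = x0)
p1 = {"a": {"a": 0.5, "b": 0.5},
      "b": {"a": 0.5, "b": 0.5}}

def sample(dist, rng):
    """Draw a label from {'a', 'b'} given its distribution."""
    return "a" if rng.random() < dist["a"] else "b"

def gibbs(n_sweeps=10000, seed=0):
    rng = random.Random(seed)
    x0, x1 = "a", "a"          # arbitrary initial labelling
    counts = {"a": 0, "b": 0}  # marginal counts for vertex 0
    for _ in range(n_sweeps):
        x0 = sample(p0[x1], rng)  # resample vertex 0 given vertex 1
        x1 = sample(p1[x0], rng)  # resample vertex 1 given vertex 0
        counts[x0] += 1
    return {k: v / n_sweeps for k, v in counts.items()}

# Since P(xi_1 | xi_0) is uniform, xi_1 is marginally uniform, so the
# exact marginal is P(xi_0 = a) = 0.5 * 0.1 + 0.5 * 0.2 = 0.15.
print(gibbs())
```

The empirical frequencies come out close to $0.15$ and $0.85$ for vertex $0$, which matches the exact marginal computed in the comment.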

My goal is to do the same thing with these conditional probabilities, but using mean field variational inference instead of a Gibbs sampler. Specifically, I want to apply what Kevin Murphy states in his book "Machine Learning: A Probabilistic Perspective":

"Since we are replacing the neighboring values by their mean value, the method is known as mean field. This is very similar to Gibbs sampling (Section 24.2), except instead of sending sampled values between neighboring nodes, we send mean values between nodes."

So I want to see this "send mean values" concept in action.
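Here is my attempt at translating that quote into code for this example. I replaced the sampled neighbour value by an expectation under the neighbour's current factor, i.e. the coordinate updates $q_0(x_0) \propto \exp\sum_{x_1} q_1(x_1)\log P(\xi_0 = x_0 \mid \xi_1 = x_1)$ and symmetrically for $q_1$. I am not sure these are the correct mean-field updates for this model, which is exactly what I would like someone to confirm or correct:

```python
import math

# Same conditional tables as in the model above.
p0 = {"a": {"a": 0.1, "b": 0.9},
      "b": {"a": 0.2, "b": 0.8}}
p1 = {"a": {"a": 0.5, "b": 0.5},
      "b": {"a": 0.5, "b": 0.5}}

def normalise(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def mean_field(n_iters=50):
    # Factorised approximation q(x0, x1) = q0(x0) q1(x1), started uniform.
    q0 = {"a": 0.5, "b": 0.5}
    q1 = {"a": 0.5, "b": 0.5}
    for _ in range(n_iters):
        # Update q0: exponentiate the neighbour-averaged log-conditional,
        # i.e. "send the mean value" of vertex 1 instead of a sample.
        q0 = normalise({x0: math.exp(sum(q1[x1] * math.log(p0[x1][x0])
                                         for x1 in "ab"))
                        for x0 in "ab"})
        # Update q1 symmetrically, using the freshly updated q0.
        q1 = normalise({x1: math.exp(sum(q0[x0] * math.log(p1[x0][x1])
                                         for x0 in "ab"))
                        for x1 in "ab"})
    return q0, q1

print(mean_field())
```

With these updates $q_1$ stays uniform, and $q_0(a)$ converges to a value slightly below the exact marginal $0.15$ from the Gibbs run (a geometric rather than arithmetic average of the conditionals), which is the kind of discrepancy I would like to understand.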

Could someone help me please?