I have read the LDA paper multiple times, but I'm having trouble with the following. Let's say I define an LDA model as:
- For each doc $m$:
- Sample topic probabilities $\theta_m \sim Dirichlet(\alpha)$
- For each word $n$:
- Sample a topic $z_{mn} \sim Multinomial(\theta_m)$
- Sample a word $w_{mn} \sim Multinomial(\beta_{z_{mn}})$, i.e. from the word distribution of the sampled topic
where $\alpha$ is a fixed Dirichlet hyperparameter and $\beta$ is a fixed $K \times V$ matrix whose rows are per-topic word distributions.
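For concreteness, the generative process above can be sketched as follows (a minimal illustration; the toy dimensions `K`, `V`, `N` and the use of NumPy are my assumptions, not part of the model definition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: topics, vocabulary size, words in one document
K, V, N = 3, 8, 20

alpha = np.full(K, 0.1)                        # Dirichlet hyperparameter
beta = rng.dirichlet(np.full(V, 0.5), size=K)  # per-topic word distributions (K x V)

# Generative process for one document m
theta = rng.dirichlet(alpha)                   # theta_m ~ Dirichlet(alpha)
z = rng.choice(K, size=N, p=theta)             # z_mn ~ Multinomial(theta_m)
w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_mn ~ Multinomial(beta_{z_mn})
```

Note that each word is drawn from the row of $\beta$ indexed by its topic assignment $z_{mn}$, which is what couples the words to the topics.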
I'd like to find a distribution that approximates the true posterior of the model by minimizing the KL-divergence $KL(q_{\gamma_m, \phi_m}(\theta_m, z_m) \,\|\, p(\theta_m, z_m \mid w_m))$ with respect to $\gamma_m, \phi_m$. I'm using the mean-field factorization $q_{\gamma, \phi}(\theta, z) = q_{\gamma}(\theta)\prod_n q_{\phi_n}(z_n)$, where $q_\gamma(\theta)$ is Dirichlet and each $q_{\phi_n}(z_n)$ is multinomial. I'm not even sure where to start with deriving a mean-field variational inference algorithm for this model.
The detailed derivation can be found on page 1019 of the paper.
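The coordinate-ascent updates that fall out of that derivation are $\phi_{ni} \propto \beta_{i, w_n} \exp(\Psi(\gamma_i))$ and $\gamma_i = \alpha_i + \sum_n \phi_{ni}$, iterated until convergence. A minimal sketch of this loop for a single document (my own illustration; function and variable names are assumptions):

```python
import numpy as np
from scipy.special import digamma

def variational_inference(w, alpha, beta, n_iters=100, tol=1e-6):
    """Mean-field coordinate ascent for one document.

    w: array of word indices; alpha: (K,) Dirichlet hyperparameter;
    beta: (K, V) per-topic word probabilities. Returns (gamma, phi).
    """
    K, N = beta.shape[0], len(w)
    phi = np.full((N, K), 1.0 / K)   # q(z_n): start uniform over topics
    gamma = alpha + N / K            # q(theta): Dirichlet parameter
    for _ in range(n_iters):
        # phi_{ni} ∝ beta_{i, w_n} * exp(digamma(gamma_i))
        phi = beta[:, w].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma_i = alpha_i + sum_n phi_{ni}
        new_gamma = alpha + phi.sum(axis=0)
        if np.max(np.abs(new_gamma - gamma)) < tol:
            gamma = new_gamma
            break
        gamma = new_gamma
    return gamma, phi
```

The $\exp(\Psi(\gamma_i))$ factor is the exponentiated expectation of $\log \theta_i$ under the current Dirichlet $q_\gamma$, which is why the digamma function appears.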
It should be noted that there are alternative inference methods for topic models that do not use variational inference. For example, some people use Gibbs sampling, which can be slow for large corpora but is mathematically simpler.
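For contrast, here is a minimal Gibbs sampling sketch for the same single-document setup, with $\theta$ integrated out and $\beta$ held fixed as in the model above (my own illustration; the conditional $p(z_n = k \mid z_{-n}, w) \propto (n_k^{-n} + \alpha_k)\,\beta_{k, w_n}$ follows from the Dirichlet-multinomial conjugacy, and all names are assumptions):

```python
import numpy as np

def gibbs_sample_doc(w, alpha, beta, n_sweeps=200, seed=0):
    """Gibbs sampling of topic assignments for one document,
    with theta collapsed out and beta fixed."""
    rng = np.random.default_rng(seed)
    K, N = beta.shape[0], len(w)
    z = rng.integers(K, size=N)                      # random initial assignments
    counts = np.bincount(z, minlength=K).astype(float)
    for _ in range(n_sweeps):
        for n in range(N):
            counts[z[n]] -= 1                        # remove word n from counts
            p = (counts + alpha) * beta[:, w[n]]     # unnormalized conditional
            p /= p.sum()
            z[n] = rng.choice(K, p=p)                # resample topic for word n
            counts[z[n]] += 1
    return z
```

The per-word resampling step is simpler to derive than the variational updates, but it requires many sweeps to mix, which is the usual trade-off against variational inference.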