Intuitive explanation of the Latent Dirichlet Allocation (LDA)


I am searching for an intuitive explanation of Latent Dirichlet Allocation (LDA). What are the separate steps when identifying the topics in a document corpus?

Additionally, I found this slide online with an explanation, but I am not sure whether it is trustworthy or correct.

[attached slide explaining the LDA learning procedure]

Can somebody help?


There are 2 answers below.


It's basically learning, with updates via Bayes's theorem, how likely a topic is to be associated with a word and vice versa, and hence which topics a document talks about, given the words it contains. In the standard notation, the $\theta$s represent topic probabilities per document, the $\varphi$s word probabilities per topic, and the $z$s topic assignments per word. The "Dirichlet" refers to a distribution over probabilities summing to 1, such as the contents of $\theta$ or $\varphi$.
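The generative story behind those symbols can be sketched in a few lines of numpy. All the sizes, variable names, and hyperparameter values below are illustrative assumptions of mine, not something the answer specifies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed for this sketch)
n_topics, vocab_size, n_docs, doc_len = 3, 8, 5, 20
alpha, eta = 0.1, 0.01  # Dirichlet concentration parameters (assumed values)

# phi[k]: word probabilities for topic k (each row sums to 1)
phi = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)

docs = []
for _ in range(n_docs):
    # theta: this document's topic mixture, drawn from a Dirichlet
    theta = rng.dirichlet(np.full(n_topics, alpha))
    # z: one topic assignment per word position
    z = rng.choice(n_topics, size=doc_len, p=theta)
    # each word is drawn from its assigned topic's word distribution
    words = [rng.choice(vocab_size, p=phi[k]) for k in z]
    docs.append(words)
```

Inference runs this story in reverse: given only `docs`, recover plausible `theta`, `phi`, and `z`.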


An intuitive explanation of the parameters:

  1. $\alpha$ determines the sparsity of topics: with a small $\alpha$, each document will contain only very few topics. The whole corpus shares one $\alpha$, but each document has its own $\theta$ (drawn from the Dirichlet distribution each time).

  2. The whole corpus also shares the same set of topics $\beta$; note that $\beta$ is itself initially drawn from a Dirichlet distribution.

  3. Please don't mix up $\phi$ and $\beta$: $\phi$ is the latent variable introduced in the variational inference method to approximate the posterior. That is, if we use other inference methods, we do not need $\phi$ at all. (Sometimes people also use $\beta$ for the Dirichlet prior and $\phi$ for the topics; however, that prior is not mentioned in the original LDA paper.)
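The sparsity claim in point 1 is easy to check empirically: draws from a Dirichlet with small concentration put almost all mass on one or two components, while a large concentration spreads mass evenly. The threshold of 5% and the sample sizes below are arbitrary choices of mine for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics = 10

# Many theta draws under a small vs. a large alpha
sparse_thetas = rng.dirichlet(np.full(n_topics, 0.01), size=1000)
dense_thetas = rng.dirichlet(np.full(n_topics, 10.0), size=1000)

# Average number of topics carrying more than 5% mass per draw
sparse_active = (sparse_thetas > 0.05).sum(axis=1).mean()
dense_active = (dense_thetas > 0.05).sum(axis=1).mean()
```

With $\alpha = 0.01$ a typical document is dominated by a single topic; with $\alpha = 10$ nearly all ten topics get noticeable mass.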

The slide you attached is about how to learn an LDA model, and it is correct. It is actually describing collapsed Gibbs sampling, a popular approach proposed in a PNAS paper, and the two probabilities mentioned in the slide refer to equation (5) in that paper. Note that Gibbs sampling is slower, yet easier to implement, than the variational inference method used in the original LDA paper.
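A minimal sketch of collapsed Gibbs sampling for LDA, using the standard update in which a word's topic is resampled with probability proportional to $(n_{dk}+\alpha)\,(n_{kw}+\eta)\,/\,(n_k+V\eta)$. The function name, count-array names, and default hyperparameters are my own illustrative choices, not taken from the slide or the PNAS paper:

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampler; docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # doc-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(n_topics)                 # total words per topic

    # Random initial topic assignment for every word, then tally counts
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this word's current assignment from the counts
                k = z[d][i]
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Resample its topic from the collapsed conditional
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + vocab_size * eta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```

After sampling, normalizing `n_dk` rows (plus $\alpha$) estimates each document's $\theta$, and normalizing `n_kw` rows (plus $\eta$) estimates each topic's word distribution; no separate variational $\phi$ is needed, as answer 2 notes.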