Explanation of Formal Definition of Dirichlet Process


I am reading about the Dirichlet process, and I understand its construction via the Chinese restaurant process, the stick-breaking process, and the Pólya urn scheme. Now I am trying to understand why the Dirichlet process is a distribution over distributions, starting from its formal definition on Wikipedia:

Given a measurable set $S$, a base probability distribution $H$ and a positive real number $\alpha$, the Dirichlet process $DP(H, \alpha)$ is a stochastic process whose sample path is a probability distribution over $S$, such that the following holds:

for any measurable finite partition of $S$, denoted $\{B_i\}_{i=1}^n$, if $X \sim DP(H, \alpha)$, then $$(X(B_1), \ldots, X(B_n)) \sim \mathrm{Dir}(\alpha H(B_1), \ldots, \alpha H(B_n)),$$ where $\mathrm{Dir}$ denotes the Dirichlet distribution.
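To make the partition property concrete, here is a minimal numerical sketch (my own, not from the Wikipedia article), assuming NumPy, a truncated stick-breaking construction, and the illustrative choice $H = \mathrm{Uniform}(0,1)$ with partition $B_1 = [0, 0.5)$, $B_2 = [0.5, 1)$. With $\alpha = 2$, the definition says $X(B_1) \sim \mathrm{Beta}(\alpha H(B_1), \alpha H(B_2)) = \mathrm{Beta}(1, 1)$, so its mean should be $0.5$ and its variance $1/12$:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sample(alpha, base_sampler, num_atoms=500):
    """Draw one (truncated) sample from DP(H, alpha) via stick-breaking.

    Returns atom locations and weights; truncating at num_atoms
    approximates the infinite stick-breaking construction.
    """
    betas = rng.beta(1.0, alpha, size=num_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining          # stick-breaking weights
    atoms = base_sampler(num_atoms)      # atom locations drawn from H
    return atoms, weights

alpha = 2.0
# H = Uniform(0, 1); partition S = [0, 1) into B1 = [0, 0.5) and B2 = [0.5, 1).
# Then (X(B1), X(B2)) ~ Dir(alpha * 0.5, alpha * 0.5), i.e. X(B1) ~ Beta(1, 1).
draws = []
for _ in range(2000):
    atoms, weights = dp_sample(alpha, lambda n: rng.uniform(0, 1, n))
    draws.append(weights[atoms < 0.5].sum())  # X(B1): mass the draw puts on B1

draws = np.array(draws)
print(draws.mean())  # should be close to 0.5 = H(B1)
print(draws.var())   # should be close to 1/12, the Beta(1, 1) variance
```

Each pass through the loop is one sample path $X$ (a whole discrete distribution over $S$); collecting $X(B_1)$ across 2000 paths recovers the Dirichlet marginal.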

I have a basic understanding of measure theory and I can follow the terms in this definition. However, I fail to get a picture of it by linking it with the Chinese restaurant process:

  1. Considering the Chinese restaurant process, what do this measurable set $S$ and its partition correspond to? Is $S$ the set of all customers, and does the partition correspond to a particular way the customers are partitioned into tables?
  2. What does $X(B_i)$ mean in the Chinese restaurant process? Does it mean the probability that a specific group of customers sits at table $i$?
  3. What does the vector $(X(B_1), \ldots, X(B_n))$ mean? Is it the vector of probabilities that each table contains a specific group of customers? Or is it the vector of probabilities with which a new customer sits at each table?
  4. For the Chinese restaurant process, what would be an intuitive example of the base distribution $H$? And what does $H(B_1)$ mean here?

There is 1 answer below

  1. Yes to both.
  2. In the Chinese restaurant process we don't care about any specific group of customers: the process is exchangeable, so customers can be treated as identical and only the number of customers at each table matters. Here $X(B_i)$ is the probability mass of customers at table $i$, e.g. $\frac{N_i}{N_{\text{total}}}$. As you can see, the vector sums to 1.
  3. The same as 2.
  4. The base distribution can be any probability distribution, and $H(B_1)$ is simply the probability that $H$ assigns to the set $B_1$. Let's take a Gaussian $\mathcal{N}(\mu, \sigma^2)$ as an example. Note that a DP also depends on the concentration parameter $\alpha$; consider two extreme cases: 1) $\alpha$ very small, and 2) $\alpha$ very large. In case 1), almost all customers end up at a handful of tables, whose dish values (draws from $H$) tend to lie near $\mu$; in case 2), there are many tables, and the empirical distribution of the dish values across tables approximately follows the base distribution.
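The two extremes in point 4 can be checked by simulation. The following sketch (my own illustration, assuming NumPy) runs a Chinese restaurant process for 1000 customers with a small and a large $\alpha$; the small-$\alpha$ run should produce very few tables with one dominant table, while the large-$\alpha$ run should spread customers over many tables:

```python
import numpy as np

rng = np.random.default_rng(1)

def crp_tables(alpha, n_customers):
    """Simulate a Chinese restaurant process; return customer counts per table."""
    counts = []
    for n in range(n_customers):
        # The next customer opens a new table with probability
        # alpha / (n + alpha); otherwise she joins an existing table
        # with probability proportional to its current occupancy.
        probs = np.array(counts + [alpha], dtype=float)
        choice = rng.choice(len(probs), p=probs / probs.sum())
        if choice == len(counts):
            counts.append(1)
        else:
            counts[choice] += 1
    return counts

results = {}
for alpha in (0.1, 100.0):
    results[alpha] = crp_tables(alpha, 1000)
    print(alpha, "tables:", len(results[alpha]), "largest:", max(results[alpha]))
```

If each table's dish is additionally drawn from $H = \mathcal{N}(\mu, \sigma^2)$, the large-$\alpha$ run has enough tables for the histogram of dish values to resemble $H$, which is the sense in which a DP draw approaches its base distribution as $\alpha \to \infty$.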

P.S. The assignment of a new customer is not represented here (that part lives in the posterior predictive); what we see here is the result of the customer assignments.
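For completeness, the posterior-predictive seating rule mentioned in the P.S. is easy to write down: given current table counts $N_1, \ldots, N_K$ with $N = \sum_i N_i$, the next customer joins table $i$ with probability $N_i / (N + \alpha)$ and opens a new table with probability $\alpha / (N + \alpha)$. A minimal sketch (my own helper, not from the question):

```python
def next_customer_probs(counts, alpha):
    """CRP posterior predictive: seating probabilities for the next customer.

    counts[i] is the number of customers already at table i; the last entry
    of the returned vector is the probability of opening a new table.
    """
    n = sum(counts)
    existing = [c / (n + alpha) for c in counts]
    return existing + [alpha / (n + alpha)]

# e.g. three tables with 5, 3 and 2 customers, alpha = 1:
print(next_customer_probs([5, 3, 2], 1.0))  # [5/11, 3/11, 2/11, 1/11]
```

This is the "rich get richer" rule: crowded tables attract new customers in proportion to their size, while $\alpha$ controls how often new tables open.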