Understanding Dirichlet Process (vs Dirichlet Distribution)

I'm studying the Dirichlet process but I'm confused.

I think I can visualize it, through the Chinese Restaurant Example or The Pólya urn scheme. But I cannot see the connection with Dirichlet Distribution.

Particularly on Wikipedia regarding it, it's stated that

The Dirichlet process can also be seen as the infinite-dimensional generalization of the Dirichlet distribution.

How can it be? Can you provide me more insight regarding this sentence?


BEST ANSWER

One definition of the Dirichlet process is the following: given a measurable space $(X, \mathcal{A})$ and any finite measurable partition $A_1, \dots, A_k$ of $X$, a random probability measure $G$ is a Dirichlet process with base probability measure $G_0$ and mass parameter $M$ if $$(G(A_1), \dots, G(A_k)) \sim \mathrm{Dir}(MG_0(A_1), \dots, MG_0(A_k)).$$

This can be found in an article here. It explains the definition in terms of the Dirichlet distribution, and where the Dirichlet name comes from.

You then verify some consistency conditions and apply the Kolmogorov extension theorem, which yields the desired infinite-dimensional distribution from this collection of finite-dimensional distributions.

EDIT: (In simpler language)

Take all (measurable) subsets of the original space. The Dirichlet process is a distribution under which any finite partition into such subsets follows a Dirichlet distribution. What you have then is a collection of finite-dimensional distributions. To get an infinite-dimensional distribution from this, you use the Kolmogorov extension theorem: you check a few consistency properties, and the theorem states that the finite-dimensional distributions determine the infinite-dimensional distribution uniquely.
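As a concrete illustration of the finite-dimensional marginals above, here is a stdlib-only Python sketch. The base measure $G_0$ (uniform on $[0,1)$), the mass parameter $M$, and the particular partition are my own illustrative choices, not from the answer; the Dirichlet draw uses the standard normalized-Gamma construction.

```python
import random

random.seed(0)

def dirichlet_sample(alphas):
    """Draw one sample from Dir(alphas) via normalized Gamma variates."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

# Hypothetical setup: G0 = Uniform(0, 1), mass parameter M = 5,
# and a partition of [0, 1) into three cells A_1, A_2, A_3.
M = 5.0
partition = [(0.0, 0.2), (0.2, 0.7), (0.7, 1.0)]
g0_mass = [b - a for a, b in partition]      # G0(A_i) under Uniform(0, 1)

# Finite-dimensional marginal of the DP:
# (G(A_1), G(A_2), G(A_3)) ~ Dir(M*G0(A_1), M*G0(A_2), M*G0(A_3))
weights = dirichlet_sample([M * p for p in g0_mass])

assert abs(sum(weights) - 1.0) < 1e-9        # G assigns total mass 1
print([round(w, 3) for w in weights])
```

Refining the partition gives another, consistent Dirichlet marginal; that consistency is exactly what the Kolmogorov extension theorem exploits.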

2
On

Add to what has been contributed to the question:

  1. Is a Dirichlet process a Dirichlet distribution?

No. A random sample from a Dirichlet distribution of order $3$ looks like $(0.3, 0.2, 0.5)$: three non-negative elements that add up to $1$. Similarly, a random sample from a Dirichlet distribution of order $4$ looks like $(0.15, 0.05, 0.6, 0.2)$. A random sample from a Dirichlet process $G$, by contrast, can be anything; it depends on the space $X$ on which $G$ is defined. A sample can be a real number like $0.4$, a tuple like $(32, 3, 1, 387.232)$, "rainy" or "sunny", or "basketball" or "football". The connection is that for any (measurable) partition of $X$, say $B_1,\ldots,B_K$, the random probability $G$ satisfies $(G(B_1),\ldots,G(B_K)) \sim$ a Dirichlet distribution of order $K$. I'm not sure it's a good idea to say a Dirichlet process is a Dirichlet distribution of infinite dimension, but $K$ can be any positive integer, like $1$, $12345$, $10^{23}$, or $9999999999$.
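What makes the partition-wise definition above consistent across different choices of partition is the aggregation property of the Dirichlet distribution: merging cells of a partition merges the corresponding parameters. A small stdlib-only simulation (the parameter values are illustrative assumptions of mine) checks this empirically via the known mean $a_i/\sum_j a_j$ of a Dirichlet coordinate:

```python
import random

random.seed(1)

def dirichlet_sample(alphas):
    """Draw one Dir(alphas) sample via normalized Gamma variates."""
    g = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

# Aggregation property: if (p1, p2, p3, p4) ~ Dir(a1, a2, a3, a4), then
# (p1 + p2, p3, p4) ~ Dir(a1 + a2, a3, a4).
alphas = [1.0, 2.0, 3.0, 4.0]        # illustrative parameters
n = 20000
merged = [sum(dirichlet_sample(alphas)[:2]) for _ in range(n)]

mean = sum(merged) / n
expected = (alphas[0] + alphas[1]) / sum(alphas)  # Dir coordinate mean
assert abs(mean - expected) < 0.01   # empirical mean is close to 0.3
print(round(mean, 3), expected)
```

So coarsening $B_1,\ldots,B_K$ into fewer cells never leaves the Dirichlet family, which is why a single process $G$ can have Dirichlet marginals for every $K$ at once.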

  2. How does the "process" term come into play?

Random samples already observed from a Dirichlet process are more likely to appear again. Strictly speaking: let $G_0$ be the base probability measure of a Dirichlet process $G$ with scale parameter $M$ (in much of the literature denoted $\alpha$ or $\mu$), so that $(G(B_1),\ldots,G(B_K)) \sim \mathrm{Dir}(MG_0(B_1),\ldots, MG_0(B_K))$. If observations $x_1,x_2,\ldots,x_m$ are generated from $G$, then the conditional distribution of $G$ given $x_1,\ldots,x_m$ is again a Dirichlet process, with base measure $MG_0$ plus the sum of delta functions at $x_1,\ldots,x_m$; that is, $(G(B_1),\ldots,G(B_K)) \sim \mathrm{Dir}(MG_0(B_1) + \#\{x_i \in B_1\},\, MG_0(B_2) + \#\{x_i \in B_2\},\ldots, MG_0(B_K) + \#\{x_i \in B_K\})$. You can still get random samples other than $x_1,\ldots,x_m$, but a new sample is likely to be a repetition of existing ones.
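The posterior bookkeeping described above is simple enough to sketch directly. The concrete numbers below (uniform $G_0$ on $[0,1)$, $M=2$, a two-cell partition, and the particular observations) are hypothetical choices of mine for illustration:

```python
# Posterior parameters of a Dirichlet process over a fixed partition:
# each cell's parameter is M*G0(B_k) plus the number of observations in B_k.
M = 2.0
partition = [(0.0, 0.5), (0.5, 1.0)]         # two cells of [0, 1)
observations = [0.1, 0.2, 0.6, 0.3, 0.9]     # hypothetical draws from G

def posterior_alphas(M, partition, obs):
    """Return M*G0(B_k) + #{x_i in B_k} for each cell, G0 = Uniform(0, 1)."""
    alphas = []
    for a, b in partition:
        g0_mass = b - a                      # G0(B_k) under Uniform(0, 1)
        count = sum(1 for x in obs if a <= x < b)
        alphas.append(M * g0_mass + count)
    return alphas

# Cell [0, 0.5) holds {0.1, 0.2, 0.3}; cell [0.5, 1) holds {0.6, 0.9}.
print(posterior_alphas(M, partition, observations))  # [4.0, 3.0]
```

The counts inflate the Dirichlet parameters of the cells that already contain observations, which is exactly the "seen values become more likely" behavior.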

  3. How does a Dirichlet process connect to a Chinese restaurant process?

Suppose $m$ random samples $x_1,\ldots,x_m$ are generated from a Dirichlet process $G$, and think of samples having the same value as customers sitting at the same table in a Chinese restaurant. The next random sample generated from $G$ is more likely to equal the more frequent values among $x_1,\ldots,x_m$, i.e. the next customer entering the restaurant is more likely to join the most crowded table.