DP Chinese Restaurant Process Metaphor - Base Distribution?

On Wikipedia, a Dirichlet process $\text{DP}$ is described by the following procedure:

  1. Draw a distribution $P$ from $\text{DP}(\alpha, H)$
  2. Draw observations $X_1, X_2, \dots$ independently from $P$

where $\alpha$ is the concentration parameter and $H$ is some 'base distribution'. I understand that a Dirichlet Process can also be described by the Chinese Restaurant Process (CRP) metaphor, which I won't describe here.
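For concreteness, here is a minimal Python sketch of the two-step procedure above. It assumes the stick-breaking representation of the DP, truncated at a finite number of atoms, which is one standard way to approximate a draw of $P$. The function names, the truncation level and the choice $H = \mathcal{N}(-5, 1)$ are illustrative assumptions, not something taken from the Wikipedia article.

```python
import random


def draw_dp_stick_breaking(alpha, base_sampler, truncation=1000, rng=random):
    """Approximate one draw P ~ DP(alpha, H) by truncated stick-breaking.

    base_sampler(rng) must return a single draw from the base distribution H.
    Returns (atoms, weights, leftover), where leftover is the stick mass
    that the truncation leaves unassigned.
    """
    atoms, weights = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)  # share of the remaining stick
        atoms.append(base_sampler(rng))         # atom location comes from H
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return atoms, weights, remaining


def sample_observations(atoms, weights, n, rng=random):
    """Step 2: draw X_1, ..., X_n independently from the discrete P."""
    return rng.choices(atoms, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical base distribution H = N(-5, 1), chosen only for illustration.
    atoms, weights, leftover = draw_dp_stick_breaking(
        2.0, lambda r: r.gauss(-5.0, 1.0), rng=rng
    )
    xs = sample_observations(atoms, weights, 20, rng=rng)
    print(sorted(set(xs)))  # repeated values show that P is discrete
```

In this sketch $H$ only decides where the atoms of $P$ sit, while $\alpha$ controls how the probability mass is split among them.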

My question is, when using the CRP metaphor to explain the Dirichlet Process, what is the base distribution $H$?


Another answer elsewhere on Math Stack Exchange claims this can be any distribution, and gives the Normal distribution as an example:

Let's take a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ as an example. Note that a DP also depends on the concentration parameter $\alpha$, so let's consider two extreme cases: 1) when $\alpha$ is very small; 2) when $\alpha$ is very large. In case 1), we will find that the number of customers on most tables is around $\mu$. In case 2), we will find that the distribution of the number of customers on those tables approximately follows the base distribution.

But this can't be true - I could select a base distribution with a negative mean and then the number of customers on tables would tend to be negative. Indeed, any distribution with unbounded support on the reals invalidates this claim.
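The objection can be made concrete with a small simulation. This is a minimal sketch, assuming the usual urn-style reading of the CRP in which each newly opened table is labelled with one draw from $H$ (its "dish"). The function name and the choice $H = \mathcal{N}(-5, 1)$ are hypothetical and chosen only for illustration.

```python
import random


def chinese_restaurant_process(n_customers, alpha, base_sampler, rng=random):
    """Seat customers one at a time.

    With n customers already seated, the next customer joins existing
    table k with probability size_k / (n + alpha) and opens a new table
    with probability alpha / (n + alpha). A new table gets a label
    ("dish") drawn from H via base_sampler(rng).
    """
    table_sizes = []   # number of customers at each table
    table_dishes = []  # one draw from H per table
    seating = []       # table index chosen by each customer
    for n in range(n_customers):
        u = rng.random() * (n + alpha)
        chosen = None
        cumulative = 0
        for idx, size in enumerate(table_sizes):
            cumulative += size
            if u < cumulative:
                chosen = idx
                break
        if chosen is None:  # u landed in the last alpha-sized slice
            table_sizes.append(1)
            table_dishes.append(base_sampler(rng))
            chosen = len(table_sizes) - 1
        else:
            table_sizes[chosen] += 1
        seating.append(chosen)
    return table_sizes, table_dishes, seating


if __name__ == "__main__":
    rng = random.Random(1)
    sizes, dishes, seating = chinese_restaurant_process(
        200, 1.5, lambda r: r.gauss(-5.0, 1.0), rng=rng
    )
    print(sizes)   # positive integer counts
    print(dishes)  # draws from H, negative here
```

In this sketch the seating rule uses only $\alpha$ and the current table sizes, so the counts are positive integers whatever $H$ is. $H$ enters only through the dishes, and the observation attached to customer $i$ is `dishes[seating[i]]`.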

There is 1 best solution below

From the paper that formally defines the Dirichlet process:

Proposition 1. Let $P$ be a Dirichlet process on $(\mathscr{X}, \mathscr{A})$ with parameter $\alpha$, and let $A \in \mathscr{A}$. If $\alpha(A)=0$, then $P(A)=0$ with probability one. If $\alpha(A)>0$, then $P(A)>0$ with probability one. Furthermore, $\mathscr{E} P(A)=\alpha(A) / \alpha(\mathscr{X})$.

Clearly the base distribution $\alpha$ cannot have support on the negative real line. So my guess for the CRP is that any base distribution that has support only on the positive real line would do.
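The expectation identity in Proposition 1 can be checked numerically. This is a minimal sketch, assuming the measure $\alpha$ of the proposition corresponds to the concentration parameter times $H$ in the question's notation, and using the fact that for a single set $A$ the finite-dimensional Dirichlet law reduces to $P(A) \sim \text{Beta}(\alpha(A), \alpha(A^c))$. The helper names, the total mass of 3, the standard Normal $H$ and the set $A = (0, 1]$ are illustrative assumptions.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def expected_mass_check(total_mass, h_of_a, n_draws=20000, rng=random):
    """Compare a Monte Carlo estimate of E[P(A)] with alpha(A) / alpha(X).

    total_mass is alpha(X) and h_of_a is H(A), so alpha(A) = total_mass * H(A).
    """
    a = total_mass * h_of_a          # alpha(A)
    b = total_mass * (1.0 - h_of_a)  # alpha(complement of A)
    draws = [rng.betavariate(a, b) for _ in range(n_draws)]
    return sum(draws) / n_draws, a / total_mass


if __name__ == "__main__":
    rng = random.Random(42)
    h_of_a = normal_cdf(1.0) - normal_cdf(0.0)  # H(A) for A = (0, 1]
    estimate, exact = expected_mass_check(3.0, h_of_a, rng=rng)
    print(estimate, exact)
```

The two printed numbers should agree up to Monte Carlo error, and the exact value is simply $H(A)$ because the total mass cancels.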