Polya Urn Scheme marginal distribution of draws


I'm going through the really well-written notes https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.212.2959&rep=rep1&type=pdf. I'm at the point where they discuss the Polya urn scheme and how it helps prove the existence of a Dirichlet process. My question concerns the part on page 10 where they derive the marginal distribution of the draws, $p(\theta_{1}, \theta_{2},..., \theta_{n})$:

$$p(\theta_{1}, \theta_{2},..., \theta_{n}) = \frac{a^{C}\prod_{c=1}^{C}H(\theta_{c}^{*})(n_{c}-1)!}{(a+n-1)(a+n-2)...a}$$

I tried to make sense of this distribution based on how the Polya Urn Scheme sampling evolves. My thinking is the following:

  1. Since the sampling process is exchangeable, the order in which the new colors appear doesn't matter, so I illustrate things by assuming that the first new color sampled is $\theta_{1}^{*}$, then $\theta_{2}^{*}$, etc.

  2. For the first color, the probability of choosing a new color is $\frac{a}{a}$, since it is the first color sampled, and its value has distribution $H(\theta_{1}^{*})$.

  3. For the second color, the probability of choosing a new color (not the same as the previous one, of course) is $\frac{a}{a+1}$, and its value has distribution $H(\theta_{2}^{*})$.

  4. We repeat this up to the $C$-th new color, which we choose with probability $\frac{a}{a+n-1}$ and distribution $H(\theta_{C}^{*})$.
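The steps above can be sketched in code as a sanity check. This is only a minimal sketch under stated assumptions: the function names `sequential_weight` and `closed_form_weight` are hypothetical, the density factors $H(\theta_{c}^{*})$ are left out (they appear once per distinct color on both sides), and a repeat of an existing color $c$ at draw $i$ is assumed to have probability $\frac{n_{c}}{a+i-1}$, where $n_{c}$ is the count of that color so far. That repeat rule is the standard urn rule, but it is not written out in the list above.

```python
from fractions import Fraction
from math import factorial


def sequential_weight(labels, a):
    """Multiply the urn's step-by-step probabilities for a sequence of color labels.

    The H(theta*) density factors are deliberately omitted (assumption of this sketch).
    """
    counts = {}              # how many times each color has been drawn so far
    weight = Fraction(1)
    for i, label in enumerate(labels):   # i = number of draws already made
        denom = a + i                    # a, a+1, ..., a+n-1
        if label in counts:
            # repeat of an existing color: numerator is its current count
            weight *= Fraction(counts[label]) / denom
            counts[label] += 1
        else:
            # brand-new color: numerator is a
            weight *= Fraction(a) / denom
            counts[label] = 1
    return weight


def closed_form_weight(labels, a):
    """a^C * prod_c (n_c - 1)! / (a (a+1) ... (a+n-1)), again without the H factors."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    numerator = Fraction(a) ** len(counts)
    for n_c in counts.values():
        numerator *= factorial(n_c - 1)
    denominator = Fraction(1)
    for i in range(len(labels)):
        denominator *= a + i
    return numerator / denominator


draws = ["red", "blue", "red", "red", "blue"]
a = Fraction(3, 2)
print(sequential_weight(draws, a) == closed_form_weight(draws, a))
```

In this sketch every one of the $n$ draws contributes one denominator factor $a+i-1$, whether or not it introduces a new color, and the numerators of the repeat draws are what the partial product over new colors alone leaves out.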

Now, if I put everything together, I recover only part of the expression:

$$\frac{a^{C}\prod_{k=1}^{C}H(\theta_{k}^{*})}{(a+n-1)(a+n-2)...a}$$

However, the terms $(n_{c}-1)!$ are missing. How do those terms appear? I suspect it has something to do with the fact that several of the $\theta_{i}, i=1,2,...,n$ can be equal to the same $\theta_{c}^{*}$.

One guess is that $(n_{c}-1)!$ counts all the possible ways in which the sampled colors $\theta_{i} = \theta^{*}_{c}$ might have arisen. For example, suppose that $n_{c}=3$. Then we know that there exist $\theta_{c_{1}},\theta_{c_{2}},\theta_{c_{3}}$ equal to $\theta_{c}^{*}$. For convenience, let's say that the first observation that gives the color $c$ is $\theta_{c_{1}}$. Then $(n_{c}-1)!=(3-1)!=2$ gives the total number of ways in which $\theta_{c_{2}},\theta_{c_{3}}$ might arise, i.e. first $\theta_{c_{2}}$ and then $\theta_{c_{3}}$, or first $\theta_{c_{3}}$ and then $\theta_{c_{2}}$. Does this make any sense? Ideally, could someone explain to me why the $(n_{c}-1)!$ appears?
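
A small case may help test this guess. The following is only a sketch, assuming the standard urn rule that draw $i$ repeats an existing color with probability equal to that color's current count divided by $a+i-1$. Take $n=3$ draws that all share a single color, so $C=1$ and $n_{1}=3$:

$$p(\theta_{1}, \theta_{2}, \theta_{3}) = \underbrace{\frac{a}{a}H(\theta_{1}^{*})}_{\text{new color}} \cdot \underbrace{\frac{1}{a+1}}_{\text{count is } 1} \cdot \underbrace{\frac{2}{a+2}}_{\text{count is } 2} = \frac{a\,H(\theta_{1}^{*})\,(3-1)!}{(a+2)(a+1)a}$$

In this case the factor $2! = 1 \cdot 2$ is the product of the counts the color had just before each repeat draw, which suggests reading $(n_{c}-1)!$ as $1 \cdot 2 \cdots (n_{c}-1)$ rather than as a number of orderings.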