Posterior distribution of a random distribution sampled from Dirichlet process

112 Views Asked by Bumbble Comm At 25 Mar 2026 - 3:58

I was reading a bit into nonparametric Bayesian statistics and came across the expression of the posterior of a random distribution $G$ sampled from the Dirichlet process given data sampled from $G$, namely:

Let $G\sim DP(\alpha, H)$ be a random probability measure on a standard Borel space $\mathcal{X}$. Let $X_1,\dots,X_n\stackrel{\text{iid}}{\sim} G$. Then $$G\mid X_1,\dots,X_n\sim DP\left(\alpha+n,\; \frac{\alpha H + \sum_{i=1}^n \delta_{X_i}}{\alpha+n}\right).$$ However, no proof was given in the lecture notes, and in other notes I could only find short, hand-waving proofs. I wanted to do this rigorously, but I don't know whether my proof is correct. It relies on the fact that $G\sim DP(\alpha, H)$ if and only if for every finite measurable partition $A_1,\dots, A_k$ of $\mathcal{X}$ the vector $(G(A_1),\dots, G(A_k))\sim \operatorname{Dir}(\alpha H(A_1),\dots, \alpha H(A_k)).$

My proof: Let $A_1,\dots,A_k$ be a partition of $\mathcal{X}$ and let $N_j$ be $\sum_{i=1}^n\mathbb{1}_{\{X_i\in A_j\}}$ for $j\in\{1,\dots,k\}$. For our notation, denote $V_A:= (G(A_1),\dots,G(A_k)),\; X := (X_1,\dots,X_n),\; N_A:=(N_1,\dots,N_k)$ and let $n_A:=(n_1,\dots,n_k)$ be our observation of $N_A.$

Then, by Bayes' rule, the definition of $G$, and the fact that $N_A\mid V_A$ is multinomial, we have: $$\begin{aligned} f_{V_A\mid X}\left(p_1,\dots,p_k \right) &\stackrel{(*)}{=}f_{V_A\mid N_A}\left(p_1,\dots,p_k \right) \propto f_{N_A\mid V_A}\left(n_1,\dots,n_k\right) f_{V_A}(p_1,\dots,p_k)\\ &\propto \prod_{j=1}^k p_j^{n_j}\, e^{\sum_{j=1}^k(\alpha H(A_j)-1)\log(p_j)} = e^{\sum_{j=1}^k n_j \log(p_j)}\, e^{\sum_{j=1}^k(\alpha H(A_j)-1)\log(p_j)}= e^{\sum_{j=1}^k(n_j+\alpha H(A_j)-1)\log(p_j)}. \end{aligned}$$ Hence $V_A\mid X\sim \operatorname{Dir}(\alpha H(A_1)+\sum_{i=1}^n \delta_{X_i}(A_1),\dots,\alpha H(A_k)+\sum_{i=1}^n \delta_{X_i}(A_k))$, and by normalizing the measure and using the if-and-only-if statement above: $$G\mid X\sim DP\left(\alpha+n,\frac{\alpha H + \sum_{i=1}^n \delta_{X_i}}{\alpha+n}\right).\tag{$\square$}$$
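As a quick numerical sanity check of the conjugate update above, here is a minimal Python sketch. It is only an illustration under assumptions that are not part of the question: $\mathcal{X}=[0,1)$, $H=\mathrm{Uniform}(0,1)$, and a partition into half-open intervals given by `edges`. The function name `posterior_dirichlet_params` is hypothetical.

```python
from bisect import bisect_right

def posterior_dirichlet_params(alpha, edges, data):
    """Dirichlet parameters of (G(A_1), ..., G(A_k)) given the data.

    Assumptions (illustrative only): H = Uniform(0, 1), the cells are
    A_j = [edges[j-1], edges[j]), and every observation lies in [0, 1).
    """
    k = len(edges) - 1
    # Prior parameters alpha * H(A_j); H(A_j) is the interval length.
    prior = [alpha * (edges[j + 1] - edges[j]) for j in range(k)]
    # Counts n_j = number of observations falling in A_j.
    counts = [0] * k
    for x in data:
        counts[bisect_right(edges, x) - 1] += 1
    # Posterior parameters alpha * H(A_j) + n_j.
    return [a + n for a, n in zip(prior, counts)]

alpha = 2.0
edges = [0.0, 0.25, 0.5, 1.0]
data = [0.1, 0.2, 0.3, 0.9, 0.95]
params = posterior_dirichlet_params(alpha, edges, data)
print(params)       # posterior Dirichlet parameters for this partition
print(sum(params))  # total mass; should equal alpha + n
```

The parameters sum to $\alpha+n$, which is the concentration parameter of the claimed posterior $DP$; dividing each parameter by that sum gives the normalized posterior base measure of each cell.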

The part of the proof I'm not quite sure about is the equation with $(*)$ above it. I got some information from page 41 of this source: https://www4.stat.ncsu.edu/~sghosal/papers/BayesAsymp.pdf

The crux here is that we can consider a partition $B_1,\dots, B_m$ of $\mathcal{X}$ that is finer than $A$. Following the notation above, we see that $$V_B\mid N_B\sim \operatorname{Dir}\left(\alpha H(B_1)+\sum_{l=1}^n \delta_{X_l}(B_1),\dots,\alpha H(B_m)+\sum_{l=1}^n \delta_{X_l}(B_m)\right).$$ If we now denote $s_{i,j}:=\mathbb{1}_{\{B_i\subseteq A_j\}}$, then we must have $\sum_{i=1}^m s_{i,j}G(B_{i})=G(A_j)$, and hence, by the aggregation property of the Dirichlet distribution and as the $B_i$ are disjoint: \begin{align} V_A\mid N_B & \sim \operatorname{Dir}\left(\sum_{i=1}^m s_{i,1}\left(\alpha H(B_i) + \sum_{l=1}^n \delta_{X_l}(B_i)\right), \dots, \sum_{i=1}^m s_{i,k} \left(\alpha H(B_i) + \sum_{l=1}^n \delta_{X_l}(B_i)\right)\right) \\ & = \operatorname{Dir}\left(\alpha H(A_1) + \sum_{l=1}^n \delta_{X_l}(A_1),\dots,\alpha H(A_k)+\sum_{l=1}^n \delta_{X_l}(A_k)\right), \end{align} which is the law of $V_A\mid N_A$.
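The aggregation step can be checked numerically in the same spirit. The sketch below again assumes $\mathcal{X}=[0,1)$, $H=\mathrm{Uniform}(0,1)$, and interval partitions, none of which come from the question; the helper names `dirichlet_params` and `aggregate` are hypothetical. It only verifies that the parameters agree, not the aggregation property of the Dirichlet distribution itself.

```python
from bisect import bisect_right

def dirichlet_params(alpha, edges, data):
    """Posterior Dirichlet parameters alpha*H(cell) + count(cell),
    assuming H = Uniform(0, 1) and cells [edges[j], edges[j+1])."""
    k = len(edges) - 1
    params = [alpha * (edges[j + 1] - edges[j]) for j in range(k)]
    for x in data:
        params[bisect_right(edges, x) - 1] += 1
    return params

def aggregate(fine_params, fine_edges, coarse_edges):
    """Sum the parameters of the fine cells B_i contained in each coarse cell A_j.

    Assumes every coarse edge is also a fine edge, so each B_i lies in the
    coarse cell that contains its left endpoint.
    """
    merged = [0.0] * (len(coarse_edges) - 1)
    for i, p in enumerate(fine_params):
        j = bisect_right(coarse_edges, fine_edges[i]) - 1
        merged[j] += p
    return merged

alpha = 2.0
data = [0.1, 0.2, 0.3, 0.9, 0.95]
coarse_edges = [0.0, 0.25, 0.5, 1.0]             # partition A
fine_edges = [0.0, 0.125, 0.25, 0.5, 0.75, 1.0]  # finer partition B

from_fine = aggregate(dirichlet_params(alpha, fine_edges, data),
                      fine_edges, coarse_edges)
direct = dirichlet_params(alpha, coarse_edges, data)
print(from_fine, direct)  # the two parameter vectors should coincide
```

Merging the fine-partition parameters reproduces the coarse-partition parameters because both $\alpha H$ and the empirical counts are additive over disjoint sets.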

Now consider a nested sequence of partitions $(A_j)_{j\in\mathbb{N}}$ converging to a partition over a dense subset of the standard Borel space $\mathcal{X}$. Then we know by Lévy's upward theorem (as $V_A$ is bounded, hence in $L^1$) that $V_j:= V_A\mid N_{A_j}$ is a martingale converging almost surely to a version of $V_A\mid \mathcal{F}$, where $\mathcal{F} = \sigma(\cup_{j} \sigma(N_{A_j})).$

Now we have $\mathcal{F}=\sigma(X)$. This holds as $\sigma(X)=\sigma(X^{-1}(\mathcal{B}(\prod_{i=1}^n\mathcal{X})))$ (where $\mathcal{B}(\prod_{i=1}^n\mathcal{X})$ denotes the Borel sets of $\prod_{i=1}^n \mathcal{X}$ and $\prod$ denotes the Cartesian product) and $\sigma(N_{A_j})= \sigma(X^{-1}(\prod_{i=1}^n A_j))$ (where $A_j$ denotes the collection of cells of the $j$-th partition). As every Borel set in $\mathcal{B}(\prod_{i=1}^n \mathcal{X})$ can be approximated arbitrarily closely by sets (possibly unions of sets) in $\prod_{i=1}^n A_j$ by taking $j$ large enough, the sigma-algebras are equal in the limit.

Hence as $V_A\mid N_A\sim V_A\mid N_{A_j}$ for all $j$, and $V_j\stackrel{d}{\rightarrow}V_A\mid X$ (by almost sure convergence), we must have $V_A\mid N_A\sim V_A\mid X$ and hence the corresponding densities are the same.

As I said, I'm not quite sure about the last part, in particular the part with the sigma-algebras. I hope you can give some feedback.

The other sources I checked were: http://stat.columbia.edu/~porbanz/papers/porbanz_BNP_draft.pdf http://www.math.leidenuniv.nl/~avdvaart/BNP/BNP.pdf https://www.stats.ox.ac.uk/~teh/research/npbayes/Teh2010a.pdf
