The background of this question is a generative process called reverse diffusion process, where one starts with a data distribution $x_0\sim p_{\rm data}(x_0)$ (each sample lies in $\mathbb{R}^D$) and defines a Markov chain (called diffusion process) $x_0,x_1,\cdots,x_T$ with $T$ sufficiently large, where the transitions are $$p(x_t|x_{t-1})=\mathcal N(\sqrt{1-\beta_t}x_{t-1},\beta_tI),\quad\beta_t\in(0,1).$$ The generative process learns to reverse the diffusion process in order to model $p_{\rm data}(x_0)$. An assumption is made that $p(x_T)=\mathcal N(0,I)$, so that the reverse process can start from $\mathcal N(0,I)$, from which numerical sampling is rather easy.
My question is whether this assumption is mathematically valid: does $p(x_T)$ tend to $\mathcal N(0,I)$ when $T\to\infty$?
Intuitively this makes sense because:
- Each transition shrinks the previous state toward the origin and adds Gaussian noise; it makes sense for the limiting distribution (if one exists) to be Gaussian.
- $\mathcal N(0,I)$ is invariant under transitions of the form $p(x'|x)=\mathcal N(\sqrt{1-\beta}x,\beta I)$: $$p(x')=\int p(x'|x)p(x){\rm d}x=\int\frac{1}{(2\pi\beta)^{D/2}}e^{-|x'-\sqrt{1-\beta}x|^2/(2\beta)}\frac{1}{(2\pi)^{D/2}}e^{-|x|^2/2}{\rm d}x=\frac{1}{(2\pi)^{D/2}}e^{-|x'|^2/2}$$ $$\implies x'\sim\mathcal N(0,I).$$
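This invariance is easy to check numerically; here is a minimal Monte Carlo sketch with NumPy (the dimension $D$, the value of $\beta$, and the sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D, n, beta = 3, 200_000, 0.1  # arbitrary dimension, sample size, noise level

# Draw x ~ N(0, I) and apply one transition x' = sqrt(1-beta) x + sqrt(beta) eps.
x = rng.standard_normal((n, D))
x_next = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.standard_normal((n, D))

# The empirical moments of x' should again match N(0, I).
print(np.abs(x_next.mean(axis=0)).max())           # close to 0
print(np.abs(np.cov(x_next.T) - np.eye(D)).max())  # close to 0
```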
However I cannot prove that the limiting distribution is indeed $\mathcal N(0,I)$. Any help is appreciated.
Let $y_t = (x_t-\sqrt{1-\beta_t}x_{t-1})/\sqrt{\beta_t}$ for $t \ge 1$. By construction, the conditional law of $y_t$ given $(x_0,\ldots,x_{t-1})$ is $\mathcal{N}(0,I)$. Hence $y_t$ is independent of $(x_0,\ldots,x_{t-1})$ (therefore of $(x_0,y_1,\ldots,y_{t-1})$), and the distribution of $y_t$ is $\mathcal{N}(0,I)$. By recursion, $x_0,y_1,\ldots,y_t$ are independent.
For every $t \ge 1$, $x_t = \sqrt{1-\beta_t}x_{t-1} + \sqrt{\beta_t}y_t$. By recursion, $$x_t = \prod_{k=1}^t\sqrt{1-\beta_k} x_0 + \sum_{k=1}^t \Big(\prod_{\ell = k+1}^t \sqrt{1-\beta_\ell} \Big) \sqrt{\beta_k}y_k.$$ Hence the conditional law of $x_t$ given $x_0$ is Gaussian with expectation $\prod_{k=1}^t\sqrt{1-\beta_k} x_0$ and covariance matrix \begin{eqnarray*} \sum_{k=1}^t \Big(\prod_{\ell = k+1}^t (1-\beta_\ell) \Big)\beta_k I &=& \sum_{k=1}^t \Big(\prod_{\ell = k+1}^t (1-\beta_\ell) -\prod_{\ell = k}^t (1-\beta_\ell) \Big) I \\ &=& \Big(1 - \prod_{\ell = 1}^t (1-\beta_\ell) \Big) I. \end{eqnarray*} Here, we used the equality $\beta_k=1-(1-\beta_k)$ to get a telescoping sum.
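One can sanity-check this closed form by simulating the forward chain from a fixed $x_0$ and comparing the empirical moments of $x_t$ against the predicted mean $\prod_{k=1}^t\sqrt{1-\beta_k}\,x_0$ and per-coordinate variance $1-\prod_{\ell=1}^t(1-\beta_\ell)$. A sketch with NumPy (the schedule, dimension, and $x_0$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
D, n, T = 2, 100_000, 50
betas = np.linspace(0.01, 0.2, T)  # an arbitrary schedule for illustration
x0 = np.array([3.0, -1.0])

# Run the forward chain n times in parallel from the same x0.
x = np.tile(x0, (n, 1))
for b in betas:
    x = np.sqrt(1 - b) * x + np.sqrt(b) * rng.standard_normal((n, D))

# Closed-form conditional law of x_T given x0.
alpha_bar = np.prod(1 - betas)        # prod_{l=1}^T (1 - beta_l)
mean_pred = np.sqrt(alpha_bar) * x0   # prod_{k=1}^T sqrt(1 - beta_k) * x0
var_pred = 1 - alpha_bar              # per-coordinate variance

print(x.mean(axis=0), mean_pred)  # empirical vs. predicted mean
print(x.var(axis=0), var_pred)    # empirical vs. predicted variance
```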
If the series $\sum_k \beta_k$ diverges, then $\log\prod_{\ell = 1}^t (1-\beta_\ell) = \sum_{\ell=1}^t \log(1-\beta_\ell) \le -\sum_{\ell=1}^t \beta_\ell \to -\infty$, so $\prod_{\ell = 1}^t (1-\beta_\ell) \to 0$ as $t \to +\infty$, and therefore $\mathcal{L}(x_t|x_0) \to \mathcal{N}(0,I)$ as $t \to +\infty$. For example, with $\beta_k = 1/(k+1)$ (a divergent harmonic-type series) the product telescopes: $\prod_{\ell=1}^t \frac{\ell}{\ell+1} = \frac{1}{t+1} \to 0$.
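Since the limiting conditional law does not depend on $x_0$, the marginal of $x_t$ also converges to $\mathcal{N}(0,I)$ whatever $p_{\rm data}$ is. A minimal NumPy sketch, starting from a deliberately non-Gaussian data distribution (a two-point law at $\pm 2$; schedule and sample size are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 100_000, 400
betas = 1.0 / (np.arange(1, T + 1) + 1.0)  # beta_k = 1/(k+1): sum diverges

# Non-Gaussian data distribution: equal mass on -2 and +2 (D = 1).
x = rng.choice([-2.0, 2.0], size=n)
for b in betas:
    x = np.sqrt(1 - b) * x + np.sqrt(b) * rng.standard_normal(n)

# Here prod(1 - beta_l) = 1/(T+1) is tiny, so the marginal of x_T
# should be close to N(0, 1) despite the two-point starting law.
print(x.mean(), x.var())  # close to 0 and 1
```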