This question comes from reading Denoising Diffusion Probabilistic Models.
Given $T\in \mathbb N$ and $\alpha_t,\beta_t\in (0,\infty)$, consider a Markov chain $X_0, X_1,\dots, X_T$ in $\mathbb R^n$ with
the initial distribution $q(x_0)$ and
the transition density $$ \begin{aligned} q(x_{t}\vert x_{t-1}) = \mathcal{N}\bigl(x_t;\sqrt{\alpha_t}x_{t-1},\beta_t \mathbf{I}\bigr). \end{aligned} $$
We know that, conditioned on $X_0=x_0$, the distribution $q(x_{t-1}\vert x_t,x_0)$ is normal: $$ \begin{aligned} q(x_{t-1}\vert x_t,x_0) = \frac{\overbrace{q(x_{t-1}\vert x_0) q(x_t\vert x_{t-1})}^{\exp \text{ of a quadratic polynomial with some negative leading coefficient}}}{\underbrace{q(x_t\vert x_0)}_{\text{constant}}}. \end{aligned} $$
The mathematical calculation poses no problem, but is there an intuitive way to see this, or an account of how this phenomenon was first observed? For example: can one think of $X_{t-1}$ as a sum of independent variables, or something along those lines?
You can write $q(x_{t-1}|x_t,x_0)$ as
$$ q(x_{t-1}|x_t,x_0) = \frac{q(x_t|x_{t-1},x_0)q(x_{t-1}|x_0)}{q(x_t|x_0)}. $$
In this setting, $q(x_{t-1}|x_t,x_0)$ can be interpreted as a posterior distribution for $x_{t-1}$: the likelihood is $q(x_t|x_{t-1},x_0)=q(x_t|x_{t-1})$ (the equality holds by the Markov property) and the prior is $q(x_{t-1}|x_0)$. In the context of that paper, both the prior and the likelihood are normal, and a normal prior combined with a normal likelihood with known variance yields a normal posterior, which is a well-known conjugacy result from Bayesian inference.
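This conjugacy can be checked numerically. In one dimension, with prior $x_{t-1}\sim\mathcal N(m, v)$ and likelihood $x_t\mid x_{t-1}\sim\mathcal N(\sqrt{\alpha_t}\,x_{t-1},\beta_t)$, the standard conjugate-Gaussian formulas give posterior precision $1/v_{\text{post}} = \alpha_t/\beta_t + 1/v$ and posterior mean $m_{\text{post}} = v_{\text{post}}\bigl(\sqrt{\alpha_t}\,x_t/\beta_t + m/v\bigr)$. The sketch below (with arbitrarily chosen illustrative values for $\alpha_t$, $\beta_t$, $m$, $v$, $x_t$; they are not from the paper's noise schedule) multiplies the prior and likelihood densities on a grid, normalizes, and compares against that closed form:

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Illustrative (hypothetical) values: alpha, beta from the transition kernel,
# (m, v) the moments of the prior q(x_{t-1} | x_0), and x_t the observed value.
alpha, beta = 0.9, 0.1
m, v = 0.3, 0.5
x_t = 1.2

# Unnormalized posterior on a grid: prior(x_{t-1}) * likelihood(x_t | x_{t-1}).
grid = np.linspace(-5.0, 5.0, 20001)
dx = grid[1] - grid[0]
unnorm = gauss_pdf(grid, m, v) * gauss_pdf(x_t, np.sqrt(alpha) * grid, beta)
post = unnorm / (unnorm.sum() * dx)  # normalize numerically

# Closed-form conjugate-Gaussian posterior.
v_post = 1.0 / (alpha / beta + 1.0 / v)
m_post = v_post * (np.sqrt(alpha) * x_t / beta + m / v)

# The grid posterior agrees with the Gaussian N(m_post, v_post).
assert np.max(np.abs(post - gauss_pdf(grid, m_post, v_post))) < 1e-4
```

Note that the posterior mean is a precision-weighted combination of the prior mean and the (rescaled) observation, which matches the interpolation between $x_0$ and $x_t$ that appears in the paper's posterior mean.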