As I understand it, samples generated by the transition kernel
$$f_{X_t\mid X_{t-1}}(x\mid x') = \int f_{X\mid Y}(x\mid y)\, f_{Y\mid X}(y\mid x')\, \mathrm{d}y$$
(i.e., draw $Y_t \sim f_{Y\mid X}(\cdot\mid X_{t-1})$, then $X_t \sim f_{X\mid Y}(\cdot\mid Y_t)$) converge in distribution to the marginal distribution of $X$, namely $f_X(x)$. The converse holds for $Y$. Thus, by sampling a chain $$X_0,Y_0,X_1,Y_1,\dots,X_n,Y_n$$ we generate samples from both marginals as $n$ tends to infinity.
I am reading the book *Doing Bayesian Data Analysis*. In its discussion of Gibbs sampling, it arrives at the following claim:
> As we can see, not only are samples from the marginal posteriors generated, namely $p(\mu|D)$ and $p(\theta_i|D)$, but also samples from the joint posterior, namely $$p(\theta_i,\mu\mid D).$$
I do not understand how Gibbs sampling produces samples from the joint posterior. Please help me understand.
Does the tuple $(X_n,Y_n)$ converge in distribution to a sample from the joint? If so, why?
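To make the question concrete, here is a small experiment I wrote (my own toy example, not from the book): a Gibbs sampler for a bivariate normal with correlation $\rho$, whose full conditionals are $X\mid Y=y \sim N(\rho y,\,1-\rho^2)$ and $Y\mid X=x \sim N(\rho x,\,1-\rho^2)$. If the pairs $(X_n,Y_n)$ really were draws from the joint, their empirical correlation should be close to $\rho$; matching either marginal alone would not force that.

```python
import random
import math

# Toy target (my own example): bivariate normal, zero means, unit
# variances, correlation rho. Full conditionals:
#   X | Y=y ~ N(rho*y, 1 - rho^2),   Y | X=x ~ N(rho*x, 1 - rho^2).
rho = 0.8
sd = math.sqrt(1 - rho**2)

random.seed(0)
x = 0.0
pairs = []
for t in range(200_000):
    y = random.gauss(rho * x, sd)  # draw Y_t | X_{t-1}
    x = random.gauss(rho * y, sd)  # draw X_t | Y_t
    if t >= 1_000:                 # discard burn-in
        pairs.append((x, y))

# Empirical correlation of the saved (X_n, Y_n) pairs.
n = len(pairs)
mean_x = sum(p[0] for p in pairs) / n
mean_y = sum(p[1] for p in pairs) / n
cov = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs) / n
var_x = sum((p[0] - mean_x) ** 2 for p in pairs) / n
var_y = sum((p[1] - mean_y) ** 2 for p in pairs) / n
corr = cov / math.sqrt(var_x * var_y)
print(corr)  # comes out close to rho = 0.8
```

When I run this, the empirical correlation of the pairs is close to $\rho$, which is what the book's claim predicts; I just do not see *why* it must hold in general.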
