I am not very familiar with mathematical notation, so please bear with me.
Let's imagine that I repeat the same chain $N$ times to estimate quantity $a$. Let's also imagine that I acquire $M$ samples per chain, after a copious burn-in period (to let the simulation reach steady-state).
For the sake of discussion, $N = M$ and the samples within each chain are assumed to be independent.
Would the average value of $a$ over a single chain then be similar to the average value of $a$ at a single iteration point across the chains? That is, would the two averages be expected to follow the same distribution?
I am asking because in some cases, these definitions do not seem equivalent: $a$ is very consistent if I average within chains but I know that the values within the chain are only weakly independent. At the same time, averaging across chains (at a given iteration number) provides inconsistent results but the individual values are guaranteed to be independent.
The property that long-time averages of a single trajectory converge to ensemble averages against the stationary distribution as time goes on is called ergodicity of the chain. Sufficient conditions for ergodicity are known, and are usually satisfied in applications.
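In symbols, as a sketch with notation that does not appear in the question: write $X_t$ for the state of the chain at iteration $t$, $\pi$ for its stationary distribution, and $f$ for the observable whose average estimates $a$ (all three names are assumptions made here for illustration). For an ergodic chain and a $\pi$-integrable $f$, the statement reads

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} f(X_t) = \int f \, d\pi = \mathbb{E}_{\pi}[f] \quad \text{with probability one.}$$

The left-hand side is the within-chain (time) average; the right-hand side is what an average across infinitely many independent chains at a fixed, post-burn-in iteration would give.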
Of course, the two are not the same if you only run the chain for a finite amount of time (which we always do). Ergodicity is only a statement about convergence in the limit. A context-dependent issue is the convergence rate, which can be quite slow (e.g. in the presence of metastability effects). Finite-time averages will also not follow exactly the same distribution as finite ensemble averages, even when both have the same expected value.
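To make the last point concrete, here is a minimal sketch, not the asker's simulation. It assumes a toy AR(1) chain with stationary distribution $N(0, 1)$ and an arbitrarily chosen correlation parameter `rho = 0.9`; the function name `simulate_chains` and all parameter values are hypothetical. Chains are started from the stationary distribution, which stands in for a long burn-in. Both kinds of average estimate the same quantity (the stationary mean, 0), but their spreads differ because samples within a chain are correlated while samples across chains are not.

```python
import numpy as np


def simulate_chains(n_chains, n_steps, rho, rng):
    """Simulate independent AR(1) chains x[t] = rho * x[t-1] + noise.

    Each chain starts from the stationary N(0, 1) distribution, so no
    burn-in is needed in this toy setting.
    """
    x = np.empty((n_chains, n_steps))
    x[:, 0] = rng.standard_normal(n_chains)
    noise_scale = np.sqrt(1.0 - rho**2)  # keeps the stationary variance at 1
    for t in range(1, n_steps):
        x[:, t] = rho * x[:, t - 1] + noise_scale * rng.standard_normal(n_chains)
    return x


rng = np.random.default_rng(0)
N = M = 500   # number of chains and samples per chain (N = M as in the question)
rho = 0.9     # assumed within-chain correlation; not taken from the question

samples = simulate_chains(N, M, rho, rng)

# Time averages: one value per chain, averaging over its M iterations.
time_averages = samples.mean(axis=1)
# Ensemble averages: one value per iteration, averaging over the N chains.
ensemble_averages = samples.mean(axis=0)

time_var = time_averages.var(ddof=1)
ensemble_var = ensemble_averages.var(ddof=1)

print("spread of time averages:    ", time_var)
print("spread of ensemble averages:", ensemble_var)
```

In this particular toy model the time averages are the noisier of the two, because positive correlation within a chain reduces the effective number of independent samples. Which of the two is noisier in a given application depends on the chain, so this only illustrates that the two distributions need not coincide at finite $N$ and $M$.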