In information theory, an information source produces some sequence of symbols $X_{1}X_{2}\cdots X_{n}$ (i.e. a sequence of discrete random variables in some alphabet $\mathcal{X}$). These random variables are often treated as independent and identically distributed (i.e. all mutually independent and $X_{i}\sim X$ for all $i$).
Of course, in practice, such an input sequence usually has some arbitrary dependence amongst the variables. When accounting for such considerations, the theorems in the literature typically model the input sequence as being stationary and ergodic. What's the deal with stationary and ergodic sequences in information theory, and what are some canonical examples of their relevance?
As I understand it, stationarity means that the joint distribution of $X_{i_{1}}\cdots X_{i_{k}}$ is the same as that of $X_{i_{1}+t}\cdots X_{i_{k}+t}$ for any integer $t$ and any finite collection of indices $i_{1}<\cdots<i_{k}$ from the input sequence.
Ergodicity: not really sure...I'd be happy to get an explanation of what it means/what its relevance is within this context instead of a formal definition though.
Regarding stationarity, you have the right definition and, probably, you grasp the idea: the statistics do not depend on "time". It's also not difficult to guess why we often impose stationarity: it gives a tractable theory of general (not necessarily independent) processes.
But that's often not enough: for many results we require not only stationarity but also ergodicity.
Leaving aside formal definitions, informally: a stochastic process (random sequence) is ergodic if you can estimate expectations by averaging over a single realization along time ("A random process is ergodic if its time average is the same as its average over the probability space" - Wikipedia).
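As a quick numerical sanity check (a Python sketch, not part of the original discussion; the sample size is an arbitrary choice): for an i.i.d. source, which is ergodic, the time average along one long realization matches the ensemble mean.

```python
import random

random.seed(0)

# An i.i.d. Bernoulli(1/2) process is stationary and ergodic:
# the time average along one long realization approaches the
# ensemble mean E[X] = 1/2.
N = 100_000
realization = [random.randint(0, 1) for _ in range(N)]
time_average = sum(realization) / N
print(time_average)  # close to 0.5
```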
Some examples of processes which are stationary but not ergodic:
$X_0$ takes a value in $\{0,1\}$ with equal probability. Afterwards $X_{n}=X_{n-1}$ ($n>0$).
$X_0$ takes a value in $\{0,1,2,3\}$ with equal probability. Afterwards $X_{n}=X_{n-1}+2 \pmod 4$.
$X_0$ takes a value in $\{0,1,2,3\}$ with equal probability. Afterwards $X_{n}=X_{n-1}+1 \pmod 4$. (Careful: this last one, unlike the previous two, actually *is* ergodic: the shift just cycles through the four possible realizations, so the time average along any single realization equals the ensemble mean $3/2$. It's a useful contrast with the $+2$ example.)
Take the first example, the simplest one. It's indeed stationary, and $\mu=E(X_n)=1/2$. But if you try to estimate $\mu \approx \frac{1}{N}(X_0 + X_1 +\cdots+ X_{N-1})$ for large $N$, as you typically would, you're out of luck: the time average equals $X_0$, so it comes out as $0$ or $1$, never anything near $1/2$.
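A small simulation of this "frozen bit" example (a Python sketch; the seeds and sample size are arbitrary choices) shows the failure concretely: every realization's time average is $0$ or $1$, never $1/2$.

```python
import random

# The "frozen bit" process: X_0 is uniform on {0,1}, then X_n = X_{n-1}.
# Stationary with E[X_n] = 1/2, but the time average of any single
# realization is just X_0 itself.
def time_average(seed, N=10_000):
    rng = random.Random(seed)
    x0 = rng.randint(0, 1)
    xs = [x0] * N          # every later sample equals X_0
    return sum(xs) / N

# Each entry is 0.0 or 1.0 -- never 1/2. Only by averaging across
# many independent realizations (the ensemble) do we get near 1/2.
averages = [time_average(s) for s in range(20)]
print(sorted(set(averages)))
```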
And we usually want to be able to do that (assume that time averages tend, asymptotically, to the expected values). That's important, for example, for extending the AEP (asymptotic equipartition property) to sources that are not i.i.d., as in the Shannon-McMillan-Breiman theorem.