On page 277 of Elements of Information Theory, Second Edition, by Cover & Thomas, the information capacity of a colored (Gaussian) noise channel is derived. While the math is clear to me, I have a hard time accepting the statement:
For channels with memory, we can consider a block of $n$ consecutive uses of the channel as $n$ channels in parallel with dependent noise.
While it seems true intuitively (the decoding rule does not care about whether data is arriving sequentially or at one instant in "time") my problem is as follows:
Consider a channel with time-dependent noise, i.e. $Y_k = f(X_k,N_k)$. Then, if the quote above were true, we would have $C = \max I(X_1,\dots,X_n;Y_1,\dots,Y_n)$ (as is the case, for instance, in (9.81) on page 278). But, by the definition of capacity, this means that we can reliably transmit up to $C$ bits per use of the block of $n$ parallel channels, where each such use corresponds to $n$ uses of the original sequential channel. In other words, to obtain the result for the original (sequential) channel, why are we not required to divide by $n$, yielding $C = \lim_{n \to \infty} \frac{1}{n} \max I(X_1,\dots,X_n;Y_1,\dots,Y_n)$? (Note that this normalized form holds for the original case with IID noise, as derived for instance in Chapter 7.)
I hope my question is understandable, please provide feedback in case further elaboration is needed.
If I'm not mistaken, you are required to divide by $n$ if you want to interpret the parallel $n$-channel as a standard channel with memory and compute the capacity per (original) channel use. Otherwise, you get the capacity per $n$-block.
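As a quick numerical sanity check of the per-block versus per-use distinction, here is a minimal sketch. It assumes Gaussian parallel subchannels whose noise variances play the role of the eigenvalues of the noise covariance matrix, a total power budget of $nP$ over the block, and standard water-filling; the function names `water_fill` and `block_capacity_bits` are made up for illustration and are not taken from the book.

```python
import math

def water_fill(noise_vars, total_power, iters=200):
    """Water-filling power allocation: P_i = max(level - N_i, 0),
    with the water level found by bisection so that sum(P_i) = total_power."""
    lo = min(noise_vars)                 # at this level no power is used
    hi = max(noise_vars) + total_power   # at this level at least total_power is used
    for _ in range(iters):
        level = (lo + hi) / 2
        used = sum(max(level - nv, 0.0) for nv in noise_vars)
        if used > total_power:
            hi = level
        else:
            lo = level
    level = (lo + hi) / 2
    return [max(level - nv, 0.0) for nv in noise_vars]

def block_capacity_bits(noise_vars, total_power):
    """Maximum mutual information over the whole block, in bits per block."""
    powers = water_fill(noise_vars, total_power)
    return sum(0.5 * math.log2(1 + p / nv) for p, nv in zip(powers, noise_vars))

n = 8      # block length (assumed value for the example)
P = 1.0    # power per original channel use (assumed value)

# White (IID) noise: every subchannel has noise variance 1.
white = [1.0] * n
c_block = block_capacity_bits(white, n * P)   # bits per n-block
c_per_use = c_block / n                       # bits per original channel use

print(c_block, c_per_use)
```

In the white-noise case the per-block figure grows linearly with $n$, while dividing by $n$ recovers the familiar $\frac{1}{2}\log_2(1 + P/N)$ per channel use, which is the normalization the question asks about.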