I have been reading "Elements of Information Theory" by Cover and Thomas.
It states the following:
Definition: A discrete channel, denoted by $(\mathscr{X}, p(y \mid x), \mathscr{Y})$, consists of two finite sets $\mathscr{X}$ and $\mathscr{Y}$ and a collection of probability mass functions $p(y \mid x)$, one for each $x \in \mathscr{X}$, such that for every $x$ and $y$, $p(y \mid x) \geq 0$, and for every $x$, $\sum_{y} p(y \mid x)=1$, with the interpretation that $X$ is the input and $Y$ is the output of the channel.
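To make the definition concrete: a discrete channel is just a $|\mathscr{X}| \times |\mathscr{Y}|$ stochastic matrix whose row $x$ is the pmf $p(y \mid x)$. A minimal sketch (my own illustration, using a binary symmetric channel as the example):

```python
import numpy as np

# A binary symmetric channel (BSC) with crossover probability eps,
# represented as a transition matrix: row x is the pmf p(y | x).
eps = 0.1
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

# The definition requires p(y|x) >= 0 for every x, y ...
assert np.all(P >= 0)
# ... and sum_y p(y|x) = 1 for every x (each row is a pmf).
assert np.allclose(P.sum(axis=1), 1.0)
```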
Then later on, it derives the capacity of a discrete memoryless channel.
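To make the finite-alphabet capacity concrete, here is a sketch (my own illustration, not from the book) of the Blahut–Arimoto algorithm, a standard iterative method for computing $C = \max_{p(x)} I(X;Y)$ for a DMC given as a transition matrix:

```python
import numpy as np

def capacity(P, iters=2000):
    """Blahut-Arimoto: capacity in bits of a DMC with P[x, y] = p(y | x)."""
    m = P.shape[0]
    r = np.full(m, 1.0 / m)                 # input distribution, start uniform
    for _ in range(iters):
        q = r[:, None] * P                  # joint p(x, y)
        q /= q.sum(axis=0, keepdims=True)   # posterior q(x | y)
        # Update: r(x) proportional to exp(sum_y p(y|x) log q(x|y))
        logr = np.sum(P * np.log(q + 1e-300), axis=1)
        r = np.exp(logr - logr.max())
        r /= r.sum()
    # Mutual information I(X;Y) at the resulting input distribution, in bits
    q = r[:, None] * P
    py = q.sum(axis=0)
    return float(np.sum(q * np.log2(q / (r[:, None] * py[None, :]) + 1e-300)))

eps = 0.1
P_bsc = np.array([[1 - eps, eps],
                  [eps, 1 - eps]])
print(capacity(P_bsc))   # ~0.531, matching the closed form 1 - H(0.1)
```

For the BSC the uniform input is already optimal, so this agrees with the textbook answer $C = 1 - H(\varepsilon)$; the point of the algorithm is that it works for any finite transition matrix.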
My questions:
- Do the results hold if the input alphabet is "infinite" in size?
- Is there a textbook that discusses discrete channels with an infinite input alphabet?
For continuous alphabets you use differential entropy and obtain results based on that formulation. The flavour is somewhat different (sums are replaced by integrals, and differential entropy need not be nonnegative) but related.
Cover and Thomas, Chapter 9 ("Differential Entropy") and Chapter 10 ("The Gaussian Channel"), give a good introduction to these topics.
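As a taste of what the continuous theory gives you: for the AWGN channel of Chapter 10, the capacity has the closed form $C = \frac{1}{2}\log_2(1 + P/N)$ bits per channel use, for power constraint $P$ and noise variance $N$. A one-line sketch:

```python
import math

def awgn_capacity(power, noise_var):
    """Capacity of the AWGN channel, C = (1/2) log2(1 + P/N), in bits per use."""
    return 0.5 * math.log2(1 + power / noise_var)

# At an SNR of 1 (0 dB), the channel carries half a bit per use.
print(awgn_capacity(1.0, 1.0))  # 0.5
```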
Edit: Most alphabets of interest are finite even in the most sophisticated texts (Csiszár and Körner); one can talk about countable random variables and their entropy, but if the goal is to investigate practical schemes, one usually restricts attention to a bounded region such as $[-M,M]$ or $[-M,M]^d$. Also, continuous variables are quantized before transmission, which again yields finite signal sets.
PS: There may be work out there that I am not aware of. Certainly not at a textbook level, I'd think.