Firstly, I hope this is the right place to ask information theoretic questions that are mainly related to the mathematics.
In this paper by Tishby, Pereira and Bialek, on page 4 in the "Relevant quantization" section, the setting is the following: given a signal space $X \sim p(x)$ and a quantized codebook $\hat{X}$, they seek a possibly stochastic mapping, characterized by the pdf $p(\hat{x}|x)$, from every value $x \in X$ to a codeword $\hat{x} \in \hat{X}$.
This has made me wonder about the following:
1) They mention that the average volume of the elements of $X$ that are mapped to the same codeword is $2^{H(X|\hat{X})}$ - why is this? I imagine it has something to do with the asymptotic equipartition property, however I can't quite connect them. $H(X|\hat{X})$ is the conditional entropy $$H(X|\hat{X}) = -\sum_{\hat{x} \in \hat{X}} p(\hat{x}) \sum_{x \in X} p(x | \hat{x}) \log p(x | \hat{x})$$
2) They mention that for ease of exposition both $X$ and $\hat{X}$ are taken to be finite, so why is $p(\hat{x}|x)$ called a pdf and not a pmf? Is this just a mistake?
1) Write the conditional entropy in terms of the joint entropy via the chain rule: $H(X | \hat{X}) = H(X, \hat{X}) - H(\hat{X})$.
Then the volume expression becomes $2^{H(X | \hat{X})} = \frac{2^{H(X, \hat{X})}}{2^{H(\hat{X})}}$. By the AEP there are roughly $2^{H(X, \hat{X})}$ jointly typical pairs $(x, \hat{x})$ and roughly $2^{H(\hat{X})}$ typical codewords $\hat{x}$, so the ratio is the average number of typical $x$ values mapped to each typical codeword - that is, the average volume of the partition element that the mapping associates with a codeword.
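A quick numerical sanity check of the chain-rule identity above (a minimal sketch in plain Python; the joint pmf is hypothetical, chosen only to illustrate the computation):

```python
import math

# Hypothetical joint pmf p(x, xhat): rows index 4 signal values x,
# columns index 2 codewords xhat. Entries sum to 1.
p_joint = [
    [0.20, 0.05],
    [0.15, 0.10],
    [0.05, 0.25],
    [0.10, 0.10],
]

def entropy(probs):
    """Shannon entropy in bits of a pmf given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x    = [sum(row) for row in p_joint]        # marginal p(x)
p_xhat = [sum(col) for col in zip(*p_joint)]  # marginal p(xhat)

H_joint = entropy([p for row in p_joint for p in row])  # H(X, Xhat)
H_xhat  = entropy(p_xhat)                               # H(Xhat)

# Chain rule: H(X | Xhat) = H(X, Xhat) - H(Xhat)
H_cond = H_joint - H_xhat

# Same quantity straight from the definition, averaging H(X | xhat) over xhat:
H_cond_direct = sum(
    q * entropy([p_joint[i][j] / q for i in range(len(p_joint))])
    for j, q in enumerate(p_xhat)
)
assert abs(H_cond - H_cond_direct) < 1e-12

# The "average volume" of signal values per codeword is then 2**H(X|Xhat),
# which equals the ratio 2**H(X,Xhat) / 2**H(Xhat) from the answer above.
avg_volume = 2 ** H_cond
```

With only 2 codewords, `avg_volume` lies between 1 and 4, as expected for a partition of 4 signal values into 2 (soft) cells.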
I'd guess this is a well-known idea in their field, and the authors assume the reader has already been exposed to it, even though it arguably deserves more than a brief mention in a passing sentence.
2) I have seen "pdf" used in place of "pmf" numerous times, so I wouldn't worry about it. Let me know in a comment if you encounter a contradiction later in the paper that arises from the pdf/pmf distinction, and we can try to work it out.