I am following the textbook by Cover & Thomas (2006), Elements of Information Theory.
I have a question that has been irking me for some time. It concerns the achievability of the capacity region for multiple-access channels (pp. 530–531):
$\mathbf{X_1}, \mathbf{X_2}$ are two i.i.d. random $n$-sequences (codewords) drawn from a joint distribution $p(x_1, x_2)=p_1(x_1)p_2(x_2)$. These sequences correspond to two independent information sources. They are mapped to an output sequence $\mathbf{Y}$ by a discrete memoryless channel with transition probability mass function $p(y|x_1, x_2)$. Let $A_\epsilon^{(n)}$ denote the set of jointly typical $(\mathbf{x_1}, \mathbf{x_2}, \mathbf{y})$ sequences. The senders map their message index sets $\{1,2,\dots,2^{nR_1}\}\times\{1,2,\dots,2^{nR_2}\}$ to the codeword pair $(\mathbf{X_1}(i),\mathbf{X_2}(j))$. The decoder selects the pair $(i, j)$ for which $$(\mathbf{x_1}(i), \mathbf{x_2}(j), \mathbf{y}) \in A_\epsilon^{(n)}.$$ The goal is to bound the probability of error $P_e^{(n)}$ of this typicality decoding rule and eventually to show that it goes to zero as the blocklength $n\rightarrow \infty$.
To this end, define the following event:$$E_{ij}=\{(\mathbf{X_1}(i), \mathbf{X_2}(j), \mathbf{Y}) \in A_\epsilon^{(n)}\}$$
Suppose that the message pair $(1,1)$ was transmitted. The probability of error in this case is union-bounded as$$P_e^{(n)} \leq P(E_{11}^c) + \sum_{i\neq1, j=1}P(E_{i1}) +\sum_{i=1, j \neq 1}P(E_{1j}) + \sum_{i\neq1, j\neq1}P(E_{ij}) $$ Consider the second term; here is the problem:
$$P(E_{i1})=P\{(\mathbf{X_1}(i), \mathbf{X_2}(1), \mathbf{Y}) \in A_\epsilon^{(n)}\}=\sum_{(\mathbf{x_1},\mathbf{x_2},\mathbf{y})\in A_\epsilon^{(n)}}p(\mathbf{x_1})p(\mathbf{x_2},\mathbf{y})$$
This last factorization of the joint PMF is something I do not understand. The fact that the codewords $\mathbf{X_1}(i)$ and $\mathbf{X_2}(1)$ are independent is certainly not enough to justify it. I would appreciate it if someone could shed a little light on this.
I assume that you have digested the proof of the channel coding theorem (Section 7.7; esp. p. 203). This is analogous, just a bit more complex.
In this setting, the "good" tuple of inputs and output is $(\mathbf{X_1}(1), \mathbf{X_2}(1), \mathbf{Y})$, and (in the case you ask about) we are interested in the event that some "bad" tuple of the form $(\mathbf{X_1}(i), \mathbf{X_2}(1), \mathbf{Y})$ ($i\ne 1$) belongs to the typical set. That tuple contains the "true" input $\mathbf{X_2}(1)$ and the true output $\mathbf{Y}$, but a "false" codeword from the codebook of $\mathbf{X_1}$. By the random-coding construction, $\mathbf{X_1}(i)$ is generated independently of $\mathbf{X_1}(1)$ and of $\mathbf{X_2}(1)$, and $\mathbf{Y}$ is produced by the channel from $(\mathbf{X_1}(1), \mathbf{X_2}(1))$ alone. So $\mathbf{X_2}(1)$ and $\mathbf{Y}$ are dependent, but $\mathbf{X_1}(i)$ is independent of both. Hence the probability of each tuple factors as $p(\mathbf{x_1})p(\mathbf{x_2},\mathbf{y})$.
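To see how this factorization feeds into the rest of the achievability proof, here is a sketch of the next step, using the standard joint-AEP bounds on $|A_\epsilon^{(n)}|$ and on the probabilities of typical sequences (as in Cover & Thomas):

```latex
% For i \ne 1, X_1(i) is independent of (X_2(1), Y), so
P(E_{i1})
  = \sum_{(\mathbf{x_1},\mathbf{x_2},\mathbf{y})\in A_\epsilon^{(n)}}
      p(\mathbf{x_1})\, p(\mathbf{x_2},\mathbf{y})
  \le |A_\epsilon^{(n)}|\; 2^{-n(H(X_1)-\epsilon)}\; 2^{-n(H(X_2,Y)-\epsilon)}
  \le 2^{n(H(X_1,X_2,Y)+\epsilon)}\; 2^{-n(H(X_1)-\epsilon)}\; 2^{-n(H(X_2,Y)-\epsilon)}
  =   2^{-n(I(X_1;\,X_2,Y)-3\epsilon)}
  =   2^{-n(I(X_1;\,Y\mid X_2)-3\epsilon)},
% where the last equality uses X_1 \perp X_2:
% I(X_1; X_2, Y) = I(X_1; X_2) + I(X_1; Y | X_2) = I(X_1; Y | X_2).
```

Since there are fewer than $2^{nR_1}$ wrong indices $i$, the second term of the union bound vanishes as $n\to\infty$ whenever $R_1 < I(X_1;Y\mid X_2) - 3\epsilon$.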
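If it helps to see the independence numerically, here is a small Monte Carlo sketch (my own illustration, not from the book) on a hypothetical binary MAC with $Y = X_1 \oplus X_2 \oplus Z$, $Z\sim\mathrm{Bern}(0.1)$. For one symbol position, the empirical joint law of a wrong-codeword symbol $X_1(i)$ together with $(X_2(1), Y)$ factors as $p(x_1)p(x_2,y)$, while the law of the transmitted $(X_1(1), X_2(1), Y)$ does not:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000  # samples of one symbol position across independent codebook draws

x1_true  = rng.integers(0, 2, N)             # symbol of transmitted codeword X1(1)
x2       = rng.integers(0, 2, N)             # symbol of transmitted codeword X2(1)
z        = (rng.random(N) < 0.1).astype(int) # channel noise, Bern(0.1)
y        = x1_true ^ x2 ^ z                  # toy MAC: Y = X1 XOR X2 XOR Z
x1_wrong = rng.integers(0, 2, N)             # symbol of a wrong codeword X1(i), i != 1,
                                             # drawn independently by random coding

def factorization_gap(a):
    """Max deviation between the empirical pmf of (a, x2, y)
    and the product p(a) * p(x2, y)."""
    joint = np.zeros((2, 2, 2))
    np.add.at(joint, (a, x2, y), 1)
    joint /= N
    p_a   = joint.sum(axis=(1, 2))
    p_x2y = joint.sum(axis=0)
    return np.abs(joint - p_a[:, None, None] * p_x2y[None, :, :]).max()

dev_wrong = factorization_gap(x1_wrong)  # ~ 1/sqrt(N): factorization holds
dev_true  = factorization_gap(x1_true)   # bounded away from 0: X1(1) and Y are coupled
print(dev_wrong, dev_true)
```

The wrong codeword's deviation is only sampling noise, whereas the true codeword's is large, because $Y$ was generated from $X_1(1)$; this is exactly the distinction the answer draws.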