I'm pretty new to information theory, so I need your suggestions or references in order to solve this problem. My attempt is below.
Problem:
Every second, Alice can either send (bit 1) or not send (bit 0) a photon through a fiber optic cable. When Alice sends a photon, there is a $p=1/8$ chance that it does not reach Bob. Also, when she doesn't send a photon, there is a $p=1/8$ chance that Bob detects a photon anyway. Given $n=1000$ seconds to transmit her message, roughly how many different messages can Alice send Bob with only negligible risk of confusing one message for another?
So, as I understand it, the problem basically asks for the capacity $C$ of the channel. Thus the answer would be $2^C$. But I'm having trouble calculating $C$.
My attempt.
Let $X=\{x^1,...,x^n\}$ be the input (with $n=1000$), and $Y=\{y^1,...,y^n\}$ the output of the channel, where each $x^k\in \{0,1\}$ and each $y^k\in\{0,1\}$. So, if I'm not wrong, the capacity of the channel is the mutual information between the input and the output:
\begin{equation} \tag{1} C\equiv I(X:Y):=I(X)+I(Y)-I(XY), \end{equation}
where $I(X)$ is the marginal entropy and $I(XY)$ the joint entropy, defined as:
\begin{equation} \tag{2} I(X):=-\sum_iP(X_i)\log_2P(X_i) \end{equation}
where the sum is over the $2^n$ possible different vectors $X$ (so $P(X_i)\equiv 2^{-n}$, and $I(X)=I(Y)=n$).
and
\begin{equation} \tag{3} I(XY):=-\sum_{i,j}P(X_iY_j)\log_2P(X_iY_j) \end{equation}
where the double sum runs over the $2^n$ possible vectors $X$ and the $2^n$ possible vectors $Y$. $P(X_iY_j)$ is the joint probability of the vectors $X_i=\{x_i^1,...,x_i^n\}$ and $Y_j=\{y_j^1,...,y_j^n\}$. Since each $y^k$ depends only on the corresponding $x^k$, I can write $P(X_iY_j)=P(Y_jX_i)\equiv P(Y_j|X_i)P(X_i)\equiv P(X_i)\prod_{k=1}^n P(Y_j^k|X_i^k)$; the last equality holds because the $Y^k$'s are conditionally independent. My problem is that when I try to calculate (3), each joint probability $P(X_iY_j)$ is a really small number ($\sim10^{-77}$), so I get $I(X:Y)=2n$, practically. But I know that this answer is not correct.
Questions: What am I doing wrong? Or is there another way of solving the problem?
This is a memoryless channel; in fact, it is the classical binary symmetric channel (BSC), so its capacity (per channel use, i.e., per second here) can be readily computed:
$$I(X;Y) = H(Y)-H(Y|X)=H(Y)-h(p)$$ where $h(p)$ is the binary entropy function. Hence
$$C=\max_{p(X)} I(X;Y)=1-h(p),$$ since $H(Y)\le 1$ bit, with equality when the input distribution is uniform.
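As a sanity check, the maximization over the input distribution can be done numerically. Here is a minimal Python sketch (the names `h` and `mutual_info`, and the grid resolution, are my own choices):

```python
import math

def h(q):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

p = 1 / 8  # crossover probability of the BSC

def mutual_info(q):
    """I(X;Y) for input distribution P(X=1) = q on a BSC(p).

    P(Y=1) = q(1-p) + (1-q)p, and H(Y|X) = h(p) for any q.
    """
    return h(q * (1 - p) + (1 - q) * p) - h(p)

# Grid search over q in [0, 1]: the maximum sits at the uniform input q = 1/2.
best_q = max((i / 1000 for i in range(1001)), key=mutual_info)
print(best_q, mutual_info(best_q))  # q = 0.5 attains I = 1 - h(p)
```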
In our case $C=1-h(1/8)\approx 0.4564$ bits per channel use. By Shannon's noisy-channel coding theorem (his second theorem), using the channel $1000$ times we can transmit up to $0.4564\times1000 \approx 456$ bits with low probability of error, i.e., Alice can send roughly $2^{456}$ distinguishable messages.
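Plugging in the numbers (a quick Python check; `binary_entropy` is just a helper name I chose):

```python
import math

def binary_entropy(p):
    """h(p) = -p*log2(p) - (1-p)*log2(1-p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 1 / 8   # crossover probability
n = 1000    # channel uses (seconds)

C = 1 - binary_entropy(p)   # capacity in bits per channel use
print(f"h(1/8) = {binary_entropy(p):.4f}")  # 0.5436
print(f"C      = {C:.4f}")                  # 0.4564
print(f"n*C    = {n * C:.1f} bits")         # 456.4
```

So the number of reliably distinguishable messages is about $2^{nC} \approx 2^{456}$.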