1) Suppose a noiseless binary channel, whose input is reproduced exactly at the output. Let $X$ be the transmitted symbol and $Y$ the received one (i.e., $X=0 \to Y=0$ and $X=1 \to Y=1$).
I understand intuitively that the information channel capacity is 1 bit: any transmitted bit is received without error, so one error-free bit can be transmitted per use of the channel.
I also mostly follow the math behind why it is 1.
I know that $C=\max_{p(x)} I(X;Y)= H(Y)-H(Y|X)$, where the conditional entropy is defined as $H(Y|X)= \sum_x p(x)H(Y|X=x)$.
I understand that $H(Y|X)=0$ because, in this case, if I know $X$ then I know $Y$, so there is no uncertainty. Naturally, $C=\max_{p(x)} I(X;Y)= H(Y)-H(Y|X)$ becomes $C=\max_{p(x)} H(Y)$. Then $H(Y)$ is maximal when $Y$ has a uniform distribution, which happens when $p(x)$ is also uniform; therefore $C= H(0.5,0.5)=1$ bit.
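The derivation above can be checked numerically: for the noiseless channel $Y=X$, so $H(Y|X)=0$ and $I(X;Y)=H(Y)=H(p,1-p)$ where $p=P(X=0)$; maximizing over the input distribution gives the capacity. A minimal sketch (the grid sweep over $p$ is just for illustration):

```python
import math

def binary_entropy(p):
    """Binary entropy H(p, 1-p) in bits, with the convention 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# For the noiseless channel, Y = X, so H(Y|X) = 0 and I(X;Y) = H(Y) = H(p, 1-p),
# where p = P(X = 0). Sweep the input distribution and take the maximum.
capacity = max(binary_entropy(p / 100) for p in range(101))
print(capacity)  # 1.0, attained at the uniform input p = 0.5
```

The maximum is achieved at $p=0.5$, matching $C=H(0.5,0.5)=1$ bit.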
CONFUSION
When the lecture video talks about the channel capacity, it does the following: starting from $C=\max_{p(x)} I(X;Y)= H(Y)-H(Y|X)$, it states that $H(Y|X)= \sum_x p(x)H(Y|X=x)$ and that $H(Y|X=x)$ is essentially $H(1,0)$ (I understand that $H(1,0)=0$); the rest is the same as above. What confuses me is why the notation changes from conditional entropy to what looks like joint entropy, and why and how one can say that $H(Y|X=x)$ is essentially $H(1,0)$.
The notation being used is the following:
For any $0\leq p\leq 1$,
\begin{align} H(p,1-p)&= p\log\left(\frac{1}{p}\right)+(1-p)\log\left(\frac{1}{(1-p)}\right)\\ &=-\left(p\log(p)+(1-p)\log(1-p)\right) \end{align}
represents the binary entropy function, and not the joint entropy of $X$ and $Y$. The convention is that $0\log(0)=0$.
In this case, given $X=x$, $Y=x$ with probability $1$, and $Y=1-x$ with probability $0$. Thus, $p=1$ and $1-p=0$.
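This resolves the confusion numerically as well: each conditional term is the binary entropy evaluated at $p=1$, not a joint entropy. A small sketch, reusing the same binary entropy function as above:

```python
import math

def binary_entropy(p):
    """Binary entropy H(p, 1-p) in bits, using the convention 0*log(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0  # the 0*log(0) = 0 convention makes both terms vanish
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Given X = x, Y equals x with probability 1 and 1-x with probability 0,
# so each conditional term H(Y|X=x) is H(1, 0):
print(binary_entropy(1.0))  # 0.0
# Summing over x: H(Y|X) = sum_x p(x) * H(1, 0) = 0, for any input distribution.
```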