I wish to write a C program through which I want to learn about Fountain Codes. The intention is to learn how to encode specific data using fountain codes and decode it later. This is solely to understand and appreciate fountain codes.
I found the paper published by Prof. MacKay which describes the implementation and all other important information about Fountain Codes. However, I find it too difficult to implement what is described in sections 5.1 (Encoding) and 5.2 (Decoding), as they are highly mathematical. As a beginner to information theory, the difficulty is even higher for me. You may not need to refer to the above publication in order to answer this question if you are already aware of how LT codes implement the concept of Fountain Codes.
Despite my tireless efforts with this publication, I have made little progress, so I would really appreciate it if someone could illustrate, step by step, a very simple example of the fountain encoding/decoding process (using an LT code). Perhaps we could consider a data packet of 16 bits, 1101010101010101, which needs to be transmitted using this method. Any other information can be assumed.
The idea behind this question is to learn, through a simple example, how Fountain Codes can be generated from a data packet and how that packet can later be decoded in a step-by-step manner.
Truly appreciate your valuable time.
As Sec. 5.1 of MacKay's paper states, the encoder works by partitioning the data (information) into $K$ equal size components $s_1, s_2, \ldots, s_K$, and generates the coded symbols (to be transmitted) as a linear combination of these. For a toy example of the data being the bit sequence
101, one could consider $K=3$, resulting in the partition $s_1=1$, $s_2=0$, $s_3=1$. Then the sequence $t_1,t_2,t_3,\ldots$ is transmitted, with symbol $t_n$, $n\in\mathbb{N}$, generated as $$ t_n = \sum_{k=1}^K g_{n,k}s_k, $$ where $g_{n,k} \in \{0,1\}$ and all operations are performed over the binary field. Note that $t_n \in \{0,1\}$. The sequence $\{g_{n,k}\}_{k=1}^K$ is constructed by randomly assigning the value $1$ to $d_n\leq K$ of its elements and setting the rest to $0$. The value of $d_n$ (the *degree* of $t_n$) is drawn according to a probability distribution $\rho(d)$, called the degree distribution.

Now, the transmission is performed over the so-called binary erasure channel (BEC), where a bit is either received perfectly or lost (an erasure event). An erasure happens with some probability less than $1$, independently of the transmitted symbol. This means that the receiver will probably receive a "corrupted" version of the transmitted sequence, e.g., $t_1, t_3, t_4, t_6, t_9,\ldots$.
However, the beauty of the Fountain codes is that as long as $K'>K$ symbols are received (no matter which they are), for some $K'$ very close to $K$, the receiver will be able to extract the data with high probability.
For the above example, say the receiver gets $t_1, t_3, t_4, t_6$, which were generated by the transmitter as $$ \begin{align} t_1 &= s_1\\ t_3 &=s_1+s_2+s_3\\ t_4 &=s_2+s_3\\ t_6 &=s_1 +s_2 \end{align} $$
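To make this concrete, plugging in the toy values $s_1=1$, $s_2=0$, $s_3=1$ gives the received values $t_1=1$, $t_3=0$, $t_4=1$, $t_6=1$, and the system can be solved by hand by repeatedly using an equation in which only one unknown remains: $$ \begin{align} t_1 = 1 &\implies s_1 = 1\\ t_6 = 1 &\implies s_2 = t_6 + s_1 = 1 + 1 = 0\\ t_4 = 1 &\implies s_3 = t_4 + s_2 = 1 + 0 = 1 \end{align} $$ (all sums over the binary field), while $t_3 = s_1+s_2+s_3 = 0$ serves as a consistency check. This is exactly the step-by-step recovery of the data 101.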
The receiver also knows how the symbols were encoded, hence it can construct the corresponding, so-called, factor graph, which, for this case, is the one shown in Fig. 4a of MacKay's paper. By following the decoding procedure discussed in the paper (referred to as belief propagation or sum-product algorithm in the literature), the receiver is able to recover the bit sequence
101. When the receiver has extracted the data, it can inform the transmitter so that no further $t_n$ are transmitted. Note that, by this procedure, the classical notion of "code rate" is irrelevant for the fountain code, as it is able to reliably transmit over any BEC, irrespective of its erasure probability. In contrast, "conventional" codes need to know the value of the erasure probability so as to decide on the coding rate. For this reason, Fountain (and related) codes are also referred to as rateless codes.
P.S.: Note that implementing a Fountain code is not a simple task. I would recommend studying implementations already available online, and proceeding to your own implementation only when you feel comfortable that you have understood all the aspects.