Bit encoding of a Set

430 Views Asked by At

Task:

Given a set $S$, provide a bit encoding of subsets $S'$ of $S$, i.e. $S' \subseteq S$ with the following two properties:

  • There is a unique encoding for every subset of $S$.
  • The endcoding is optimal (i.e. it needs the minimal amount of bits).

    Idea:

    let $X$ be the set $S = \{s_1, \dots, s_n\} $, but ordered in some way (Alphabetically, numeric,...).

    We then have the ordered version of $S$, $X = \{x_1, \dots, x_n |\quad x_i \in S, i\in 1,\dots,n\}$.

    We now encode every element of $X$, by encoding the index $i$ of each element $x_i \in X$. Each element $x_i$ is encoded using elias gamma encoding, which is defined as follows:

    $\lfloor log_2(i)\rfloor$ zeros, then i in binary.

    Example:

    Given $X = \{x_1, x_2, x_3, x_4, x_5\}$

    Then we encode each $x_i \in X$ by encoding $i$, Such that:

    i=1 -> 1
    i=2 -> 0 10
    i=3 -> 0 11
    i=4 -> 00 100
    i=5 -> 00 101
    

    And we end up with a set $X' = \{x_1=1, x_2=010, x_3=011, x_4=00100, x_5=00101 \}$

    If we then want to encode a subset $S' \subseteq S$, we just have to take the bit encodings from $X'$ and put them into the subset.

    Could this idea work or is it the wrong approach?

    EDIT:

    So the solution of the task is the following:

    Let $S$ be a set and $|S| = n$.

    Let $s_i$ be an element of $S$, i.e. $s_i \in S$, where $i \in [1,n], n \in \mathbb{N}$.

    A subset $S' \subseteq S$ has an encoding $enc(S')$. $enc(S')$ consists of $n$ bits $b_i$, such that: $$enc(S') = b_1\dots b_n$$ where a bit $b_i$ is set to 1 if $s_i \in S'$, and set to 0 otherwise.

    Example:

    Let $S = \{A,B,C\}$, then all subsets of $S$ are: $$\{\},\{A\},\{B\},\{C\},\{A,B\},\{A,C\},\{B,C\},\{A,B,C\}.$$ It follows that $S$ and all subsets are encoded as follows:

    $$enc(\{A,B,C\}) = 111$$ $$enc(\{A,B\}) = 110$$ $$enc(\{A,C\}) = 101$$ $$enc(\{B,C\}) = 011$$ $$enc(\{A\}) = 100$$ $$enc(\{B\}) =010$$ $$enc(\{C\}) = 001$$ $$enc(\{\emptyset\}) = 000$$

    In order to show that the code is optimal, we have to show that the number of subsets of $S$ is $2^n$, if so it can not be encoded with anything less than n bits.

    Prove:

    Let $S$ be a set and $|S| = n$.

    Let $\mathcal{P}(S)$ be the powerset of $S$.

    We show that $|\mathcal{P}(S)| = 2^n$.

    1. $|\mathcal{P}(S)| = \sum\limits_{k=0}^{n} \binom{n}{k}$. Where $\binom{n}{k}$ is the binomial coefficient, which denotes the amount of possibilities to draw $k$ elements from $n$ elements, where the order is ignored and every element can only be drawn once.
    2. The binomial theorem says: $$(x + y)^n = \sum\limits_{k=0}^{n} \binom{n}{k} \cdot x^{n-k} \cdot y^{k}$$
    3. Using 2. we come to the conclusion that: $$2^n = (1+1)^n \overset{2.}{=} \sum\limits_{k=0}^{n} \binom{n}{k} \cdot 1^{n-k} \cdot 1^{k} = \sum\limits_{k=0}^{n} \binom{n}{k} \overset{1.}{=} |\mathcal{P}(S)| $$
  • 1

    There are 1 best solutions below

    12
    On

    Hint: for each element $x \in S$ it's either included in a subset or it's not. How can you encode such a binary decision between two options?

    Alternative hint: how many subsets are there of a set with $n$ elements?