Well, I found this in a textbook [1]:
\begin{align*} H(U_N) &= H_1 + H_2 \\ \log_2 N &= H_1 + \sum_{k=1}^n \frac{N_k}{N} \log_2 N_k \\ \sum_{k=1}^n \frac{N_k}{N}\log_2 N &= H_1 + \sum_{k=1}^n \frac{N_k}{N} \log_2 N_k \\ \Rightarrow H_1 &= - \sum_{k=1}^N \frac{N_k}{N} \log_2 \frac{N_k}{N} \end{align*}
Which is supposed to show Shannon's formula.
However, I just don't get the step from the first line to the second. I tried to derive it myself, but it actually looks kind of wrong. I don't want to disgrace myself here, but how does this make sense? Also notice the upper limit of the sum changing from $n$ to $N$, which is also rather weird. $n$ should be the number of groups, if I got that right, and $N$ should be the number of different values one can represent in base 2 using a certain number of bits, as in:
How many times do I have to ask at most to guess an $n$-digit binary number?
\begin{align*} n &= \log_2 2^n = \log_2 N \end{align*}
so we have
\begin{align*} H(U_N) = \log_2 N \text{ bit} \end{align*}
to store a corresponding message.
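To make that counting concrete, here is a minimal Python sketch of my own (not from the book; `questions_needed` is a name I made up): one yes/no question per bit suffices, so $N = 2^n$ elements need $n$ questions.

```python
import math

def questions_needed(N):
    """Yes/no questions needed to single out one of N equally likely elements."""
    # One question per bit of the element's index; ceil covers non-powers of two.
    return math.ceil(math.log2(N))

for n in range(1, 5):
    N = 2 ** n
    assert questions_needed(N) == n  # n = log2(2^n) = log2(N)
```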
Also we're demanding that
\begin{align*} H(U_{N \cdot M}) = H(U_N) + H(U_M) \end{align*}
The only possibility for
\begin{align*} \sum_{k=1}^n \frac{N_k}{N} \log_2 N &= \log_2 N \end{align*}
is if
\begin{align*} \frac{1}{N} \sum_{k=1}^n N_k &= 1 \end{align*}
but this would only work if we actually have something like
\begin{align*} \frac{1}{N} \sum_{k=1}^n w_k N &= 1 \Rightarrow \sum_{k=1}^n w_k = 1 \end{align*}
or
\begin{align*} \sum_{k=1}^n N_k &= N \end{align*}
But from what I saw I expected each $N_k$ to be $2^\text{something}$.
[1] Elements of Information Theory, Thomas M. Cover, Joy A. Thomas
Okay, I think I get it now. We have $\log_2(2^n) = \log_2(N) = n$ as the answer to the question of how many yes/no questions we need at most.
We define Hartley's function $H(U_N)$, where $U_N$ is a set of $N$ different, equally probable elements. The amount of information required to identify an element of $U_N$ is then
\begin{align*} H(U_N) &= \log_2 N \text{ bit} \end{align*}
We also require
\begin{align*} \text{1. }&H(U_2) = 1 \\ \text{2. }&H(U_N) \leq H(U_{N+1}) \\ \text{3. }&H(U_{N\cdot M}) = H(U_N) + H(U_M) \end{align*}
Example:
If we want to identify an element among $N = 2^3$ different elements, we can e.g. group them into $G_1 = \{000,001,010,011\}$ and $G_2 = \{100,101,110,111\}$ (note that $N = |G_1| + |G_2| = N_1 + N_2$). We already see that we need 3 bits to encode all these elements. If we use, e.g., the first bit as the identifier for one of the two groups, we need 1 question to find the group and 2 more questions within the group,
and we can calculate this with
\begin{align*} H(U_{2 \cdot 4}) &= H(U_{2}) + H(U_{4}) \\ &= \log_2(2) + \log_2(4) \\ &= 3 \end{align*}
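A quick numerical check of the additivity requirement (just my own illustration; `hartley` is a helper name I chose):

```python
import math

def hartley(N):
    # Hartley's function: information in bits to identify
    # one of N equally probable elements.
    return math.log2(N)

# Requirement 3: H(U_{N*M}) = H(U_N) + H(U_M)
N, M = 2, 4
assert math.isclose(hartley(N * M), hartley(N) + hartley(M))
print(hartley(N * M))  # 3.0
```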
Now we want to know the following: if we group $U_N$ in the manner described above, we can "split" the question "Which element of the set $G_1 \cup \dots \cup G_n$ are you?" into the question $H_1$ ("Which group do you belong to?") and the question $H_2$ ("Knowing the group, which element of that group are you?").
\begin{align*} H(U_N) = H_1 + H_2 \end{align*}
For $H_2$ we know that we need $\log_2 N_k$ questions to identify an object within group $k$. The average number of questions is weighted by the probability $\frac{N_k}{N}$ of landing in group $k$, and thus we have
\begin{align*} H_2 &= \sum_{k=1}^n \frac{N_k}{N} \log_2 N_k \end{align*}
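For the $N = 2^3$ example above, this average works out as expected (a small sketch of my own):

```python
import math

# N = 8 elements split into two groups of sizes N_1 = N_2 = 4.
N = 8
group_sizes = [4, 4]

# H_2: average questions remaining once the group is known.
H2 = sum(Nk / N * math.log2(Nk) for Nk in group_sizes)
print(H2)  # 2.0 bits: knowing the group, 2 questions remain
```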
What's now missing is $H_1$. We want to know the missing part of the information needed to arrive at $H(U_N)$, or in other words: not knowing the group, how much information would I gain if I knew it?
\begin{align*} H(U_N) &= H_1 + H_2 \\ \log_2 N &= H_1 + H_2 \\ \log_2 N &= H_1 + \sum_{k=1}^n \frac{N_k}{N} \log_2 N_k \\ \Rightarrow H_1 &= \log_2 N - \sum_{k=1}^n \frac{N_k}{N} \log_2 N_k \\ &= \sum_{k=1}^n \frac{N_k}{N} \log_2 N - \sum_{k=1}^n \frac{N_k}{N} \log_2 N_k & \text{Because: } \sum_{k=1}^n \frac{N_k}{N} = 1 \\ &= \sum_{k=1}^n \frac{N_k}{N} \Big( \log_2 N - \log_2 N_k \Big) \\ &= -\sum_{k=1}^n \frac{N_k}{N} \Big( \log_2 N_k - \log_2 N \Big) \\ &= -\sum_{k=1}^n \frac{N_k}{N} \log_2 \frac{N_k}{N} \end{align*}
We now know that
\begin{align*} H_1 &= - \sum_{k=1}^n p_k \log_2 p_k \end{align*}
where $p_k = \frac{N_k}{N}$. This is the amount of information we would gain if we knew which group our element falls into.
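The nice part is that the grouping does not have to be into equal powers of two; the derivation only used $\sum_{k=1}^n N_k = N$. A short check with an uneven grouping (my own sketch):

```python
import math

# Check: H_1 = -sum p_k log2 p_k with p_k = N_k / N,
# and H_1 + H_2 = log2 N for any grouping with sum N_k = N.
N = 8
group_sizes = [2, 6]  # an uneven grouping, just as an illustration
assert sum(group_sizes) == N

p = [Nk / N for Nk in group_sizes]
H1 = -sum(pk * math.log2(pk) for pk in p)                        # group info
H2 = sum(pk * math.log2(Nk) for pk, Nk in zip(p, group_sizes))   # within-group info
assert math.isclose(H1 + H2, math.log2(N))
```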
The reason I didn't get it at first was simply that I didn't see that $\sum_{k=1}^n N_k = N$.