I am not a mathematician and am struggling with the exercises while reading the book
*Information Theory, Inference and Learning Algorithms*.
The author introduces the binary entropy function at the start of the book as
$$ H(x) \equiv x \log \frac{1}{x} + (1-x) \log \frac{1}{1-x} $$
He shows that it can be used to approximate
$$ \log \binom{N}{r} \simeq N H\!\left(\frac{r}{N}\right) $$
I am puzzled about how he arrived at this approximation. Could someone please explain it?
Most likely the author uses Stirling's approximation to the logarithm of the factorial, as described here:
$$\log n! = n\log n - n + O(\log n)$$
Omitting the $O(\log n)$ terms on the assumption that they are negligible relative to the other terms, the approximation becomes (the blue terms below, which sum to zero, are added to obtain the desired result):
$$\begin{align}
\log \binom{N}{r} &= \log\frac{N!}{r!\,(N-r)!} \\
&\simeq N\log N - N - r\log r + r - (N-r)\log(N-r) + (N-r) \\
&= N\log N - N - r\log r + r - (N-r)\log(N-r) + (N-r) \color{blue}{{}+ r\log N - r\log N} \\
&= r\log\frac{N}{r} + (N-r)\log\frac{N}{N-r} \\
&= N\,\frac{r}{N}\log\frac{N}{r} + N\,\frac{N-r}{N}\log\frac{N}{N-r} \\
&= N\left(\frac{r}{N}\log\frac{1}{r/N} + \left(1-\frac{r}{N}\right)\log\frac{1}{1 - r/N}\right) \\
&= N H\!\left(\frac{r}{N}\right)
\end{align}$$
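If it helps to see the approximation numerically, here is a quick sketch in Python (my own check, not from the book). It uses natural logarithms on both sides; the gap between the two values is exactly the $O(\log N)$ material that Stirling's approximation drops:

```python
import math

def H(x):
    """Binary entropy H(x) = x log(1/x) + (1-x) log(1/(1-x)), natural log."""
    return x * math.log(1 / x) + (1 - x) * math.log(1 / (1 - x))

N, r = 1000, 300
exact = math.log(math.comb(N, r))   # log C(N, r), computed exactly
approx = N * H(r / N)               # the entropy approximation N H(r/N)

print(f"exact  = {exact:.3f}")
print(f"approx = {approx:.3f}")
print(f"relative error = {(approx - exact) / exact:.4%}")
```

For $N = 1000$, $r = 300$ the two values agree to within about one percent, and the relative error shrinks as $N$ grows, since the neglected terms are only logarithmic in $N$.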