How can I find numerically nice-to-compute upper limits of nCr?

123 Views Asked by At

How can I find nicely behaving functions which are easy to compute and which fit well to the upper limits of the (2-logarithm) of the nCr function?

What I am interested in in practice is to be able to model "how many binary digits (bits) will a digital representation of nCr(a,b)" have? A guess of 1 or 2 too many bits are acceptable, but a guess of 1 or 2 too few bits are not. So the problem is a bit un-symmetric. Errors in one direction are catastrophically problematic (unable to represent the number) but errors in the other direction "only" cause an unnecessary extra percentage of bits stored.

Actually calculating factorials will not be considered easy enough. On the other hand, the simplest of approximations like $nCr(a,b) \leq 2^a$ will be considered too sloppy.


Own work what I have tried on my own is to calculate the number of bits of all numbers, then fit low order polynomials to the bit count, then rounded the estimation to closest integer. In the case of too few bits I have updated the constant term by as much as to give a 0.01 margin that every number gets rounded right.

If I do this, I will for

  • 2:nd degree polynomial get 2.82 % unnecessary bits.
  • 3:rd degree polynomial get 1.96% unnecessary bits.
  • 4:th degree polynomial get 1.423% unnecessary bits.
  • 5:th degree polynomial get 1.1356%
  • 6:th degree polynomial get 0.9258%
  • 7:th degree polynomial get 0.7823%

For higher order polynomials it seems I get numerical issues. Perhaps I need to apply regularization or come up with a smarter way than just adjusting the constant term when too few bits are predicted.

Here is a visualization of what I am trying to do. Y axis is the number $a$ as a binary number. Yellow means set bit (digit 1), blue means unset bit (digit 0). The red curve is the model. It is fit with regression using a polynomial model against the uppermost yellow a for every b. It should roughly be the same as a polynomial fit against 2-logarithm of $nCr(a,b)$.

enter image description here

The goal is to as closely as possible capture how many bits are required for each value of nCr.


Edit the lower portions of zeroes in the graph seem to be a numerical error due to me using floating point arithmetics and poor choice of calculations.


Update

For practical reasons I have for now settled for using the rather crude estimation $nCr(a,b) \leq 2^a$ capped to octets, i.e. $8\cdot \lceil a/8 \rceil$ bits assigned to every such number. It still helps shrink the required memory quite a bit.

Storing $0\leq a \leq 2048$ took over 1GB using a naiive table not utilizing triangular structure and symmetry - simply storing 2048 bits for every single number. With some smarter book-keeping I got it down to ~260 MB and with the mentioned octet-capping down to ~180 MB. It is sure that more memory can be saved by using for example LeonBloys estimate. But the computational costs in keeping track where all the bits are and to compress and extract them using bit-manipulation will probably affect performance too much. At least for my current applications.

3

There are 3 best solutions below

2
On BEST ANSWER

From this answer:

$$\binom{a}{b}\le \frac{1}{\sqrt{2 \pi }}\sqrt{\frac{1}{b}+\frac{1}{a-b}}\,\frac{a^{a} }{b^b\, (a-b)^{a-b} }$$

Hence $$\log_2 \binom{a}{b}\le a \log_2 a - b \log_2 b -(a-b) \log_2 (a-b) - \frac12 \log_2 \left(\frac{1}{b}+\frac{1}{a-b}\right)-1.32574$$

Or even tighter, using this:

$$\log_2 \binom{a}{b}\le \left(a+\frac12\right) \log_2 a - \left(b+\frac12\right) \log_2 b - \left(a-b+\frac12\right) \log_2 (a-b) +\\+ \left( \frac{1}{12 a} - \frac{1}{12 b +1} - \frac{1}{12(a-b)+1}\right)\log_2(e) -\frac12 \log_2(2 \pi)$$

0
On

While LeonBloys answer is both very accurate and mathematically elegant, for computational reasons I think it may be too expensive for my application. 2-logarithms are still quite expensive to calculate on modern CPUs and it would be cumbersome having to keep track of which bits belong where. I will therefore present a less elegant and accurate but easier-to-handle estimate.

Prelude : Proving that the nCr seen as a function $$t\to nCr(a,t),\hspace{1cm} t \in [0,a/2]$$ is bounded above by the tangent defined by it's derivative at 0 is impossible as the function is a function fro and to the integers and those are of course never differentiable.

If we want to, perhaps we can make such a proof instead using the various upper bounds functions presented. I suppose the convexity / concavity of the logarithms would play a vital role in such a proof.

I will without presenting such a proof assume this is self-evident.


Assuming an upper bound of the derivative will be $\lceil \log_2(a) \rceil$, a triangle can be constructed for which we can calculate the number of bits saved. It's sides will be $\frac{a}{\log_2(a)}$ and $a$ and it will be right angled so the area will be $\frac{a^2}{2\log_2(a)}$ due to the symmetricity of the Pascal's triangle with a naiive approach we would need to store $a^2/2$.

So the relative gain will be $1\over \log_2(a)$. Although for practical purposes we will get closer to $1\over {\lceil \log_2(a) \rceil}$. Well.. at any account it shall be over 9% memory saved for $a\leq 2048$ and even better for lower numbers.

enter image description here

In this answer the proposed method will save all the area above the gray line. While we can see that it will save a lot less bits than other approaches closer to the most significant bits, we will avoid having to make involving calculations in keeping track of where the bits are, they will follow a simple linear function which also is easy to index for a bit writer and reader.

0
On

Edit: I realized my answer is the same as LeonBloy's

By using Stirling approximation one can obtain very close value: $$\left(\begin{matrix}a\\b\end{matrix}\right) < \frac{1}{\sqrt{2\pi b(1-b/a)}}\cdot\frac{a^a}{b^b(a-b)^b}$$ Therefore $$\log_2\left(\begin{matrix}a\\b\end{matrix}\right) < a\log_2 a - b \log_2b - (a-b)log_2(a-b)-\frac{1}{2}\log(2\pi b(1-b/a))$$ Rearrange the terms: $$\log_2\left(\begin{matrix}a\\b\end{matrix}\right) < (a+1/2)\log_2 a - (b+1/2) \log_2b - (a-b+1/2)log_2(a-b)-\frac{1}{2}\log_2(2\pi)$$