Fast way to get a position of combination (without repetitions)

6.1k Views Asked by At

This question has an inverse: (Fast way to) Get a combination given its position in (reverse-)lexicographic order


What would be the most efficient way to translate a combination of $k$-tuple into its positions within the $\left(\!\!\binom{n}{k}\!\!\right)$ combinations?
I need this to be fast for combinations of $\left(\!\!\binom{70}{7}\!\!\right)$ order of magnitude - very large, but not exceeding 2 billion (fits into int32 maximum value).

Below is an example of $\left(\!\!\binom{6}{3}\!\!\right)$, where the aim is to quickly translate (a, d, f) tuple to value 9, while in the real problem $k$ ranges between 5 and 8.

$$\begin{array} {cccccc|c} a&b&c&d&e&f&^{combination}/_{sort\_order}& \\\hline x&x&x& & & &1\\ x&x& &x& & &2\\ x&x& & &x& &3\\ x&x& & & &x&4\\ x& &x&x& & &5\\ x& &x& &x& &6\\ x& &x& & &x&7\\ x& & &x&x& &8\\ x& & &x& &x&9\\ x& & & &x&x&10\\ .&.&.&.&.&.&.\\ & & &x&x&x&20\\ \end{array}$$

I know that I could pre-calculate all the combinations and reverse the lookup dictionary. However, such dictionary would not be efficient in terms of memory usage. Therefore I am looking for either calculation-based approach, or a more efficient data structure to perform this mapping.

4

There are 4 best solutions below

2
On

I'll relabel your (a,d,f) to (1,4,6) and denote it by $(i_1,i_2,i_3)$ so we can calculate with it.

Start at index $\binom nk$. Moving the $k$-th entry to the left by $1$ reduces the index by $1=\binom{n-j}0$. Moving the $(k-1)$-th entry to the left from $j$ to $j-1$ reduces the index by $n-j=\binom{n-j}1$. Generally, moving the $(k-m)$-th entry to the left from $j$ to $j-1$ reduces the index by $\binom{n-j}m$. Thus the index is

$$ \binom nk-\sum_{m=0}^{k-1}\;\sum_{j=i_{k-m}+1}^{n-m}\binom{n-j}m\;. $$

You can easily precalculate the inner sums so that you can look them up using $m$ and $i_{k-m}$, so you just need to add up the $k$ terms in the sum over $m$ to get the index.

P.S.: That was unnecessarily complicated; the inner sum simplifies, and the result is

$$ \binom nk-\sum_{m=0}^{k-1}\binom{n-i_{k-m}}{m+1}\;. $$

You can probably derive that more directly, but since you're perhaps just interested in a practical result and not the most elegant way of deriving it, I'll leave it at that.

6
On

Your tuple ordering is lexicographic and your to-be-computed position is one-based, as are the symbol codes for $a,b,\ldots$ used in @joriki's answer; but for the sake of simplicity I will use reverse-lexicographic ordering and zero-based positions and letter codes. Conversion is done by replacing @joriki's $(i_1,\ldots,i_k)$ with $(n-i_k,\ldots,n-i_1)$ and replacing the position resulting from my formula with its distance to $\binom{n}{k}$. The result below is thus consistent with @joriki's formula.

I have used such computations for compression of multi-indices into (skew-)symmetric tensors; therefore I borrow some vocabulary from that domain.

Let us define a $k$-index to be a $k$-tuple of strictly increasing nonnegative integers. You may have to sort accordingly and to disallow duplicate entries.

$k$-indices can be totally ordered in a reverse-lexicographic manner: Sorting is done by the last element, in case of equality by the next-to-last element, and so on.

For a $k$-index $I$, let us define its position (or compressed index) $\operatorname{ordx}(I)$ as the number of $k$-indices that are reverse-lexicographically smaller than $I$.

Note that $\operatorname{ordx}(I) = 0$ for the smallest $k$-index $I=(0,\ldots,k-1)$.

We need a notation for truncated tuples. Denoting $I = (i_0,\ldots,i_{k-1})$, let $I_m = (i_0,\ldots,i_{k-m-1})$ for integer $m$ with $0\leq m<k$. That is $I$ with the last $m$ elements chopped off.

Now a $k$-index $J=(j_0,\ldots,j_{k-1})$ is reverse-lexicographically smaller than $I$ if and only if

  • $j_{k-1} < i_{k-1}$; there are $\binom{i_{k-1}}{k}$ such $k$-indices; or
  • $k>1$, and $j_{k-1} = i_{k-1}$, and $J_1$ is reverse-lexicographically smaller than $I_1$. This condition involves a comparison of $(k-1)$-indices.

Proceeding by induction, we arrive at the formula

$$\operatorname{ordx}(I) = \sum_{r=1}^k\binom{i_{r-1}}{r}$$

It is worth noting that this formula does not depend on the upper bound $n$ for the index elements.

The binomial coefficient values can be computed on the fly by initializing and updating a segment of Pascal's triangle. Define $$b_{j}^{(r)} = \binom{j+r}{r} = \begin{cases} 1 & \text{if $r=0$ or $j=0$} \\ b_{j}^{(r-1)} + b_{j-1}^{(r)} & \text{if $r>0$ and $j>0$} \end{cases}$$ So we just need to initialize and update an array $$B^{(r)} = \left(b_0^{(r)},\ldots,b_{i_{k-1}-k}^{(r)}\right)$$ In practice, we prepend a $0$ to that array in order to account for the case $i_{r-1} = r-1$ which requires a zero binomial coefficient.

In Python (which uses zero-based indices and half-open ranges):

def ordx(idx):
    """
    Turns a multi-index of strictly increasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    b = [0] + [1] * (idx[-1] + 1 - len(idx))    # [0, 1, 1, ...]
    for r,i in enumerate(idx):      # (0,idx[0]), (1,idx[1]), ...
        for j in xrange(2, len(b)): # 2, ..., len(b)-1
            b[j] += b[j-1]          # binomial(j+r, r+1)
        s += b[i - r]
    return s

Besides: If you want to allow duplicate tuple elements, you can transform that problem by adding to each element $i_r$ the sub-index $r$ and computing ordx for the modified tuple which now has strict increments. For that use case, the code above gets simplified a bit:

def ords(idx):
    """
    Turns a multi-index of nondecreasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    b = [0] + [1] * idx[-1]         # [0, 1, 1, ..., 1]
    for i in idx:
        for j in xrange(2, len(b)): # 2, ..., len(b)-1 = idx[-1]
            b[j] += b[j-1]
        s += b[i]
    return s

This computes $$\operatorname{ords}(I) = \sum_{r=0}^{k-1}\binom{i_r + r}{r+1}$$ Such a function ords could be used for compressing a sorted multi-index for totally symmetric tensors to a flat index that removes redundancy.

Update: The above algorithms are simple, but need to update an array of binomial coefficients for each index element. Consequently, running the above ords needs $k\,i_{k-1}$ additions and an array b of length $i_{k-1}+1$. For ordx replace $i_{k-1}$ with $i_{k-1}-k+1$. We can reduce the operation count and memory usage by computing each needed binomial coefficient directly from the previous one. This requires a sequence of multiplications and divisions instead of just additions, but it reduces binomial bookkeeping to one scalar variable and keeps total ords operation count at $\operatorname{O}(k+i_{k-1})$. Correspondingly, ordx operation count is $\operatorname{O}(i_{k-1})$. Here is a sample Python implementation of the optimized $\operatorname{ordx}$ (with // denoting integer division):

def ordx_opt(idx):
    """
    Turns a multi-index of strictly increasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    j = 1
    b = 1
    for r,i in enumerate(idx):  # (0,idx[0]), (1,idx[1]), ...
        if i == r: continue     # skipping terms with zero binomial coeff
        # b == binomial(j+r-1, r), update to j == i - r
        while j < i - r:
            b *= j + r
            b //= j
            j += 1
        # b == binomial(i-1, r), update to binomial(i, r+1)
        b *= i
        b //= r + 1
        s += b
    return s

And the corresponding optimized $\operatorname{ords}$:

def ords_opt(idx):
    """
    Turns a multi-index of nondecreasing nonnegative integers
    into a 1-dimensional zero-based index.
    """
    s = 0
    j = 1
    b = 1
    for r,i in enumerate(idx):
        if i == 0: continue     # skipping terms with zero binomial coeff
        # b == binomial(j+r-1, r), update to j == i
        while j < i:
            b *= j + r
            b //= j
            j += 1
        # b == binomial(i+r-1, r), update to binomial(i+r, r+1)
        b *= i + r
        b //= r + 1
        s += b
    return s
8
On

Let us denote your tuple [a b c] as [1 1 1 0 0 0] and so on.

Define $\binom{n}{r}=0$ for $n<r$

For your tuple: $[a d f] = [1 0 0 1 0 1]$ $$P = 1\cdot \binom{0}{1}+0\cdot \binom{1}{1}+0\cdot \binom{2}{1}+1\cdot \binom{3}{2}+0\cdot\binom{4}{2}+1\cdot\binom{5}{3} + 1$$ $$P=0 + 0 +0 +3+0+10+0+1 = 14$$

General Algorithm:

  • Calculate the position value of each binary digit using $\binom{n}{r}$
  • Take $n$ as position of the digit from left, for leftmost digit $n=0$.
  • Write $r$ for each position as the number of 'ONES' counted from left, including the one at current position.

Example-1: [a b c] = [1 1 1 0 0 0]

Calculate the position of the tuple as sum: $$P = 1\cdot \binom{0}{1}+1\cdot \binom{1}{2}+1\cdot \binom{2}{3}+0\cdot \binom{3}{3}+0\cdot\binom{4}{3}+0\cdot\binom{5}{3} + 1$$ $$P=0 + 0 +0 +0+0+0+0+1 = 1$$

Example-2: [d e f] = [0 0 0 1 1 1] $$P = 0\cdot \binom{0}{0}+0\cdot \binom{1}{0}+0\cdot \binom{2}{0}+1\cdot \binom{3}{1}+1\cdot\binom{4}{2}+1\cdot\binom{5}{3} + 1$$ $$S=0+0+0+3+6+10+1=20$$

The lone ONE is added because you are not starting at zero.

0
On

use this function very efficient, but read the comments first before using it to know how it works

def getCombRep(nValue, rValue, term):

    '''
    this function return a number whose binary representation equals nth combination given
    n , r, t = nValue, rValue, nthTerm or term
    view combination as selection of element from a set,
    element can either be selected or not
    so there is two options, selected and not selected
    these options can be represented in binary form, selected be 1 and not selected be 0
    consider this simple set of element [a, b, c, d, e, f, g, h, i, j]
    let us choose six element from this set, C(10,6) ---> C(n,r)
    if 11011011 is our nth selection in binary form, then selected set will be [c, d, f, g, i, j] 
    our binary selection will always contain number of 1s equal to r, r equals 6 in our example set
    the length of our binary selection is not equal to number of element in the set,
    append leading zeros until length equals number of element, 0011011011
    or start selection from right to left of the set
    let us follow the following steps in detail to reach our goal
    '''

    #calcutate the biggest nth combination, as limit
    num = math.factorial(nValue)
    den = math.factorial(nValue - rValue) * math.factorial(rValue)
    limit = num // den
    # check if the limit exceeded, if exceeded take the reminder
    reminder = term % (limit + 1)
    #since we are counting nth permutation from 1, if reminder is zero, increment to one
    if reminder == 0:
        reminder += 1
    # set term equals reminder
    term = reminder

    combList = []
    nthTerm = term

    '''
    STEP 1: CONVERT nthTERM GIVEN IN COMBINATION REPRESENTATION.
    -----------------------------------------------------------------------------------------------
    STEPS:
        1. let r be the number of element to be choosen, and n the number of element in the set
        2. find x in range 0 to n such that C(r+x,x) is equal to or is the first smallest from term given
           if no such C(r+x,x)  exist in this range go to step 6
        3. append, x + r and x as pair, use list, for our case combList is used
        4. calculate the difference btn term and C(r+x,x) calculated ie term - C(r+x,x)
        5. update the term to be the difference calculated
        6. drop the value of r by one ie r - 1
        7. repeat step 2, 3, 4, 5 and 6 until term equals zero
    so now our combination representation of nthterm will look like this
        x1Cr + x2Cr-1 + x3Cr-2 + ... + xnCo ( may not contain all terms)
    '''
    #looping all possible r in descending order(step 1 and 6)
    for r in range(rValue, 0, -1):
        #looping all possible x in ascending order (step 2)
        for x in range(nValue):
            #calculate xCr
            num = math.factorial(r + x)
            den = math.factorial(x) * math.factorial(r)
            ncr = num // den
            #test if xCr exceed the nth term 
            #if true, then the preious term is either first smallest of equals to nth term
            if(ncr > nthTerm):
                #calculate x-1Cr, previous term (step 2)
                num = math.factorial(r + x - 1)
                den = math.factorial(x -1) * math.factorial(r)
                ncr = num // den
                # values calculated 
                combList.append([r + x - 1, x - 1])
                #update term to be the difference
                nthTerm = nthTerm - ncr
                #break out of x loop for next step
                break
        # if in between the loop term drop to zero, then break out of r loop
        if nthTerm == 0:
            break
    '''
    STEP 2: CONVERT COMBINATION REPRESENTATION TO COMBINATION NUMBER
    ----------------------------------------------------------------
    here combination number is the number whose binary number represent nth combination
    this number contain exactly 1s equals to number of element to be selected from list given

    TERMS:
        1. ostep:: is outside step(o - step) that increases the binary digit by one, 
                   is the first C(n,r) value in the combination representation of the nth term
                eg 111111 ---> 1011111, 1111110--->10011111, number of digit increase by one
        2. istep:: is inside step (i - step) that shift 1s inside a binary number one step to the left of binary digit
                   are intermediate C(n,r) values in the combination representation
                   isteps occurs if there is more than 2 steps to make, and they are intermediate steps btn o and f steps
                eg 10011111 ---> 10101111, 10000111 --->10001011, number of digits remain the same
        3. svalue:: is stepping value and is the difference between first term of the current step 
                    and last term of previous step, if there is only one step to make, no svalue should be added,
                    because we are not stapping to the next step, so the final step always does not have svalue
                    eg 1011111 - 111111 = 95 - 63 = 32
        3. fstep:: this is equal to final step(f - step), all o and i steps can be fsteps
                   if only ostep exist, there is no i or f steps, even if there is only one step
                   fstep occurs if there is more than one step to make
                   and is the final step
    EXPlANATION:
        consider this simple set of element [a, b, c, d, e, f, g, h, i, j]
        Let us choose four element from a set, the last four element will be choosen first, 0000001111
        then 0000010111, 0000011011, 0000011101, 0000011110, 0000100111, and so one.
        all bit are shifted to the left before the next ostep occured
        can we engineer the pattern here?

        1                 1111         15
        2                10111         23
        3                11011         27
        4                11101         29
        5                11110         30
        6               100111         39
        7               101011         43
        8               101101         45
        9               101110         46

        without appending zeros

        take this point for simplicity:
        1. nth ostep occurs at ((2^r - 1)*(2^k)) + ((2^(r - 1)) + (2^k - 1))
           ((2^r - 1)*(2^k)) ---> is the last term of the previous ostep, from example are 1 and 5
           ((2^(r - 1)) + (2^k - 1)) ---> is svalue when added to last term, new ostep is formed
           where k is the nth step,  1 <= k <= n - r, when r = 4, and n = 10, there are 6 0steps
           k is increasing by one while r is decreasing by one
        2. nth istep occurs at ((2^(r - x) - 1) * (2^r - 1)) + ((2^(r - 1 - x)) + (2**r - 1)) + (constant ostep)
           ((2^(r - x) - 1) * (2^r - 1))--> last term of previous istep, eg 3 and 7 (check isteps)
           (2^r - 1)) + ((2^(r - 1 - x)) --> is svalue for moving to the next istep
           where k is the nth step,  1 <= k <= n - r - 1, when r = 4, and n = 10, there are 5 0steps
           k is increasing by one while r is decreasing by one
        3. if ther is only one step, this step is ostep, and if there is more than one, 
           the final step is fstep, if there are two steps no istep2 exist
        4. fstep is either ostep or istep but no steping value present, because it is the final step 

    RELATIONSHIP BETWEEN STEPS AND COMBINATION REPRESENTATION.

    We have combination representation in the list already, in form of [n, r], 
    the length of the list represent the number steps, ostep-->isteps-->fstep, to make
    only the r's are used, n's are like waste now
    r is always equals to k in our formulas 
    in every step formula there are two terms, let say a and g
        (2^r -1)*(2^k) + (2^(r - 1)) + (2^k - 1) --> (a) * (2^k) + (g) + (2^k -1)
        (2^(r - x) - 1) * (2^r - 1)) + ((2^(r - 1 - x)) + (2^r - 1)) + (constant ostep) 
        -->((a) * (2^r - 1)) + ((g) + (2^r - 1)) + (constant ostep)
    now you know a and g,
    in every step, g is halved and a get reduced by g ie a = a - g

    '''
    # variable to store the number we are going to calculate
    number = 0
    #looping all combination representation from the combList
    for x in range(len(combList)):
        #a g and r are evaluated here
        a, g, r = (2**(rValue - x) - 1), (2**(rValue - 1 -x)), combList[x][1]

        #this if the ostep as final step, svalue is omitted 
        if x == 0 and len(combList) == 1:
            p = (a * 2**r)
            number += p
        #this is the ostep with svalue
        if x == 0 and len(combList) > 1:
            p = (a * 2**r) + (g) + (2**r -1)
            number += p
        #this is the final step, no svalue
        if x == len(combList) - 1 and len(combList) > 1:
            p = (a * (2**r - 1))
            number += p
            #break out of loop coz it is the final step
            break

        #this is the istep
        if x > 0 and x < len(combList) - 1:
            p = (a * (2**r - 1)) + (g) + (2**r - 1)
            number += p
    return number