Find a function $f_m $ that enumerates all subsets of $\{1,..,n\}$ of cardinality $m$ in ascending order

51 Views Asked by At

I'm Looking for a function
$f_m: \{1,..,n\} \to \{(M\subseteq\{1,...,n\}: |M|=m\} $
that enumerates all subsets of $\{1,..,n\}$ of cardinality $m$ in ascending order.

For example, for the set $\{1,2,3,4\}$ we'd have:
$f_3(1)=\{ 1,2,3 \}, f_3(2)= \{ 1,2,4 \}, f_3(3)= \{ 1,3,4 \}$

Algorithmically, we could generate $f_m$ for a set $\{1,..,n\}$ by putting all subsets $M\subseteq \{1,..,n\}$ with $|M|=m$ in a list $L$, then sorting it in ascending order (where a < b exactly iff for the first element a[i] with
a[i] != b[i] the inequality a[i] < b[i] holds), and we'd find that $L[i] = f_m(i)$, where $L[i]$ is the i-th element of the list (starting by 1).

Is there any mathematical definition of $f$ that can be calculated in polynomial time?


Okay, so far I think I've found the inverse of $f_m$:

Let $f_m: \{1,..,n\} \to \{(M\subseteq\{1,...,n\}: |M|=m\} $ be defined as above.

Then we have $$f_m^{-1}\{a_1,...,a_k\} = 1+ \sum_{j=1}^{k-1} \sum_{i=1}^{a_j-a_{j-1}-1}\binom{n-a_{j-1}-i}{k-j}$$, where we define $a_0 :=0$.

As we already know that $f_m$ is bijective (otherwise it wouldn't be an enumeration), we know therefore, that for any $m\in \{1,...,\binom{n}{k}\}$ there is a unique solution of $$m = 1+ \sum_{j=1}^{k-1} \sum_{i=1}^{a_j-a_{j-1}-1}\binom{n-a_{j-1}-i}{k-j}$$ for $a_1,...,a_n$, where for every $a_i$ holds $a_i\in\{1,...,n\} $ and $a_1<...<a_n$.

1

There are 1 best solutions below

1
On BEST ANSWER

I'll introduce a slight change of notation that will hopefully make things easier to understand. First, for later recursions, I'll put the $n$ into the function definition as well. Second, instead of having the co-domain be sets of size $m$, I'll have it be $m$-tuples in increasing order. It should be clear that this is an equivalent formulation: $$f^{n,m}: \left\{1,..,{n \choose m}\right\} \to \{(e_1, \ldots, e_m): e_1 < \ldots < e_m, e_i \in \{1,\ldots,n\} \forall i=1,\ldots,m\}$$.

The $i$-th component ($1 \le i \le m$) of the function value would then be denoted as $f_i^{n,m}$. So in the problem's example $f_1^{4,3}(1)=1, f_3^{4,3}(2)=4$, etc.

To tackle the problem, it makes sense to apply recursion, both accessing smaller $n$ and smaller $m$. To start the recursion, we note that

$$ f_1^{n,1} (x) = x, \forall x \in \{1,\ldots,n\}$$

completely solves the $m=1$ case for all $n$, and that this is also all there is for $n=1$.

For any $n,m$ with $m > 1$ there are exactly $n-1 \choose m-1$ tuples starting with $1$, exactly $n-2 \choose m-1$ tuples starting with $2$, a.s.o. until there are exactly ${m-1 \choose m-1} = 1$ tuples starting with $n-m+1$.

So to dermine $f_1^{n,m}(x)$, do something like (pseudocode)

$$r=1$$ $$y=x$$ $$\text{loop: }y=y-{n-r \choose m-1}$$ $$\text{if } y > 0: r=r+1, \text {goto loop}$$ $$y = y + {n-r \choose m-1}$$

and we have $f_1^{n,m}(x)=r$ and get a value $y$ that we need later.

Once we've found $f_1^{n,m}(x)$, finding the remaining components is a matter of applying recursion. Since our first component is $r$, the remaining $m-1$ components must come from the set $\{r+1,\ldots,n\}$. This is isomorphic to asking for increasing $m-1$ tuples in $\{1,\ldots,n-r\}$, so we get

$$f_i^{n,m}(x) = f_{i-1}^{n-r,m-1}(?) + r, \quad \forall i=2,\ldots,m$$

The "$+r$" comes from the fact that recursion works on $\{1,\ldots,n-r\}$, the shift in $i$ is that we already know the 1st component of $f^{n,m}$ and the second, third,$\ldots$ component in it is the first,second,$\ldots$ component from the recursion.

The only mystery is the argument for which we should call $f_{i-1}^{n-r,m-1}$. It can't possibly be $x$ in all cases, as that may be to big for the domain of the function. That's the $y$-value we calculated. It describes what is 'left' of $x$ after we jumped over all the arguments that map to a first component that is less than $r$. So we finally have

$$f_i^{n,m}(x) = f_{i-1}^{n-r,m-1}(y) + r, \quad \forall i=2,\ldots,m$$

++++

As for the running time of this algorithm, first let's find a suitable input size description. For $n$ that would be $\log n$, as usual. As the output requires $m$ values, the algorithm by necessity has a time that is at least linear in $m$, so input size would be something like $\max(\log n, m)$.

The most time consuming part of the algorithm is calculating the binomial coefficients. Since the lower number of them is always $\le m$, and the upper number always $\le n$, this boils down to calculating at most $2m$ products of numbers less than $n$ and one final division. That is something that is polynomial in $\log n$ and $m$.

The other part of the algorithms are a loop that runs at most $m$ times and a recursion that the goes at most $m$ levels deep. The remaining parts are simple operations and comparisons.

So this algorithm works indeed on polynomial time with $\max(\log n, m)$ as input size.