What is the time complexity when uniformly sampling $b$ entries without replacement from $n$ entries?

764 Views Asked by At

Given $n$ entries, I uniformly sample $b$ entries without replacement among them.

My question is that I want a very fast algorithm to do such things, and what is its time complexity? Is the constant in the time complexity $O()$ big?

1

There are 1 best solutions below

0
On

I'm sure your question has been thoroughly researched and published (did you google it?). I haven't personally needed exactly what you're asking for, but I have needed random permutations of $1\ldots n$, for which I wrote the following little function, based on googling the algorithm it implements...

#include <stdio.h>
#include <stdlib.h>
/* ==========================================================================
 * Function:    int permutation ( int seed, int n, int *p )
 * Purpose:     Returns p[0]...p[n-1] containing 1...n arranged randomly
 * --------------------------------------------------------------------------
 * Arguments:   seed (I)    int optionally containing seed for srandom(),
 *                          if 0 or negative, srandom() not called
 *              n (I)       int containing rank of permutation
 *              p (O)       &int returning 1...n arranged randomly
 * Returns:     ( int )     n=success, 0=failure
 * --------------------------------------------------------------------------
 * Notes:     o
 * ======================================================================= */
/* --- entry point --- */
int permutation ( int seed, int n, int *p ) {
  /* --- Allocations and Declarations --- */
  int   i=0;                /* p[] index */
  /* --- Initialization --- */
  if ( n<1 || p==NULL ) { n=0; goto end_of_job; } /* input error */
  if ( seed > 0 ) srandom(seed);    /* initialize random() */
  for ( i=0; i<n; i++ ) p[i] = i+1; /* initialize p[0...n-1] = 1...n */
  /* ---
   * Generate random permutation
   * ------------------------------ */
  for ( i=0; i<n-1; i++ ) {
    int j = i + ((random())%(n-i)); /* random i <= j <= n-1 */
    int pi = p[i];          /* swap p[i] <--> p[j] */
    p[i] = p[j];            /* p[i] = p[j] */
    p[j] = pi; }            /* p[j] = p[i] */
  end_of_job:
    return ( n );           /* back to caller */
  } /* --- end-of-function permutation() --- */

/* ---
 * Test Driver
 * -------------- */
#if defined(TESTDRIVE)
int main ( int argc, char *argv[] ) {
  int   n    = (argc>1? atoi(argv[1]) : 16 ), /* rank of permutation */
        seed = (argc>2? atoi(argv[2]) : 7654321 ); /* srandom() seed */
  int   permutation(), p[9999],     /* random permutation of 1...n */
        i=0;                        /* p[] index */
  if(n<0)n=16;  if(n>999)n=999;     /* sanity check */
  permutation(seed,n,p);            /* generate permutation */
  for ( i=0; i<n; i++ )             /* print results */
    printf("%5d%s",p[i],((i+1)%10==0||i==n-1?"\n":", "));
  } /* --- end-of-function main() --- */
#endif

Now, I think you'd just want the first $b<n$ entries, taking $p[0]\ldots p[b-1]$ as your random sample without replacement. And in that case, I think you could just stop the loop at $b-1$ (instead of at $n-1$ for the entire permutation). And then the complexity's clearly just linear in $b$, which I suppose is pretty much as good as you can hope for.