I have asked a question about the number of unique integers in randomly generated arrays. Suppose we have $10^6$ randomly generated numbers; we should expect about $\sim 6 \times 10^5$ unique numbers. I wrote a custom sort algorithm based on this assumption.
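(As a quick sanity check of that figure, separate from the sorting code: drawing $n$ values uniformly at random from a range of size $n$ leaves an expected unique fraction of $1 - (1 - 1/n)^n \approx 1 - 1/e \approx 0.632$.)

```python
import random

n = 10**6
draws = [random.randrange(n) for _ in range(n)]
unique_fraction = len(set(draws)) / n
# Expected fraction of unique values: 1 - (1 - 1/n)**n,
# which tends to 1 - 1/e ≈ 0.632 for large n
print(unique_fraction)
```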
Pseudocode:

    P := 0
    RecursiveSort(Array A):
        If length of A <= 1:
            Return A
        Find maximum and minimum of A
        Create Array B of length of A         // each B[K] starts as an empty list
        P := P + length of A                  // for evaluation purposes
        For each number I in A:
            S := (I - min) / (max - min)      // rescale I into [0, 1]
            K := round(S * (length of A - 1)) // make K a valid index into B
            Put I in B[K]                     // each B[K] may hold several numbers
        Create Array Result of length 0
        For each bucket L in B:
            For each number I in RecursiveSort(L):
                Put I in Result
        Return Result
After several runs, I noticed that P / length of A was approximately 1.73 for the uniform distribution and 2.19 for the normal distribution. Why is that?
For the uniform distribution, the assumption implies that each recursion level shrinks the array size by a factor of $e$, so shouldn't P / length of A be approximately 1.58 ($1 + e^{-1} + e^{-2} + e^{-3} + \dots + e^{-n}$)?
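(A quick numeric check of that limit, not part of the original code: the series is geometric, so its sum is $1/(1 - e^{-1})$.)

```python
import math

# Partial sum of the geometric series 1 + e^-1 + e^-2 + ...;
# its limit is 1 / (1 - 1/e) ≈ 1.582, matching the 1.58 figure above
total = sum(math.exp(-k) for k in range(50))
print(total)
```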
Here is the actual Python code:

    import numpy as np
    import random

    A = []
    lim = 1000000

    # Generating a normally distributed array.
    # a_normal = np.random.normal(lim * 1000 / 2, lim * 1000 / 6, lim)
    # a_normal = np.random.normal(lim * 1000 / 6, lim * 1000 / 4, lim)
    # a_normal = np.random.normal(0, lim, lim)
    # A = a_normal.tolist()

    for _ in range(lim):
        A.append(random.randrange(lim * 100, lim * 1000 - 1))

    P = 0

    def custom_sort(A):
        if len(A) <= 1:
            return A
        mi = min(A)
        ma = max(A)
        global P
        P += len(A)
        if ma == mi:  # all numbers are the same
            return A
        B = [[] for _ in range(len(A))]
        for I in A:
            K = int((I - mi) / (ma - mi) * (len(A) - 1))
            B[K].append(I)
        result = []
        for L in B:
            for I in custom_sort(L):
                result.append(I)
        return result

    custom_sort(A)
    print(P / len(A))
This is a bucket sort with $k$ (the number of buckets) equal to the input length. It runs in linear time if the input data is uniformly distributed, since all the buckets are small after the first pass.
You can optimize a bucket sort for any input distribution by using that distribution's CDF as the bucketing function; if the input matches the expected distribution, then the CDF values will be uniformly distributed between 0 and 1.
So, for example, if you expect the input data to be normally distributed, you could start by computing the mean and variance of the data and then bucket using a normal CDF with those parameters.
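A minimal sketch of that idea (my own illustration, not the asker's code): estimate the mean and standard deviation from the data, map each value through the normal CDF (computed via `math.erf`) to get an approximately uniform key in [0, 1), and bucket on that key. Here the buckets are finished with a plain sort rather than recursion, since they are expected to be small.

```python
import math


def normal_cdf(x, mu, sigma):
    # Phi((x - mu) / sigma), computed with the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_bucket_sort(A):
    if len(A) <= 1:
        return A
    # Estimate the distribution's parameters from the data itself
    mu = sum(A) / len(A)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in A) / len(A))
    if sigma == 0:  # all values identical
        return list(A)
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        # CDF value is ~uniform in [0, 1) if the data is really normal;
        # clamp so a CDF of exactly 1.0 stays in range
        k = min(int(normal_cdf(x, mu, sigma) * n), n - 1)
        B[k].append(x)
    result = []
    for bucket in B:
        bucket.sort()  # buckets are small on average, any sort is fine
        result.extend(bucket)
    return result
```

If the data really is normal, each bucket receives O(1) elements on average, so the total work stays close to linear, just as in the uniform case above.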