When I have a long list of items (Chinese characters in this case) like this:
x │ rank │
—————————————
的 │ 1 │
人 │ 2 │
一 │ 3 │
中 │ 4 │
上 │ 5 │
要 │ 6 │
大 │ 7 │
在 │ 8 │
出 │ 9 │
以 │ 10 │
自 │ 11 │
他 │ 12 │
年 │ 13 │
可 │ 14 │
多 │ 15 │
家 │ 16 │
能 │ 17 │
生 │ 18 │
好 │ 19 │
本 │ 20 │
...
氕 │ 15677 │
where rank is each item's position when the items are sorted by descending number of occurrences in a given corpus, and assuming that the distribution follows Zipf's Law, how can I, without further knowledge of the actual counts, partition my $ n = 15677 $ items into $ d $ coherent, disjoint groups such that the cumulative occurrences of the items in each group are roughly equal?
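My own rough reasoning so far, assuming the classic exponent $ s = 1 $: the frequency of rank $ r $ is proportional to $ 1/r $, so the cumulative mass up to rank $ r $ is the harmonic number $ H_r \approx \ln r + \gamma $. Setting $ H_{r_k} = \frac{k}{d} H_n $ for the boundary ranks $ r_k $ gives

$$ r_k \approx \exp\!\left(\tfrac{k}{d}(\ln n + \gamma) - \gamma\right) = n^{k/d} \, e^{\gamma(k/d - 1)} , $$

i.e. the group boundaries should grow roughly geometrically with $ k $. I'm not sure how best to turn this into SQL, though.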
I'm doing this in PostgreSQL, so hints on using its standard statistical functions would be awesome.
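To make the target concrete, here is a minimal sketch of the cut-point search I have in mind — in Python only because it is easy to test; the function name and the exponent parameter `s` are placeholders of my own, not anything from my actual schema:

```python
def zipf_cut_points(n, d, s=1.0):
    """Split ranks 1..n into d groups of roughly equal cumulative
    Zipf mass, where the frequency of rank r is taken as 1 / r**s.
    Returns the last rank of groups 1..d-1; group d ends at n."""
    # Running cumulative mass C(r) = sum_{i=1}^{r} 1 / i**s
    cum = []
    total = 0.0
    for r in range(1, n + 1):
        total += 1.0 / r ** s
        cum.append(total)

    # First rank whose cumulative mass reaches each target k/d of the
    # total.  A while-loop handles a single item so heavy that it
    # crosses several targets at once (that group then repeats a cut,
    # i.e. some groups come out empty).
    cuts = []
    k = 1
    for r, c in enumerate(cum, start=1):
        while k < d and c >= total * k / d:
            cuts.append(r)
            k += 1
    return cuts

print(zipf_cut_points(15677, 5))
```

The boundaries come out growing roughly geometrically, as expected from a $ 1/r $ distribution: the first group is only a handful of very frequent characters, the last one is thousands of rare ones.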