When I have a long list of items (Chinese characters in this case) like this:
x │ rank │
—————————————
的 │ 1 │
人 │ 2 │
一 │ 3 │
中 │ 4 │
上 │ 5 │
要 │ 6 │
大 │ 7 │
在 │ 8 │
出 │ 9 │
以 │ 10 │
自 │ 11 │
他 │ 12 │
年 │ 13 │
可 │ 14 │
多 │ 15 │
家 │ 16 │
能 │ 17 │
生 │ 18 │
好 │ 19 │
本 │ 20 │
...
氕 │ 15677 │
where rank is each item's position when the items are sorted by descending number of occurrences in a given corpus, and assuming that the distribution follows Zipf's Law, how can I, without further knowledge of the actual counts, partition my $ n = 15677 $ items into $ d $ coherent, disjoint groups such that the cumulative occurrences of the items in each group are roughly equal?
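My own rough reasoning so far, assuming the classic exponent $ s = 1 $: the frequency of rank $ r $ is proportional to $ 1/r $, so the cumulative mass up to rank $ r $ is the harmonic number $ H_r \approx \ln r + \gamma $. Setting $ H_{r_k} = \frac{k}{d} H_n $ for the boundary ranks $ r_k $ gives

$$ r_k \approx \exp\!\left(\tfrac{k}{d}(\ln n + \gamma) - \gamma\right) = n^{k/d} \, e^{\gamma(k/d - 1)} , $$

i.e. the group boundaries should grow roughly geometrically with $ k $. I'm not sure how best to turn this into SQL, though.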
I'm doing this in PostgreSQL, so hints on using its standard statistical functions would be awesome.
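To make the target concrete, here is a minimal sketch of the cut-point search I have in mind — in Python only because it is easy to test; the function name and the exponent parameter `s` are placeholders of my own, not anything from my actual schema:

```python
def zipf_cut_points(n, d, s=1.0):
    """Split ranks 1..n into d groups of roughly equal cumulative
    Zipf mass, where the frequency of rank r is taken as 1 / r**s.
    Returns the last rank of groups 1..d-1; group d ends at n."""
    # Running cumulative mass C(r) = sum_{i=1}^{r} 1 / i**s
    cum = []
    total = 0.0
    for r in range(1, n + 1):
        total += 1.0 / r ** s
        cum.append(total)

    # First rank whose cumulative mass reaches each target k/d of the
    # total.  A while-loop handles a single item so heavy that it
    # crosses several targets at once (that group then repeats a cut,
    # i.e. some groups come out empty).
    cuts = []
    k = 1
    for r, c in enumerate(cum, start=1):
        while k < d and c >= total * k / d:
            cuts.append(r)
            k += 1
    return cuts

print(zipf_cut_points(15677, 5))
```

The boundaries come out growing roughly geometrically, as expected from a $ 1/r $ distribution: the first group is only a handful of very frequent characters, the last one is thousands of rare ones.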