Are there any works on partitioning sequences into n subsequences to maximize information entropy?

118 Views Asked by At

I'd like to know about work on the partitioning of sequences:

Given a sequence $S$ with length $k$. Partition it into $n : n \leq k$ subsequences in order to maximize the information entropy $H$ of the frequency distribution $P$ of the subsequences.

You want to partition your sequence in such a way as to having a uniform frequency distribution of the subsequences.

Example given the sequence $S=<a,a,a,a,a,b>$ and $n=2$ We can partition it any way we like and have two unique subsequence such as $<a,a,a>$ and $<a,a,b>$

If we instead let $n=3$ two examples of partitions with different properties are $<a,a>$, $<a,a>$, $<a,b>$ and $<a,a,a>$, $<a,a>$, $<b>$. The first suggestion of partition have a lower entropy than the second, since the subsequence $<a,a>$ appears twice.

Are there any work done in this area and on problems like this?