I have a categorical distribution that can take values $A_1, \ldots,A_n$ with probabilities $p_1,\ldots,p_n$, respectively.
The entropy of a source that generates a sequence of random numbers from the above distribution is defined as:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
For simulation, I want to choose the $p_i$ so that the entropy starts out small. Then, over a few steps, I increase the entropy. At the last step, I want to reach $p_1=\ldots=p_n=1/n$, which gives the maximum entropy $\log_2(n)$. How can I do that?
This is my current solution, but it is rather roundabout and I would like a better one:
Generate $n$ independent random variables $r_1,\ldots,r_n$ from a Gaussian distribution, then shift them to make them positive: $y_i=r_i-\min_j(r_j)+0.1$. Then normalize them so they sum to one: $p_i=y_i/\sum_{i=1}^{n}y_i$. This gives a set of probabilities $p_1,\ldots,p_n$ that look roughly Gaussian-distributed and have an entropy somewhat below the maximum.
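For concreteness, here is a minimal NumPy sketch of that initialization (NumPy assumed; the seed and $n=100$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Draw n independent standard-normal samples.
r = rng.standard_normal(n)

# Shift so every value is positive (the 0.1 offset from the question),
# then normalize to obtain a probability vector.
y = r - r.min() + 0.1
p = y / y.sum()

# Entropy in bits; strictly below log2(n) unless p is exactly uniform.
H = -np.sum(p * np.log2(p))
print(H)
```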
Now, in each step, we update the $p_i$ by $p_i^{(\mathrm{new})}=(p_{i-1}+p_{i}+p_{i+1})/3$ in a circular fashion, i.e., $p_{n+1}=p_1$ and $p_0=p_n$. After enough steps, all the values converge to the average of the $p_i$, which is $1/n$.
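The circular smoothing step can be sketched with `np.roll` (a sketch under the same setup as above; note it preserves the sum, so the result stays a probability vector):

```python
import numpy as np

def smooth_step(p):
    # Circular three-point moving average: p_i <- (p_{i-1}+p_i+p_{i+1})/3,
    # with wrap-around indices p_0 = p_n and p_{n+1} = p_1.
    return (np.roll(p, 1) + p + np.roll(p, -1)) / 3.0

rng = np.random.default_rng(0)
n = 100
y = rng.standard_normal(n)
y = y - y.min() + 0.1
p0 = y / y.sum()

p = p0.copy()
for _ in range(50):
    p = smooth_step(p)
# The spread of p shrinks toward uniform, though convergence is slow
# because the smoothing only mixes neighboring entries.
```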
Is there a better solution? Perhaps a probability distribution where varying a single parameter tunes the entropy and yields the uniform distribution in the limit?
A simpler way to create the $p_i$ is to just set $p_i = r_i^2/\sum_j r_j^2$. This has significantly lower entropy than your current method ($5.6$ vs $6.4$ for $n =100$), and has the interesting property that the vector $[\sqrt{p_i}]$ is uniformly distributed on the unit sphere in $\mathbb R^n$. It's worth noting that your expected entropy can't be all that low if you want the initial $p_i$ to be IID, since they'll cluster around the mean of whatever distribution you're using.
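A minimal sketch of this construction (NumPy assumed; exact entropy values depend on the draw, so none are asserted here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Normalize squared Gaussians: p_i = r_i^2 / sum_j r_j^2.
r = rng.standard_normal(n)
p = r**2 / np.sum(r**2)

# The vector [sqrt(p_i)] = [|r_i|] / ||r|| has unit Euclidean norm,
# i.e., it lies on the unit sphere in R^n.
H = -np.sum(p * np.log2(p))
print(H)
```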
That method of updating the $p_i$ has a potential problem: once it smooths out local maxima, it is very slow to reach uniform $p_i$. If this is undesirable, you can speed it up by averaging the $p_i$ with a random permutation of them, which quickly mixes the values together. If random permutations aren't available, averaging with two medium-sized relatively prime offsets seems to produce the fastest mixing.
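A sketch of the permutation-averaging idea (assuming NumPy; the number of iterations is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
r = rng.standard_normal(n)
p = r**2 / np.sum(r**2)

# Averaging with a fresh random permutation each step mixes distant
# entries, so p approaches uniform much faster than local smoothing;
# each step preserves sum(p) == 1.
for _ in range(20):
    p = (p + rng.permutation(p)) / 2.0
```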