A neural network predicts probabilities $p_1, p_2, \dots, p_N$ of a categorical distribution. These probabilities need to satisfy two conditions:
$$ \left\{ \begin{array}{l} 1.\ \sum_{i=1}^{N} p_i = 1 \\ 2.\ \sum_{i=1}^{N} -p_i \log_2(p_i) = B \end{array} \right. $$
where $B$ is the average number of bits per symbol in a sequence of random samples from the categorical distribution. $B$ is specified in advance and is set to a valid entropy value, so $0 \le B \le -\log_2(\frac{1}{N})$.
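For concreteness, both conditions can be checked numerically. A minimal sketch (the names `entropy_bits` and `satisfies` are just illustrative):

```python
import math

def entropy_bits(p):
    """Shannon entropy of a categorical distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def satisfies(p, B, tol=1e-9):
    """Condition 1: probabilities sum to 1. Condition 2: entropy equals B."""
    return abs(sum(p) - 1.0) < tol and abs(entropy_bits(p) - B) < tol

# The uniform distribution over N = 4 symbols has entropy log2(4) = 2 bits.
print(satisfies([0.25, 0.25, 0.25, 0.25], 2.0))  # True
```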
Is there a function that maps the output of the neural network (assumed to be in $\mathbb{R}^N$, but it can also be in $\mathbb{R}_{\ge0}^N$ or $(0,1)^N$ if that makes things easier) to the solution space of this underdetermined system of nonlinear equations, which I'll call $\mathbb{S}^N$?
I did some research on the topic but I haven't found anything, so pointers would be gladly accepted. At the moment, all I know is:
For $B = 0$ there are $N$ solutions (one $p_i$ set to 1, all others set to 0);
For $B = -\log_2(1/N)$ there is only one solution (all $p_i$ equal to $\frac{1}{N}$);
For $0 < B < -\log_2(\frac{1}{N})$, there are infinitely many solutions, except when $N = 2$, where there are exactly 2 (I'm not considering the degenerate case $N = 1$ because it would be useless).
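For the $N = 2$ case, one of the two solutions can be found numerically, since the binary entropy function is strictly increasing on $[0, \frac{1}{2}]$. A sketch under that assumption (`solve_binary` is a hypothetical name):

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def solve_binary(B, iters=200):
    """Bisection for the p in [0, 1/2] with H(p) = B.
    The two N = 2 solutions are then (p, 1-p) and (1-p, p)."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binary_entropy(mid) < B:
            lo = mid
        else:
            hi = mid
    return lo
```

The same bisection idea does not extend directly to $N > 2$, which matches the observation above that those cases have infinitely many solutions.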
Ideally, this transformation should be valid for any $N \ge 2$, but if there is a solution for a specific case (e.g. $N = 2$) I'd like to hear about it. It should also be valid for all $B \in [0, -\log_2(\frac{1}{N})]$.
Note: it's easy to satisfy either condition on its own. Condition 1 can be satisfied by dividing all probabilities by their sum. Condition 2 can be satisfied by substituting $z_i = -p_i \log_2(p_i)$, dividing all $z_i$ by their sum, multiplying by $B$, and then inverting the substitution to recover the $p_i$. But applying either of these transformations leaves the other condition unsatisfied.
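The condition-1 trick, and the fact that it leaves condition 2 unsatisfied in general, can be sketched as follows (the input scores are arbitrary illustrative values):

```python
import math

def normalize(x):
    """Condition 1 only: divide nonnegative scores by their sum."""
    s = sum(x)
    return [xi / s for xi in x]

def entropy_bits(p):
    """Shannon entropy of a categorical distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p = normalize([1.0, 2.0, 3.0, 4.0])
print(sum(p))           # 1 (up to floating point): condition 1 holds
print(entropy_bits(p))  # whatever it happens to be, not a prescribed B
```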