Expectation of a random cross entropy


Let $H(q,p)=-\sum_c q(c)\log p(c)$ be the cross entropy between two probability distributions $q$ and $p$ on a finite set of cardinality $N$.

$q$ and $p$ can be seen as elements of the $(N-1)$-simplex $S=\{ (x_1,\dots,x_N) \mid \sum_i x_i=1,\ x_i\ge 0 \}$.

Let's now fix $q(c)$ and draw $p(c)$ uniformly at random from $S$ (with respect to the standard measure on the simplex). In other words, $p$ is a random variable, which we call $\hat p$, uniformly distributed over $S$.

Now $\hat H=H(q, \hat p)$ becomes a random variable. Question: what is the value of $E[\hat H]$ for a fixed $q$?

As for the origin of the problem: this is a question I posed myself.

MY WORK: We can parametrize the simplex $S$ by its first $N-1$ coordinates. Defining the domain $D=\{ (x_1,\dots,x_{N-1}) \mid \sum_i x_i \le 1,\ x_i\ge 0 \}$, we should have:

$E[\hat H]=-\frac{\int_D \left( \sum_{i=1}^{N-1} q_i \log x_i + q_N \log\left(1- \sum_{i=1}^{N-1} x_i \right) \right)dx_1\cdots dx_{N-1}}{\int_D dx_1\cdots dx_{N-1}}$

(the Jacobian should simplify, since the parametrization is linear), where the $q_i$ are constants in the integration... but so far I have not been able to solve these integrals.
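In the simplest case $N=2$, the integral can be evaluated explicitly. Using $\int_0^1 \log x \, dx = -1$:

$E[\hat H] = -\int_0^1 \big( q_1 \log x_1 + q_2 \log(1-x_1) \big)\, dx_1 = q_1 + q_2 = 1,$

so for $N=2$ the expectation is $1$ regardless of $q$.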

Maybe we can observe that $E[\hat H]$ is a linear function of $q$. By symmetry of the simplex, the coefficients must all be equal, and therefore:

$E[\hat H]=\alpha \sum_i q_i = \alpha$ (since $\sum_i q_i=1$), with:

$\alpha=-\frac{\int_D \log(x_1)\, dx_1\cdots dx_{N-1}}{\int_D dx_1\cdots dx_{N-1}}$

but I am not completely sure about this argument, nor about the explicit numerical value of $\alpha$...
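As a numerical sanity check (my addition, assuming NumPy is available): the uniform distribution on the simplex $S$ is the Dirichlet distribution with all parameters equal to $1$, so we can sample $\hat p$ directly and estimate $\alpha$ by Monte Carlo:

```python
import numpy as np

# Monte Carlo estimate of E[H(q, p_hat)] with p_hat uniform on the simplex.
# Uniform on the (N-1)-simplex = Dirichlet(1, ..., 1), so we sample directly.
N = 4
rng = np.random.default_rng(0)

q = rng.dirichlet(np.ones(N))                 # an arbitrary fixed q
p = rng.dirichlet(np.ones(N), size=200_000)   # uniform samples from S

H = -(np.log(p) @ q)                          # cross entropy H(q, p) per sample
alpha_est = H.mean()
print(alpha_est)
```

For $N=4$ the estimate comes out close to $1+\tfrac12+\tfrac13\approx 1.833$, and the estimate does not change (up to Monte Carlo noise) when $q$ is varied, consistent with the symmetry argument above; this suggests $\alpha$ is the harmonic number $\sum_{k=1}^{N-1} 1/k$.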