Expectation of perplexity


Consider a set of $N$ random variables $X_j$, each of which independently takes a constant value $c_j$ with probability $\rho\in[0,1]$ (the same $\rho$ for all $j$) and is $0$ otherwise. Fix some index $i\in \{1,\ldots,N\}$. From these, we construct the normalized variables $P_{j\mid i}=\frac{X_j}{\sum_{k\neq i}X_k}$ for $j\neq i$, and we set $P_{i\mid i}=0$.

Then I am interested in the expectation $$\mathbb{E}\left(2^{-\sum_{j\in[N]\setminus\{i\}} P_{j\mid i}\log_2(P_{j\mid i})}\right),$$ i.e., the expected perplexity of the distribution $(P_{j\mid i})_{j\neq i}$. Currently, I am working on an experimental setup to compute this via a Monte Carlo approach, but I wonder whether it can be computed analytically. (Update: The Monte Carlo approach seems to confirm our conjectured value for this expectation, but it would still be nice to have an analytic argument.)
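For concreteness, here is a minimal sketch of the Monte Carlo estimator, assuming the convention that a realization with zero denominator contributes perplexity $1$ (the function name and that convention are my own choices, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expected_perplexity(c, rho, i, n_samples=10_000):
    """Monte Carlo estimate of E[2^H], where H is the base-2 Shannon
    entropy of the normalized weights P_{j|i}.  `c` is the vector of
    constants c_j, `rho` the on-probability, `i` the excluded index."""
    N = len(c)
    c = np.asarray(c, dtype=float)
    vals = np.empty(n_samples)
    for t in range(n_samples):
        # Each X_j is c_j with probability rho and 0 otherwise.
        x = np.where(rng.random(N) < rho, c, 0.0)
        x[i] = 0.0                 # P_{i|i} = 0, and i is excluded from the sum
        s = x.sum()
        if s == 0.0:
            vals[t] = 1.0          # convention: empty distribution -> perplexity 1
            continue
        p = x[x > 0] / s
        h = -(p * np.log2(p)).sum()
        vals[t] = 2.0 ** h
    return vals.mean()
```

As a sanity check, with $\rho=1$ and all $c_j$ equal the distribution is uniform over the $N-1$ indices $j\neq i$, so the perplexity is exactly $N-1$.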

Actually, the formulation given above is not quite correct (though it might be simpler). The problem I would actually like to solve is the one where exactly $\lfloor \rho N\rfloor$ of the $X_j$ take their respective values $c_j$ and the rest are $0$. In this version, the denominator $\sum_{k\neq i}X_k$ can never be $0$ (provided $\lfloor \rho N\rfloor\geq 2$, so that at least one surviving index differs from $i$). But I am not quite sure how to formulate this.
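This fixed-count variant can be simulated the same way by drawing the set of surviving indices uniformly at random; a sketch under the same perplexity-$1$ convention for the (rare) degenerate case:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expected_perplexity_fixed(c, rho, i, n_samples=10_000):
    """Monte Carlo estimate for the variant where exactly floor(rho*N)
    of the X_j take their value c_j, chosen uniformly at random."""
    N = len(c)
    m = int(np.floor(rho * N))
    c = np.asarray(c, dtype=float)
    vals = np.empty(n_samples)
    for t in range(n_samples):
        on = rng.choice(N, size=m, replace=False)   # the m surviving indices
        x = np.zeros(N)
        x[on] = c[on]
        x[i] = 0.0                 # index i never contributes
        s = x.sum()
        if s == 0.0:               # can still happen when m <= 1
            vals[t] = 1.0
            continue
        p = x[x > 0] / s
        vals[t] = 2.0 ** (-(p * np.log2(p)).sum())
    return vals.mean()
```

With $\rho=1$ and equal $c_j$ this again returns exactly $N-1$, matching the unconditional version.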

The problem arises in non-linear dimensionality reduction. Instead of computing an embedding for a data set of $N$ points, we want to embed only a sample. We know the perplexity value for the $N$ points, and we are interested in what perplexity value to expect for $\lfloor \rho N\rfloor$ randomly sampled points.