How to compute entropy of multivariate distribution?


Suppose you have the following distribution for $\mathbb x$, where $\mathbb x$ is an $n$ dimensional one-hot vector.

(i.e., $\mathbb x = [x_1, x_2, \dots, x_n]$, where there is exactly one index $k \in \{1, 2, \dots, n\}$ with $x_k = 1$, and $x_j = 0$ for all $j \neq k$)

The distribution for $\mathbb x$ is

$p(\mathbb x \mid \theta) = \prod_{i}\theta_i^{x_i}$, where each $\theta_i$ is a probability (i.e., $\theta_i \in [0,1]$ and $\sum_i\theta_i = 1$)

How do I compute the entropy of the distribution $p(\mathbb x \mid \theta)$? I'm trying to figure out where to start.


When I try this approach, it doesn't make sense:

$\mathbb E(- \ln p) = -\prod_{i}\theta_i^{x_i} \ln\left(\prod_{i}\theta_i^{x_i}\right)$

Although $\mathbb{x}$ is a fancy encoding, there is an equivalent random variable that takes values in $\{1,2,...,n\}$, call it $\tilde{x}$. The outcome whose only 1 entry is at index $k$ corresponds to the event $\{\tilde{x} = k\}$. These two random variables are equivalent for our purposes and have the same entropy. I'm only changing the representation to make things clearer; the heart of the argument is the simplification of the distribution below.

If you plug a one-hot vector into the probability distribution, you'll see that every zero entry contributes $\theta_i^0 = 1$ to the product, so if the single 1 is at index $k$, the product collapses to $p(\mathbb x \mid \theta) = \theta_k$. So we have $\tilde{x} = k$ with probability $\theta_k$.
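This collapse is easy to verify numerically. Here is a quick sketch in plain Python; the probability vector `theta` and the index of the 1 entry are made-up example values:

```python
import math

# Made-up example: n = 4 categories with probabilities theta (must sum to 1)
theta = [0.1, 0.2, 0.3, 0.4]

# One-hot vector whose single 1 is at index k = 2
x = [0, 0, 1, 0]

# p(x | theta) = prod_i theta_i^(x_i); each zero entry contributes theta_i^0 = 1
p = math.prod(t ** xi for t, xi in zip(theta, x))

print(p)  # collapses to theta[2] = 0.3
```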

The entropy you're looking for is $H(\tilde{x}) = H(\mathbb{x}) = -\sum_{k=1}^n \theta_k \ln\left( \theta_k \right)$.
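The closed form can be checked against the definition of entropy as $\mathbb E(-\ln p)$: a sum over the $n$ possible outcomes, each term weighted by its probability (this is what was missing from the attempt in the question, which had only a single unweighted term). A sketch in plain Python, with `theta` a made-up example vector:

```python
import math

theta = [0.1, 0.2, 0.3, 0.4]  # made-up example; must sum to 1

# Closed form: H = -sum_k theta_k * ln(theta_k)
H = -sum(t * math.log(t) for t in theta)

# Direct expectation E[-ln p]: enumerate every one-hot outcome x,
# evaluate p(x | theta), and weight -ln(p) by that probability
H_direct = 0.0
for k in range(len(theta)):
    x = [1 if i == k else 0 for i in range(len(theta))]
    p = math.prod(t ** xi for t, xi in zip(theta, x))
    H_direct += -p * math.log(p)

print(H, H_direct)  # both give the same value
```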