Categorical distribution pmf


I am trying to understand the pmf $p(y|\theta_1,\dots,\theta_c)=\prod_{k=1}^c\theta_k^{y_k}$ of the categorical distribution, but I do not understand why there aren't any $1-\theta_k$ terms, as in the Bernoulli pmf $p(x)=\theta^x(1-\theta)^{1-x}$. Is it to do with the encoding, i.e. that $y$ encodes which class the outcome is assigned to, so the pmf need not explicitly mention the classes it is not assigned to?


For your expression of the PMF to make sense, you should add a few important requirements:

  • for every $i \in \{1, \dotsc, c\}$, $\theta_i$ is the probability of the $i$-th outcome, so we need $\theta_i \in [0,1]$;
  • for the probabilities of all possible outcomes to sum to $1$, it must hold that $\sum_{i = 1}^c \theta_i = 1$.

There are a few different ways to express the PMF of a categorical distribution (see the relevant Wikipedia page for some examples), but from your notation I guess that your $\mathbf{y}$ should be a vector $\mathbf{y} = (y_1, \dotsc, y_c)$ satisfying

  • $y_i \in \{0,1\} \text{ for every } i \in \{1, \dotsc, c\}$;
  • $\sum_{i = 1}^c y_i = 1$,

that is, exactly one component of the vector is equal to $1$ and all the others have value $0$.
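With this one-hot encoding, the product $\prod_{k=1}^c \theta_k^{y_k}$ collapses to a single factor, since every $\theta_k^{0} = 1$. A minimal sketch (the parameter values here are arbitrary, chosen only for illustration):

```python
import numpy as np

# Hypothetical parameters of a 3-category distribution; they sum to 1.
theta = np.array([0.2, 0.5, 0.3])

# One-hot vector selecting the second category (y_2 = 1, all other y_k = 0).
y = np.array([0, 1, 0])

# The PMF: product over k of theta_k ** y_k.
# Every factor with y_k = 0 equals theta_k ** 0 = 1, so only theta_2 survives.
pmf = np.prod(theta ** y)

print(pmf)  # 0.5, i.e. exactly theta_2
```

This is why no explicit $1-\theta_k$ factors are needed: the probability of "not class $k$" is already carried by the other $\theta_j$'s.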

The Bernoulli distribution is just a categorical distribution with only two categories, corresponding to the two possible outcomes $0$ and $1$. To see this, let $\theta_1 = 1-p$ and $\theta_2 = p$. We obtain the PMF $$ p(\mathbf{y} \; | \; \theta_1, \theta_2) = (1-p)^{y_1} \cdot p^{y_2}, $$ where $\mathbf{y} = (y_1, y_2) \in \{0,1\} \times \{0,1\}$ with $y_1 + y_2 = 1$. This coincides exactly with the PMF of a Bernoulli distribution with parameter $p$ (here $\mathbf{y} = (1,0)$ corresponds to the outcome $0$ and $\mathbf{y} = (0,1)$ to the outcome $1$). Notice that the $1-\theta$ factor you were looking for has not disappeared: it is simply one of the $\theta_k$, because the category probabilities must sum to $1$. I hope this clarifies things.
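The correspondence can be checked numerically. A small sketch, assuming an arbitrary Bernoulli parameter $p = 0.7$ and the hypothetical helper functions below:

```python
import numpy as np

p = 0.7                        # hypothetical Bernoulli parameter
theta = np.array([1 - p, p])   # theta_1 = 1 - p, theta_2 = p

def categorical_pmf(y, theta):
    # Product of theta_k ** y_k for a one-hot vector y.
    return np.prod(theta ** y)

def bernoulli_pmf(x, p):
    # Standard Bernoulli PMF: theta^x (1-theta)^(1-x).
    return p**x * (1 - p)**(1 - x)

# Outcome 0 corresponds to y = (1, 0); outcome 1 corresponds to y = (0, 1).
assert categorical_pmf(np.array([1, 0]), theta) == bernoulli_pmf(0, p)
assert categorical_pmf(np.array([0, 1]), theta) == bernoulli_pmf(1, p)
```

Both assertions pass: the two-category categorical PMF and the Bernoulli PMF assign identical probabilities to each outcome.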