Suppose we have independent $\vec v_1, \dots, \vec v_n \in \mathbb R^d$, with $\vec v_i$ (in one-hot representation) drawn from $\mbox{Categorical}(\theta_i)$, where $\theta_i \in \mathbb R^d$, $\theta_{i,j} \geq 0$, $\sum_j \theta_{i,j} = 1$. That is, each $\vec v_i$ is a vector of 0s except at one position, where it is 1. I am wondering what the distribution of the pointwise maximum $\vec m = \max(\vec v_1, \dots, \vec v_n)$ is: for each entry $m_j$, if at least one of the $v_{i,j}$ is 1, then $m_j = 1$; otherwise $m_j = 0$.
Obviously the entries of $\vec m$ are not independent (in particular, $\vec m = 0$ is impossible), so we cannot simply multiply the marginal probabilities of the individual entries of $\vec m$ to get the answer. One way to find the distribution is to explicitly enumerate all combinations of $\vec v_1, \dots, \vec v_n$ that can lead to a given $\vec m$. However, this quickly becomes intractable as $n$ or $d$ grows. So I am wondering if there is a cleverer way to calculate this.
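For concreteness, here is a sketch of the brute-force enumeration described above (the function name and the choice to key results by the support set of $\vec m$ are my own): it loops over all $d^n$ outcomes of $(\vec v_1, \dots, \vec v_n)$, which is exactly why it is intractable for large $n$ or $d$.

```python
import itertools
import numpy as np

def max_distribution_bruteforce(thetas):
    """Enumerate all d^n outcomes of (v_1, ..., v_n) and accumulate the
    probability of each pointwise maximum m.

    thetas: (n, d) array whose i-th row is the categorical parameter theta_i.
    Returns a dict mapping each reachable support set of m (as a frozenset
    of indices j with m_j = 1) to its probability.
    """
    n, d = thetas.shape
    dist = {}
    # Each outcome picks one category per draw; draws are independent,
    # so the outcome's probability is the product of the chosen entries.
    for cats in itertools.product(range(d), repeat=n):
        p = float(np.prod([thetas[i, c] for i, c in enumerate(cats)]))
        support = frozenset(cats)  # indices where the maximum equals 1
        dist[support] = dist.get(support, 0.0) + p
    return dist
```

For example, with $n = 2$ fair draws over $d = 2$ categories, the support $\{0, 1\}$ (both categories seen) has probability $1/2$, and each singleton support has probability $1/4$.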
Let $m$ be a binary vector, and let $S$ be the set of indices $j$ for which $m_j=1$. By inclusion-exclusion, the probability that $\max(v_1,\dots,v_n)=m$ is $$ \sum_{T\subseteq S}(-1)^{|S|-|T|}\prod_{i=1}^n\left(\sum_{j\in T}\theta_{i,j}\right), $$ where the summation ranges over nonempty subsets of $S$. For example, the $T=S$ term gives the probability that only categories in $S$ appear. From this, we subtract the probability that only categories in $S-\{s\}$ occur, for each $s\in S$, because we want every $s$ to appear. Then we add back in the doubly subtracted cases $S-\{s,t\}$, subtract the triply subtracted cases, etc.
This takes $O(2^{|S|}\,|S|\,n)$ time to compute, as far as I can see. That is pretty slow, but I do not know a faster method.
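The inclusion-exclusion sum above translates directly into code. A minimal sketch (the function name and array layout are my own choices; `thetas` is the $(n, d)$ matrix of parameters):

```python
from itertools import combinations
import numpy as np

def prob_max_equals(thetas, S):
    """Probability that the pointwise maximum of the n categorical draws
    has support exactly S, via inclusion-exclusion:
        sum over nonempty T subset of S of
        (-1)^{|S|-|T|} * prod_i sum_{j in T} theta_{i,j}.

    thetas: (n, d) array of categorical parameters.
    S: iterable of column indices where m_j = 1.
    """
    S = sorted(S)
    total = 0.0
    for k in range(1, len(S) + 1):          # nonempty subsets only
        for T in combinations(S, k):
            # prod over draws i of the probability of landing in T
            prod = float(np.prod(thetas[:, list(T)].sum(axis=1)))
            total += (-1) ** (len(S) - k) * prod
    return total
```

On the $n = 2$, $d = 2$ fair-coin example this gives $1/2$ for $S = \{0, 1\}$ and $1/4$ for each singleton, matching direct enumeration. Each subset $T$ costs $O(|S|\,n)$ for the products of partial sums, hence the $O(2^{|S|}\,|S|\,n)$ total.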