The multinomial count distribution for a set of $M$ categories is
$P((n_1, \dots,n_M)|N)=\left(\frac{N!}{n_1!\dots n_M!}\prod_{i=1}^Mp_i^{n_i}\right)\delta\left( \sum_{i=1}^M n_i-N\right)\;,$
where category $i=1, \dots, M$ appears $n_i$ times, and the Kronecker $\delta$ enforces the sum constraint $\sum_{i=1}^M n_i=N$.
What is the distribution, $P(m)$, of the number of observed categories, i.e. the number of categories with non-zero counts, $m=\sum_{i=1}^M\mathrm{sgn}(n_i)$, where $\mathrm{sgn}(x)=1$ for $x>0$ and $\mathrm{sgn}(0)=0$?
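For small $M$ and $N$, $P(m)$ can be computed by brute force, enumerating every count vector, which is useful as ground truth for any closed-form candidate. A minimal sketch (the function name `p_observed_exact` is my own, not from any reference):

```python
from math import factorial
from itertools import product

def p_observed_exact(probs, N):
    """Brute-force P(m): enumerate every count vector (n_1, ..., n_M) with sum N."""
    M = len(probs)
    P = [0.0] * (M + 1)
    for counts in product(range(N + 1), repeat=M):
        if sum(counts) != N:
            continue
        # multinomial probability of this particular count vector
        w = float(factorial(N))
        for n, p in zip(counts, probs):
            w *= p ** n / factorial(n)
        m = sum(1 for n in counts if n > 0)  # number of observed categories
        P[m] += w
    return P
```

This is exponential in $M$, so it is only a check, not a method.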
I couldn't find anything in a quick internet search, but I am still hoping there may be a closed form solution, even for the case where the $p_i$ are all distinct.
One approach is to focus instead on the distribution, $P_0(m_0)$, of the number of zero counts, $m_0$. Since $m_0=M-m$, I believe $P(m)=P_0(M-m)$.
A specific subset of $m_0$ categories is denoted $\alpha=\{\alpha_1, \dots, \alpha_{m_0}\}$, where $\alpha_i\in\{1, \dots, M\}$. There are $\binom{M}{m_0}$ distinct $\alpha$ subsets. For a given $\alpha$, the probability of obtaining one of the $\alpha$ categories in a single count sample is $p_\alpha=\sum_{i=1}^{m_0}p_{\alpha_i}$. Thus, the probability of obtaining something other than the $\alpha$ categories across all $N$ samples is $(1-p_\alpha)^N$. This representation ignores permutations of the $\alpha$ subset, of which there are $m_0!$.
If someone knows a solution, please share. In lieu of a solution, I could use help with the combinatorics of summing $(1-p_\alpha)^N$ over subsets $\alpha$ and their orderings.
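One way to organize that sum is by inclusion-exclusion: $(1-p_\alpha)^N$ is the probability that *at least* the categories in $\alpha$ are empty, so the over-counting from supersets can be subtracted off, giving $P_0(m_0)=\sum_{j\ge m_0}(-1)^{j-m_0}\binom{j}{m_0}S_j$ with $S_j=\sum_{|\beta|=j}(1-p_\beta)^N$. Summing over unordered subsets (as `itertools.combinations` does) means no $m_0!$ permutation factor is needed. A sketch, with names of my own choosing:

```python
from itertools import combinations
from math import comb

def p_zero_counts(probs, N):
    """P_0(m0): probability that exactly m0 categories have zero counts.

    S[j] = sum over size-j subsets beta of (1 - p_beta)^N, i.e. the
    probability that *at least* the categories in beta are empty, summed
    over subsets; inclusion-exclusion then isolates "exactly m0 empty".
    """
    M = len(probs)
    S = [sum((1 - sum(probs[i] for i in beta)) ** N
             for beta in combinations(range(M), j))
         for j in range(M + 1)]
    return [sum((-1) ** (j - m0) * comb(j, m0) * S[j]
                for j in range(m0, M + 1))
            for m0 in range(M + 1)]
```

With this, $P(m)=P_0(M-m)$; the cost is the $2^M$ subsets, so it is exact but only practical for moderate $M$.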
Here is an alternative solution based on generating functions.
We want to find the distribution $P(m|N)$ for the number $m$ of nonzero counts given a multinomial with parameters $N$ and a set $\lbrace p_k \rbrace_{k = 1}^M$ of probabilities. To do this, we will construct a generating function \begin{equation*} F(x,y) = \sum_{n = 0}^\infty \sum_{m = 0}^\infty f(n,m) x^n y^m \end{equation*} such that $f(N,m) = P(m|N)$. In that construction, $x$ tracks the sum of the counts, while $y$ tracks the number of nonzero counts.
A usual trick with generating functions is to look for a product representation (for probability generating functions (PGFs), this is motivated by the fact that the PGF of a sum of independent variables is the product of their PGFs). Here, note that $F(x,y)$ is not exactly a PGF ($F(1,1) \neq 1$), but for all practical purposes it shares the same properties.
Even though the counts of the individual elements $k$ are not independent, consider the generating function \begin{align*} G_k(x,y) &= 1 + y(e^{p_k x} -1) \;, \\ &= 1 + y \left[p_kx + \frac{(p_kx)^2}{2!} + \dots \right] \end{align*} Note that $y$ multiplies every term with a power of $x$ of at least one, meaning it tracks all nonzero counts. Multiplying the individual generating functions gives \begin{align*} F(x,y) = \prod_{k=1}^M G_k(x,y) \;. \end{align*}
Now the beauty of this approach: if we expand $F(x,y)$ in powers of $x$ and $y$, for a given pair $(n,m)$ we see that it sums over all possible combinations of counts such that the sum is $n$, $m$ of them are nonzero, and we get the correct multinomial factors for the powers of each $p_k$ and the factorials in the denominator. The only missing factor is the $N!$ in the numerator. To adjust for this, we simply replace $G_k(x,y)$ by \begin{align*} G_k(x,y) &= 1 + y(e^{a p_k x} -1) \;, \end{align*} where $a = (N!)^{1/N}$. Again one can verify that the expansion now yields the correct terms for the multinomial distribution when $n = N$.
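As a sanity check on this construction, the product can be expanded by direct polynomial multiplication, truncating at $x^N$; here the $N!$ is restored after extracting the $x^N$ coefficient, which is equivalent to the $a=(N!)^{1/N}$ rescaling. A minimal sketch, with names of my own choosing:

```python
from math import factorial

def p_observed_poly(probs, N):
    """Expand prod_k [1 + y (e^{p_k x} - 1)] by polynomial multiplication.

    c[n][m] holds the coefficient of x^n y^m, with x truncated at degree N.
    """
    M = len(probs)
    c = [[0.0] * (M + 1) for _ in range(N + 1)]
    c[0][0] = 1.0
    for p in probs:
        new = [[0.0] * (M + 1) for _ in range(N + 1)]
        for n in range(N + 1):
            for m in range(M + 1):
                if c[n][m] == 0.0:
                    continue
                new[n][m] += c[n][m]                 # the "1" term of G_k
                for j in range(1, N - n + 1):        # the y * (p x)^j / j! terms
                    new[n + j][m + 1] += c[n][m] * p ** j / factorial(j)
        c = new
    # restoring N! on the x^N coefficient gives P(m | N), m = 0..M
    return [factorial(N) * c[N][m] for m in range(M + 1)]
```

This already runs in polynomial time, but the FFT inversion described next is faster.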
The last step is to invert this generating function using a fast Fourier transform (in two dimensions). While the whole thing may seem quite complicated, the resulting code is short (see below an example in Python for uniform probabilities), and I have verified that it matches the exact result by Tobias. And it is fast: $O(NM^2 + NM\log(N)\log(M))$ to compute the distribution.
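The Python example referred to here is not reproduced in this excerpt; below is a hedged reconstruction of the described procedure using NumPy's 2-D inverse FFT, written for an arbitrary list `probs` rather than only uniform probabilities. The function name, the default padding $L_1 = 4(N+1)$, and computing $a=(N!)^{1/N}$ via `lgamma` are my choices, not necessarily the original author's:

```python
import numpy as np
from math import lgamma, exp

def p_observed_fft(probs, N, L1=None):
    """Invert F(x, y) = prod_k [1 + y (e^{a p_k x} - 1)] with a 2-D inverse FFT."""
    M = len(probs)
    if L1 is None:
        L1 = 4 * (N + 1)          # padding in x keeps aliasing negligible
    L2 = M + 1                    # exact in y: the degree in y is at most M
    a = exp(lgamma(N + 1) / N)    # a = (N!)^(1/N), via log-gamma for stability
    # sample F on a grid of roots of unity (conjugate convention so that
    # ifft2 returns the Taylor coefficients f(n, m), aliased mod L1 in n)
    x = np.exp(-2j * np.pi * np.arange(L1) / L1)
    y = np.exp(-2j * np.pi * np.arange(L2) / L2)
    F = np.ones((L1, L2), dtype=complex)
    for p in probs:
        F *= 1 + y[None, :] * (np.exp(a * p * x[:, None]) - 1)
    f = np.fft.ifft2(F)
    return f[N, :].real           # row n = N gives P(m | N) for m = 0..M
```

The $y$ direction needs no padding because the degree in $y$ is at most $M$; only the $x$ direction is aliased, which is what the remarks below about $L_1$ and $L_2$ address.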
Some remarks:
1) There are technically some small numerical errors due to aliasing, but since the distribution $f(n,m)$ decreases quickly in $n$, this is not a problem for sufficiently large $N$. Otherwise, one can always compute the Fourier spectrum over a larger range (increasing $L_1$ in the code above; increasing $L_2$ won't help) and discard the extra values.
2) I tested the code above for different kinds of distributions, and it was overall very stable. However, for $N \gtrsim 90$, numerical errors became an issue. For large $N$, an approximation based on the first few moments should be sufficient anyway.