Imagine the following situation: an urn contains $K$ balls of different colours $\mathcal{U}=\{1,\ldots,K\}$ and with different weights $\mathbf{w}=(w_1,\ldots,w_K)$ (where $\sum_i w_i = 1$). You draw from the urn $m \leq K$ times without replacement, with the probability of selecting each of the remaining balls proportional to its weight, and observe a sample $\mathcal{K} \subset \mathcal{U}$, $|\mathcal{K}|=m$.
I want to compute the probability of such a (unordered) sample $\mathcal{K}$. For a particular ordered tuple $\mathbf{k}=(k_1,\ldots,k_m)$ of draws from the urn, the total remaining weight after $i$ draws is $1-w_{k_1}-\ldots-w_{k_i}$, and it follows that $$ \mathbb{P}(\mathbf{k}) = \frac{w_{k_1}}{1}\frac{w_{k_2}}{1-w_{k_1}}\cdots\frac{w_{k_m}}{1-w_{k_1}-\ldots-w_{k_{m-1}}}. $$ For an unordered sample $\mathcal{K}=\{k_1,\ldots,k_m\}$, we must sum over all possible permutations $S_m$ of the elements of $\mathcal{K}$, and the probability of observation is therefore $$ \mathbb{P}(\mathcal{K}) = \sum_{\pi\in S_m} \prod_{i=1}^{m} \frac{w_{k_{\pi(i)}}}{1-w_{k_{\pi(1)}}-\ldots-w_{k_{\pi(i-1)}}}, $$ which can be (slightly) simplified to $$ \mathbb{P}(\mathcal{K}) = w_{k_1}\cdots w_{k_m}\sum_{\pi\in S_m}\Bigg(\prod_{i=1}^{m-1} \big(1-w_{k_{\pi(1)}}-\ldots-w_{k_{\pi(i)}}\big)\Bigg)^{-1}. $$
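For a toy urn the permutation sum can be evaluated by brute force, which at least lets one check the formula numerically (a Python sketch; `sample_prob` is an illustrative name, and enumerating all $m!$ orderings is only feasible for very small $m$):

```python
from itertools import combinations, permutations

def sample_prob(weights, sample):
    """P(unordered sample) = sum over all draw orders of the
    product of w_{k_i} / (remaining weight before draw i)."""
    total = 0.0
    for order in permutations(sample):
        p, remaining = 1.0, 1.0
        for k in order:
            p *= weights[k] / remaining
            remaining -= weights[k]
        total += p
    return total

w = [0.5, 0.3, 0.15, 0.05]
# Sanity check: the probabilities of all size-2 samples sum to 1.
print(sum(sample_prob(w, s) for s in combinations(range(len(w)), 2)))
```

For $\mathcal{K}=\{k_1,k_2\}$ this reproduces $w_{k_1} w_{k_2}\big(\frac{1}{1-w_{k_1}}+\frac{1}{1-w_{k_2}}\big)$, agreeing with the simplified expression above.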
The last expression is, unfortunately, still computationally intractable even for samples containing about 100 elements -- yet I'd need this for samples of tens of thousands of elements, selected from millions of colours initially present in the urn. This is where I'm stuck: I can't see how to simplify this further.
Thus my question(s): Does anyone have an idea how to simplify the denominator and/or how to replace it by an approximation? Or can anyone point me at literature that deals with such a situation?
My eventual goal is to estimate the weights $\mathbf{w}$ from many such (independently taken) samples $\mathcal{K}_1,\ldots,\mathcal{K}_N$ with different sizes $|\mathcal{K}_1|,\ldots,|\mathcal{K}_N|$, either by an ML or (preferably) a Bayesian approach.
This answer expands on a comment I posted previously.
The Conditional Bernoulli distribution is a distribution on a binary vector $X = (X_1,\ldots,X_m)\in\{0,1\}^m$. The $X_i$ are distributed $X_i \sim \textrm{Bern}(p_i)$, with the total number of non-zero entries restricted to $\sum_i X_i = r$. All of the following is mostly just parroting the results of the linked paper.
In your proposed framework, we have a class (color) $c_i$ associated with each ball in the urn, and the number of observed balls of a given color pulled out without replacement is the number of non-zero binary variables associated with that class.
In math that would be, with $Q$ different colors in the urn:
\begin{equation} X | \{p_q\}_{q=1}^Q,\{c_i\}_{i=1}^m, r \sim \textrm{CB}(\{p_q\}_{q=1}^Q,\{c_i\}_{i=1}^m, r) \end{equation}
if and only if $X_i \sim \textrm{Bern}(p_{c_i})$ and $\sum_i X_i = r$, with each $c_i \in \{1,\ldots,Q\}$.
Writing down the density is kind of complicated. Let $\mathcal{X}^r = \{X \in \{0,1\}^m : \sum_i X_i = r\}$, and let $w_i = \frac{p_i}{1-p_i}$ be the weight (odds) associated, via a 1-1 mapping, with each probability $p_i$.
Then the density of the CB distribution can be written down in a not-very-useful form as:
\begin{equation} P[X | \{w_i\}, r] =\frac{(\prod_{i \in \{1,...,m\}} w_i^{X_i})}{\sum_{Y \in \mathcal{X}^r}(\prod_{i \in \{1,...,m\}} w_i^{Y_i})} I_{(X \in \mathcal{X}^r)} \end{equation}
There are some more convenient representations involving recursive equations which make it easier to understand the algorithm presented for obtaining the MLE estimate.
They discuss an $R$ function $R(k,A) := \sum_{B \subseteq A, |B| = k}\prod_{i \in B} w_i$. There is a recursive equation for the $R$ function:
\begin{equation} R(k,A) = \frac{1}{k}\sum_{i=1}^k(-1)^{i+1}T(i,A)R(k-i,A) \end{equation}
where $R(0,A) = 1$, $A \subseteq \{1,\ldots,m\}$, and $T(i,A) = \sum_{j\in A} w_j^i$.
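This recursion is (as far as I can tell) Newton's identity relating elementary symmetric polynomials ($R$) to power sums ($T$), so it is easy to sanity-check against the direct subset-sum definition (a Python sketch with illustrative names; the direct version is only tractable for small $|A|$):

```python
from itertools import combinations
from math import prod

def R_direct(k, A, w):
    # Direct definition: sum over all size-k subsets B of A of prod_{i in B} w_i.
    return sum(prod(w[i] for i in B) for B in combinations(A, k))

def T(i, A, w):
    # Power sum of the weights indexed by A.
    return sum(w[j] ** i for j in A)

def R_rec(k, A, w, memo=None):
    # Newton-style recursion with base case R(0, A) = 1.
    if memo is None:
        memo = {}
    if k == 0:
        return 1.0
    if k not in memo:
        memo[k] = sum((-1) ** (i + 1) * T(i, A, w) * R_rec(k - i, A, w, memo)
                      for i in range(1, k + 1)) / k
    return memo[k]
```

With memoisation the recursion costs $O(k^2)$ arithmetic operations plus $O(k\,|A|)$ for the power sums, instead of enumerating $\binom{|A|}{k}$ subsets.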
Then the density has the expression:
\begin{equation} P[X | \{w_i\}_{i=1}^m, r] =\frac{\prod_{i \in \{1,\ldots,m\}} w_i^{X_i}}{R(r,\{1,\ldots,m\})} \end{equation}
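As a sanity check, the normalizing constant is just a sum over all size-$r$ subsets, so for a tiny example the density can be evaluated by brute force and verified to sum to 1 over $\mathcal{X}^r$ (a Python sketch; `cb_density` is an illustrative name, and the subset enumeration is only viable for tiny $m$):

```python
from itertools import combinations
from math import prod

def cb_density(x, w):
    # P[X = x | w, r] with r = sum(x); the denominator enumerates
    # every binary vector with exactly r ones.
    m, r = len(x), sum(x)
    denom = sum(prod(w[i] for i in B) for B in combinations(range(m), r))
    return prod(w[i] for i in range(m) if x[i]) / denom

# Odds w_i = p_i / (1 - p_i) for four hypothetical inclusion probabilities.
w = [0.4, 0.8, 1.5, 2.0]
r = 2
total = sum(cb_density([int(i in B) for i in range(4)], w)
            for B in combinations(range(4), r))
print(total)  # should equal 1 up to floating-point error
```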
Given these definitions of the $R$ function, and multiple observations $\{X^n\}_{n=1}^N$ of the random binary vector (each with the same $r$), denote by $X_{(i)} = \sum_{n=1}^N X^n_i$ the number of times variable $i$ was observed as 1 across all the samples. Then the log-likelihood has the form:
\begin{equation} \mathcal{L}(\{w_i\}_{i=1}^m | \{X^n\}_{n=1}^N) = \sum_{i=1}^m X_{(i)} \log w_i - N\log R(r,\{1,\ldots,m\}) \end{equation}
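Given a way to compute $R$, evaluating this log-likelihood is mechanical. A brute-force Python sketch (illustrative names; it assumes every sample has the same length $m$ and the same number of ones $r$):

```python
from itertools import combinations
from math import prod, log

def log_likelihood(w, samples):
    # samples: list of binary vectors, all of length m with r ones each.
    m, r, N = len(samples[0]), sum(samples[0]), len(samples)
    X = [sum(s[i] for s in samples) for i in range(m)]   # counts X_(i)
    # Brute-force normalizer: sum over all size-r subsets of {0,...,m-1}.
    R = sum(prod(w[i] for i in B) for B in combinations(range(m), r))
    return sum(X[i] * log(w[i]) for i in range(m)) - N * log(R)
```

This matches summing the per-sample log-densities directly, since the $\log R$ term is identical for every sample.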
The approach used to calculate the MLE under this formulation is an iterative algorithm (Algorithm 1 in that paper), which repeats a fixed-point update until convergence.
This will solve your problem, but could take an extremely long time. We can probably simplify things a bit by departing from the Chen paper at this point. Noting that there are only $Q$ colors, with each $X_i$ sharing the weight $w_{c_i}$ of its color, the log-likelihood reduces to:
\begin{equation} \mathcal{L}(\{w_q\}_{q=1}^Q | \{X^n\}_{n=1}^N) = \sum_{q=1}^Q \Big(\sum_{i \,:\, c_i = q}X_{(i)}\Big) \log w_q - N\log R(r,\{1,\ldots,m\}) \end{equation}
I think you can also do some reductions on the $R$ function so it is not as difficult to calculate (requiring so many recursions), but I need some more time to think about it; I'm posting this for now. I think even with the reductions, assuming someone can figure them out, having millions of colors present in the urn might just mean you have a computationally intractable problem.
EDIT There is also a result in that paper which states that, in the limit as $m \rightarrow \infty$, the distribution of a Conditional Bernoulli converges in total variation to a Multinomial distribution.
The explicit statement from that paper is that:
Assumption: There exists $c \in \mathbb{R}^+$ such that $\frac{w_i}{w_j}\leq c$ for all $i,j \in \{1,\ldots,m\}$.
This assumption amounts to none of the probabilities being very small or very large. For instance, if $p_i = \frac{i}{m}$ the condition would not be satisfied as $m\rightarrow \infty$.
Theorem (Chen 2000): If the previous assumption holds, a Conditional Bernoulli model with probabilities $\pi_i$ can be approximated by a $\textrm{Multinomial}(m,\lambda_1,\ldots,\lambda_m)$ where $\lambda_i = \frac{\pi_i}{m}$ for each $i \in \{1,\ldots,m\}$ when $\frac{N}{m^2}$ is large.
The theorem is only useful if you have a lot more samples than colors present, which according to your question seems pretty unlikely.