I'm having a hard time analyzing my research data, and was wondering if anyone had any suggestions? I've reworded the question so it is presented more like a statistics problem.
There are $x$ groups, and $n$ items are distributed among them at random. NOTE: There is a restriction on the data: at most 2 items can be placed in a group, and groups can be left empty. I realize that this restriction prevents the items from being placed completely randomly, but other than this restriction, nothing influences their placement.
If there are 15 groups and 13 items, what is the probability that exactly 7 groups have 1 item and exactly 3 groups have 2 items, leaving the other 5 groups empty? (NOTE: which groups contain which number of items is not important.)
Is this likely to occur if the items are randomly distributed, or does it appear statistically significant?
Thank you
Let me use $k$ instead of $x$ for the number of groups. Let $t_i$ be the number of items in group $i$. The joint distribution of ${\bf t}=(t_1,\dots,t_k)$ is a truncated multinomial
$$P({\bf t}) = \alpha \frac{1}{\prod t_i!} [0\le t_i \le 2][\sum_{i=1}^k t_i =n]$$ where $\alpha$ is a normalization factor.
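Let ${\bf c} =\{c_0,c_1,c_2\}$ denote the number of groups with 0,1,2 items ($c_0+c_1+c_2=k$ , $c_1 + 2 c_2 =n$ ). There are $\frac{k!}{c_0! c_1! c_2!}$ ways to choose which groups hold 0, 1 and 2 items, and each such ${\bf t}$ has $\prod t_i! = 2^{c_2}$. Then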
$$P({\bf c}) = \alpha \frac{k!}{c_0! c_1! c_2!} \frac{1}{2^{c_2}} $$
This seems difficult to express in closed form, but the conditions $c_0+c_1+c_2=k$ , $c_1 + 2 c_2 =n$ reduce the degrees of freedom to 1, so it's trivial to compute numerically: just tabulate the unnormalized weights for the possible values of $c_2$ (from 0 to 6 in the example with $k=15$, $n=13$) and divide by their sum to get the normalizing factor.
I get: $P(c_0=5,c_1=7,c_2=3)=0.381646...$ and that this is the most probable configuration.
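For anyone who wants to reproduce the tabulation without a spreadsheet, here is a minimal Python sketch. It assumes the formula for $P({\bf c})$ above; the function name `config_distribution` is my own and not part of any library. Exact fractions are used so that the normalization introduces no rounding error.

```python
from fractions import Fraction
from math import factorial


def config_distribution(k, n):
    """Return {(c0, c1, c2): probability} for n items placed in k groups,
    with at most 2 items per group (truncated multinomial model)."""
    weights = {}
    for c2 in range(n // 2 + 1):
        c1 = n - 2 * c2          # groups holding exactly 1 item
        c0 = k - c1 - c2         # empty groups
        if c0 < 0:
            continue             # this c2 would need more than k groups
        # number of ways to choose which groups hold 0, 1, 2 items
        multinomial = factorial(k) // (
            factorial(c0) * factorial(c1) * factorial(c2)
        )
        # unnormalized weight: multinomial coefficient / 2^c2
        weights[(c0, c1, c2)] = Fraction(multinomial, 2 ** c2)
    total = sum(weights.values())  # this is 1 / alpha
    return {c: w / total for c, w in weights.items()}


dist = config_distribution(15, 13)
for config, prob in sorted(dist.items(), key=lambda item: item[0][2]):
    print(config, float(prob))
```

The loop prints one line per value of $c_2$, so it is easy to see how the observed configuration $(5,7,3)$ compares with its neighbours.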
Update: Here's a quick estimate for large values of $k$. The distribution is asymptotically equivalent to $k$ iid truncated Poisson (to the values 0,1,2) with mean $n/k$. Hence the recipe:
$$\begin{array}{rcl} \mu &=&\frac{n}{k}\\ \lambda&=& \frac{\mu-1 +\sqrt{2\mu+1-\mu^2}}{2-\mu}\\ \alpha &=& \frac{1}{1+\lambda+\frac{\lambda^2}{2}}\\ p_0 &=& \alpha\\ p_1 &=& \alpha \,\lambda\\ p_2 &=& \alpha \frac{\lambda^2}{2}\\ \end{array} $$
Then, the expected configuration $(c_0,c_1,c_2)$ should be around $(k p_0,k p_1 ,k p_2)$ (in our example: $(5.4,6.1,3.4)$ , quite near the most probable $(5,7,3)$). Or, put another way, the variable $c_2$ (the number of full groups, i.e. those holding 2 items) has a mean of about $k p_2$ and a standard deviation of about $\sqrt{ k p_2 (1 - p_2)}$
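The recipe can also be checked numerically with a short Python sketch. Two assumptions are mine: the function name `truncated_poisson_estimate`, and the normalization $\alpha = 1/(1+\lambda+\lambda^2/2)$, chosen so that $p_0+p_1+p_2=1$ and $p_1+2p_2=\mu$, which is the condition the formula for $\lambda$ solves. It only makes sense for $\mu < 2$.

```python
from math import sqrt


def truncated_poisson_estimate(k, n):
    """Approximate (p0, p1, p2) for a Poisson truncated to {0, 1, 2}
    whose mean is n / k. Requires n / k < 2."""
    mu = n / k
    if not 0 <= mu < 2:
        raise ValueError("need 0 <= n/k < 2")
    lam = (mu - 1 + sqrt(2 * mu + 1 - mu ** 2)) / (2 - mu)
    # assumed normalization: the three probabilities sum to 1
    alpha = 1 / (1 + lam + lam ** 2 / 2)
    return alpha, alpha * lam, alpha * lam ** 2 / 2


k, n = 15, 13
p0, p1, p2 = truncated_poisson_estimate(k, n)
expected = (k * p0, k * p1, k * p2)      # rough expected (c0, c1, c2)
sd_c2 = sqrt(k * p2 * (1 - p2))          # rough standard deviation of c2
print([round(v, 1) for v in expected], round(sd_c2, 2))
```

For small $k$ the exact tabulation is just as cheap, so this estimate is mainly useful as a sanity check or when $k$ is large.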