Motivation
I found myself answering an SO question about Monte Carlo simulation. The model to simulate is stated as follows:
Let 20 people, including exactly 3 women, seat themselves randomly at 4 tables (denoted A, B, C and D) of 5 persons each, with all arrangements equally likely.
Let $X$ be the number of tables at which no women sit. Write a Monte Carlo simulation to estimate the expectation $E[X]$ and also estimate the probability $P(A=0)$ that no women sit at table A. Run the simulation for 3 cases (100, 1000, 10000).
First, I drafted a naïve solution; then, while refactoring, I changed the way the random inputs were drawn. I observed that the two solutions did not converge to the same quantities. The evidence supporting this observation is available in my answer.
Implementations
First implementation
- Create a vector of length $20$ populated as follows: $(1,1,1,0,\dots,0)$, where $1$ stands for a woman and $0$ for a man;
- Draw a random permutation of this vector;
- Count the number of tables with no woman (splitting the vector into 4 tables of 5 seats each);
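The steps above can be sketched in Python as follows. This is a minimal sketch, not the code from the linked answer: the function name `draw_by_permutation` and the convention that the first five seats form table A are assumptions made for illustration.

```python
import random

def draw_by_permutation(rng):
    """Hypothetical helper: one random seating, returns (X, table A has no woman)."""
    seats = [1, 1, 1] + [0] * 17          # 1 = woman, 0 = man
    rng.shuffle(seats)                    # uniform random permutation
    # Assumption: seats 0-4 are table A, 5-9 table B, and so on.
    tables = [seats[i:i + 5] for i in range(0, 20, 5)]
    no_woman = [sum(table) == 0 for table in tables]
    return sum(no_woman), no_woman[0]

x, a_empty = draw_by_permutation(random.Random(0))
print(x, a_empty)
```

Passing the random generator explicitly keeps the draw reproducible when a seed is fixed.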
Second implementation
- Construct a set $S$ from $3$ numbers sampled independently (with replacement) from $\{1,\dots,4\}$; it represents the tables at which at least one woman sits, so $0<\#S<4$;
- Here $X$ is simply $4 - \#S$;
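A minimal Python sketch of this second way of drawing, under the assumption that each of the three women independently receives a table index (the name `draw_by_table_labels` and the encoding `0` = table A are hypothetical, not taken from the linked answer):

```python
import random

def draw_by_table_labels(rng):
    """Hypothetical helper: one draw, returns (X, table A has no woman)."""
    # Each woman gets a table index 0..3 (0 = table A) independently,
    # i.e. sampling with replacement; the set keeps the distinct tables.
    tables_with_women = {rng.randrange(4) for _ in range(3)}
    x = 4 - len(tables_with_women)
    return x, 0 not in tables_with_women

x, a_empty = draw_by_table_labels(random.Random(0))
print(x, a_empty)
```

Note that nothing here limits a table to 5 seats, which is the modelling difference with the permutation approach.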
Quantities
Then, for each process, the outputs are processed as follows:
- Initialize empty counters;
- For each sample in $\{1,\dots,N\}$:
  - Draw an experiment and assess the value of $X$ (this is where the difference lies);
  - Add $1$ to the count of the outcome $X=k$ in a dictionary of integers;
  - Add $1$ to an integer counter if no woman sits at table A;
- Divide the counts by $N$ to get frequencies;
- Compute the expected value $E[X]$ from the frequencies;
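The counting procedure above might look like the following sketch. The names `run_counting` and `example_draw` are hypothetical and do not reproduce the `runMonteCarlo` methods of the linked answer; any sampler returning the pair (value of $X$, whether table A has no woman) can be plugged in.

```python
import random
from collections import Counter

def run_counting(draw, n, seed=None):
    """Hypothetical driver: `draw(rng)` must return (X, table A has no woman)."""
    rng = random.Random(seed)
    counts = Counter()             # counts[k] = number of samples with X == k
    a_empty_count = 0
    for _ in range(n):
        x, a_empty = draw(rng)
        counts[x] += 1
        a_empty_count += a_empty   # True counts as 1, False as 0
    freqs = {k: c / n for k, c in sorted(counts.items())}
    expectation = sum(k * f for k, f in freqs.items())
    return freqs, expectation, a_empty_count / n

def example_draw(rng):
    """Example sampler: random permutation of 3 women among 20 seats."""
    seats = [1] * 3 + [0] * 17
    rng.shuffle(seats)
    empty = [sum(seats[i:i + 5]) == 0 for i in range(0, 20, 5)]
    return sum(empty), empty[0]

for n in (100, 1000, 10000):
    freqs, e_x, p_a = run_counting(example_draw, n, seed=42)
    print(n, freqs, round(e_x, 4), round(p_a, 4))
```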
Models
First implementation
I think the first implementation can be modeled as follows:
$$ \#\Omega = C^{5}_{20}C^{5}_{15}C^{5}_{10}C^{5}_{5} = 11732745024 $$
With probabilities:
$$ \begin{align} P(X=0) &= 0 \\ P(X=1) &= \frac{4!}{3!}\frac{C^{1}_{3}C^{4}_{17}C^{1}_{2}C^{4}_{13}C^{1}_{1}C^{4}_{9}C^{5}_{5}}{\#\Omega} = \frac{25}{57} \simeq 0.4386 \\ P(X=2) &= \frac{4!}{2!}\frac{C^{1}_{3}C^{4}_{17}C^{2}_{2}C^{3}_{13}C^{5}_{10}C^{5}_{5}}{\#\Omega} = \frac{10}{19} \simeq 0.5263 \\ P(X=3) &= \frac{4!}{3!}\frac{C^{3}_{3}C^{2}_{17}C^{5}_{15}C^{5}_{10}C^{5}_{5}}{\#\Omega} = \frac{2}{57} \simeq 0.03509 \\ P(X=4) &= 0 \end{align} $$
Then the expected value is:
$$ \mathrm{E}[X] = \frac{25}{57} + 2\frac{10}{19} + 3\frac{2}{57} = \frac{91}{57} \simeq 1.5965 $$
And the probability of having no woman at table A is:
$$ P(A=0) = \frac{C^{5}_{17}}{\#\Omega}\left( 3 C^{3}_{3}C^{2}_{12}C^{5}_{10}C^{5}_{5} + 6 C^{1}_{3}C^{4}_{12}C^{2}_{2}C^{3}_{8}C^{5}_{5} + C^{1}_{3}C^{4}_{12}C^{1}_{2}C^{4}_{8}C^{1}_{1}C^{4}_{4} \right) = \frac{91}{228} \simeq 0.3991 $$
Which is consistent with the results of method runMonteCarlo.
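These fractions can be cross-checked by brute force. The sketch below assumes that, since all arrangements are equally likely, the set of three seats taken by the women is uniform over the $C^{3}_{20} = 1140$ subsets of seats; the function name and the seat numbering (seat `s` belongs to table `s // 5`, table `0` being A) are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def exact_first_model():
    """Enumerate all C(20,3) = 1140 equally likely sets of women's seats."""
    counts = {}
    a_empty = 0
    total = 0
    for seats in combinations(range(20), 3):
        occupied = {s // 5 for s in seats}   # tables hosting at least one woman
        x = 4 - len(occupied)
        counts[x] = counts.get(x, 0) + 1
        a_empty += 0 not in occupied
        total += 1
    dist = {k: Fraction(c, total) for k, c in sorted(counts.items())}
    expectation = sum(k * p for k, p in dist.items())
    return dist, expectation, Fraction(a_empty, total)

dist, e_x, p_a = exact_first_model()
print(dist, e_x, p_a)
```

Using `Fraction` keeps the results exact, so they can be compared directly with $25/57$, $10/19$, $2/57$, $91/57$ and $91/228$.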
Second Implementation
And the second implementation can be modeled by a combination lock with 3 dials, each showing one of 4 symbols (A, B, C, D), so there are $\# \Omega = 4^3 = 64$ possible setups.
Then we can assess $P(X=k)$ using combinatorics:
$$ \begin{align} P(X=0) &= 0 \\ P(X=1) &= \frac{C^1_4 C^1_3 C^1_2}{4^3} = \frac{24}{64} = 0.375 \\ P(X=2) &= \frac{C^1_4 C^1_3 C^2_3}{4^3} = \frac{36}{64} = 0.5625\\ P(X=3) &= \frac{C^1_4}{4^3} = \frac{4}{64} = 0.0625 \\ P(X=4) &= 0 \end{align} $$
The expectation of $X$ is:
$$ \mathrm{E}[X] = \frac{24}{64} + 2\frac{36}{64} + 3\frac{4}{64} = \frac{27}{16} = 1.6875 $$
And the probability of having no woman at table A is then:
$$ P(A=0) = \frac{3^3}{4^3} = \frac{27}{64} = 0.421875 $$
Which is consistent with the result of method runMonteCarlo2.
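With only $64$ setups, this model can be checked by listing them all. A short sketch, with a hypothetical function name and the tables labelled by the characters `"ABCD"`:

```python
from fractions import Fraction
from itertools import product

def exact_second_model():
    """Enumerate the 4**3 = 64 equally likely triplets of table labels."""
    triplets = list(product("ABCD", repeat=3))
    counts = {}
    a_empty = 0
    for triplet in triplets:
        x = 4 - len(set(triplet))            # tables not named in the triplet
        counts[x] = counts.get(x, 0) + 1
        a_empty += "A" not in triplet
    total = len(triplets)
    dist = {k: Fraction(c, total) for k, c in sorted(counts.items())}
    expectation = sum(k * p for k, p in dist.items())
    return dist, expectation, Fraction(a_empty, total)

dist, e_x, p_a = exact_second_model()
print(dist, e_x, p_a)
```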
Questions
- Are my models correct?
- For the given problem, what is the correct solution?
Let me summarise the two ways of sampling. (1) Take a random (uniformly distributed) choice of a $3$-element subset of a $20$-element set, and for each of $4$ fixed disjoint subsets of those $20$ elements see whether at least one of their elements was chosen, then count the parts for which this was the case. (2) Take a random element of $\{1,2,3,4\}^3$ and count the number of distinct component values of the chosen triplet.
If the first method is implemented as a selection without replacement of $3$ values among $20$, then the second amounts to a similar selection but with replacement, because it assumes that each selected item falls into each of the $4$ cases with equal probability, regardless of the values selected before. The two sampling procedures are not equivalent (the first method tends somewhat more to a balanced distribution), so it is not surprising that the experimental results differ. The first method directly models the stated problem, so it is the right one to use here.
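A small worked comparison may make the imbalance concrete (this is an added illustration using only the seating rules stated in the problem). Once the first woman is seated, her table has $4$ free places among the $19$ remaining ones under the first method, whereas the second method gives every table probability $\frac14$ again:

$$ P(\text{second woman joins the first}) = \frac{4}{19} \simeq 0.2105 \ \text{(method 1)} \qquad \text{versus} \qquad \frac{1}{4} = 0.25 \ \text{(method 2)} $$

Continuing with the third woman gives the probability that all three share a table:

$$ \frac{4}{19}\cdot\frac{3}{18} = \frac{2}{57} \simeq 0.0351 \ \text{(method 1)} \qquad \text{versus} \qquad \frac{1}{4}\cdot\frac{1}{4} = \frac{1}{16} = 0.0625 \ \text{(method 2)} $$

These agree with the two values of $P(X=3)$ computed in the question.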
The second method is easier to analyse, so I'll do that first. There are $4^3=64$ triplets, and simple enumeration will show that $4$ of them involve a single component value (repeated thrice), $36$ involve two component values (one of them repeated twice), and the remaining $24$ have three distinct component values (there are of course none with zero or with all $4$ values as components). The more general problem of counting maps from an $n$-element set to an $m$-element set (of which this is an instance with $n=3$, $m=4$) by the size $k$ of the image set has as solution $\binom mkk!\left\{n\atop k\right\}$ (where the last factor is a Stirling number of the second kind). Since $X=m-k$, the expected value of $X$ under this method would be $\frac{27}{16}=1.6875$.
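The counting formula can be evaluated with a few lines of Python. This is only a sketch: `stirling2` uses the standard recurrence $\left\{n\atop k\right\} = k\left\{n-1\atop k\right\} + \left\{n-1\atop k-1\right\}$, and both function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def maps_with_image_size(n, m, k):
    """Number of maps from an n-set to an m-set whose image has size k."""
    return comb(m, k) * factorial(k) * stirling2(n, k)

n, m = 3, 4
counts = {k: maps_with_image_size(n, m, k) for k in (1, 2, 3)}
# X is the number of tables NOT in the image, hence the weight (m - k).
expected_x = sum(Fraction((m - k) * c, m ** n) for k, c in counts.items())
print(counts, expected_x)
```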
For the first method there is one more parameter, the number $p$ of places per table (here $p=5$; the total number of places is $mp$). An inclusion-exclusion argument gives the formula $\binom mk\sum_{i=0}^k(-1)^{k-i}\binom ki\binom{pi}n$ for the number of selections whose chosen places fall in exactly $k$ tables. For the given values of the parameters this gives $40$, $600$, $500$ respectively for $k=1,2,3$, the three values adding up to $1140=\binom{20}3$, as they should. The expected value of $X$ (the correct one for the stated problem) works out to be $\frac{91}{57}\approx 1.5965$.
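The inclusion-exclusion formula is easy to evaluate numerically; the sketch below (function name chosen for illustration) reproduces the three counts and the expected value, using $X = m - k$.

```python
from fractions import Fraction
from math import comb

def selections_meeting_k_tables(n, m, p, k):
    """Number of n-subsets of the m*p places that touch exactly k tables."""
    inner = sum((-1) ** (k - i) * comb(k, i) * comb(p * i, n)
                for i in range(k + 1))
    return comb(m, k) * inner

n, m, p = 3, 4, 5
counts = {k: selections_meeting_k_tables(n, m, p, k) for k in (1, 2, 3)}
total = comb(m * p, n)
# X is the number of tables without women, hence the weight (m - k).
expected_x = sum(Fraction((m - k) * c, total) for k, c in counts.items())
print(counts, total, expected_x)
```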