Say I have $10^6$ balls, $3$ bins $A,B,C$, and $2$ machines $X$ and $Y$ that distribute the balls into the bins according to an internal set of rules (i.e. a probability distribution). If I run both machines multiple times and get the following averages--
$X$: $80$% of balls into bin $A$, $12$% into $B$, $8$% into $C$;
$Y$: $82$% into $A$, $12$% into $B$, $10$% into $C$--
My intuition says that $X$ and $Y$ follow the same probability distribution. However, if I get--
$X$: same as above;
$Y$: $30$% into $A$, $30$% into $B$, $40$% into $C$--
the probability distribution is likely to be different. How do I verify this statistically? A Student's t-test between $X$ and $Y$ for individual bins seems too simplistic and doesn't account for the fact that $A$, $B$, and $C$ are not independent, but I can't dig up anything else from my very limited stats background.
Also, would the test be the same if instead of discrete balls I had a continuous quantity that could be partitioned into the bins in arbitrary fractions?
Assume that machine $X$ has been run enough times that it can be treated as the population.
Since the variables are categorical, you can apply the chi-squared goodness-of-fit test (e.g. http://www.ics.uci.edu/~jutts/8/Lecture27Compact.pdf) to test the null hypothesis that machine $Y$ draws its samples from the same population as machine $X$. Each expected count is the sample size $n$ times the corresponding population proportion, so the smallest proportion ($0.08$, bin $C$) is the binding constraint: the sample size for machine $Y$ must be at least $\lceil 5/0.08 \rceil = \lceil 62.5 \rceil = 63$ to ensure that at least 80% of the expected counts (with only three bins, that means all of them) are at least $5$.
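A minimal Python sketch of that sample-size calculation, assuming the rule of thumb is applied as "every expected count is at least 5" (which, with three bins, is what the 80% rule requires). `min_sample_size` is a hypothetical helper name, not a library function.

```python
import math

def min_sample_size(proportions, min_expected=5.0):
    """Smallest n for which every expected count n * p is at least min_expected."""
    # The smallest category proportion is the binding constraint.
    return math.ceil(min_expected / min(proportions))

# Machine X's proportions for bins A, B, C, treated as the population
print(min_sample_size([0.80, 0.12, 0.08]))  # 63
```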
For example, assume the second set of figures for machine $Y$ in the question comes from a sample of size $n=100$ (the sample size matters because the chi-squared statistic depends on the counts, not just the proportions).
Then the event counts are:
$$\begin{aligned} O_A &= (0.3)(100) = 30 \\ O_B &= (0.3)(100) = 30 \\ O_C &= (0.4)(100) = 40 \end{aligned}$$
while the expected counts are
$$\begin{aligned} E_A &= (0.80)(100) = 80 \\ E_B &= (0.12)(100) = 12 \\ E_C &= (0.08)(100) = 8 \end{aligned}$$
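The same arithmetic as a short Python sketch. `to_counts` is a hypothetical helper, and $n=100$ is the sample size assumed in the example. Observed counts are rounded to whole balls, while expected counts are kept as floats, since in general they need not be integers.

```python
def to_counts(proportions, n):
    """Scale bin proportions to counts for a sample of n balls."""
    return [p * n for p in proportions]

n = 100  # assumed sample size for machine Y, as in the example
# Machine Y: observed counts must be whole balls, so round them
observed = [round(c) for c in to_counts([0.30, 0.30, 0.40], n)]
# Machine X (the population): expected counts may be fractional in general
expected = to_counts([0.80, 0.12, 0.08], n)
print(observed)  # [30, 30, 40]
print(expected)
```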
So the $\chi^2$ statistic has a value of
$$\chi^2 = \sum_i{\frac{(O_i-E_i)^2}{E_i}}=\frac{(30-80)^2}{80}+\frac{(30-12)^2}{12}+\frac{(40-8)^2}{8}=186.25$$
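A direct transcription of this formula into Python, as a sketch; `chi_squared_statistic` is a hypothetical name, not a library function.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O_i - E_i)^2 / E_i over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Counts for bins A, B, C from the worked example (n = 100)
stat = chi_squared_statistic([30, 30, 40], [80, 12, 8])
print(stat)  # 186.25
```

If SciPy is installed, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value.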
Now test the null hypothesis at the $1\%$ significance level.
With three categories, the number of degrees of freedom is
$$df=3-1=2$$
and from chi-squared tables (e.g. http://sites.stat.psu.edu/~mga/401/tables/Chi-square-table.pdf), the critical value at the $1\%$ level is $\chi^2_{cr}=9.210$. Since our test statistic ($186.25$) is larger than this, we reject the null hypothesis and conclude that there is statistical evidence that machine $Y$ does not distribute balls according to the same rules as machine $X$.
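For $df=2$ specifically, the table lookup can be replaced by a closed form: the upper-tail probability is $P(\chi^2_2 \ge x) = e^{-x/2}$, so the critical value at level $\alpha$ is $-2\ln\alpha$ and the p-value is $e^{-\chi^2/2}$. The sketch below relies on that identity, which holds only for two degrees of freedom; for other values of $df$ a table or a library routine such as SciPy's `scipy.stats.chi2.ppf` is needed. The function names are hypothetical.

```python
import math

def chi2_df2_critical_value(alpha):
    """Upper-tail critical value of the chi-squared distribution with 2 degrees of freedom.

    Only valid for df = 2, where the survival function is exp(-x / 2);
    solving exp(-x / 2) = alpha gives x = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)

def chi2_df2_p_value(stat):
    """P(X >= stat) for X ~ chi-squared with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

alpha = 0.01
k = 3        # number of bins: A, B, C
df = k - 1   # = 2, so the closed forms above apply
assert df == 2, "closed forms are only valid for 2 degrees of freedom"

critical = chi2_df2_critical_value(alpha)
stat = 186.25  # statistic from the worked example
print(round(critical, 3))  # 9.21
print(stat > critical)     # True, so reject the null hypothesis
print(chi2_df2_p_value(stat))
```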
Note: This is not a completely rigorous treatment as it presupposes that machine $X$ can be treated as the population.