I have some data and its distribution as a histogram. For example, say there are the following 20 data items:
- 3 times A
- 5 times B
- 4 times C
- 4 times D
- 3 times E
- 1 time F
Now I want to remove, for example, 5 randomly chosen data items. How can I calculate the probability that at least one value type is removed completely? For instance, that all three A's are removed, or all five B's.
My approach would be to calculate, for every value type, the probability that this type is removed completely: count the ways to choose 5 data items that include, for instance, all three A's, and divide this count by the number of ways to choose 5 data items out of 20. Finally, I would add all these probabilities and subtract the overlaps that were counted more than once (inclusion-exclusion). For example, P(A or F removed) = P(A removed) + P(F removed) - P(A and F removed).
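For a small example like this one, the probability can also be checked by brute force: enumerate every way to choose 5 of the 20 items and count the selections that contain every copy of at least one type. This is a minimal Python sketch; the names `counts`, `items` and `wipes_out_some_type` are illustrative, not taken from anywhere else.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

# Example histogram from the question: value type -> number of items.
counts = {"A": 3, "B": 5, "C": 4, "D": 4, "E": 3, "F": 1}
k = 5  # number of items removed at random

# Expand the histogram into 20 labelled items: ["A", "A", "A", "B", ...].
items = [label for label, c in counts.items() for _ in range(c)]

def wipes_out_some_type(removed):
    """True if `removed` contains every copy of at least one value type."""
    removed_counts = Counter(removed)
    return any(removed_counts[label] == c for label, c in counts.items())

# combinations() works on positions, so equal labels still count as distinct
# items, which matches choosing 5 of the 20 data items uniformly at random.
favourable = sum(wipes_out_some_type(r) for r in combinations(items, k))
total = comb(len(items), k)
print(favourable, total, Fraction(favourable, total))
```

This enumerates all $\binom{20}{5} = 15504$ selections, which is fine here but grows far too quickly to be practical for a large histogram.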
But for a large histogram this would be a very complex calculation. So my question is: is there perhaps a better way?
Your approach does seem the best.
The probability that $x$ specific items are all among the $5$ selected from $20$, with the other $(5-x)$ selected items coming from the remaining $(20-x)$, is:
$$P(X_x) = \dfrac{20-x\choose 5-x}{20\choose 5}$$
Thus, writing $P(A_3)$ for the probability that all 3 items of type A are among the 5 removed (and similarly for the other types):
$$ P(F_1)=\dfrac{19\choose 4}{20\choose 5}, P(A_3) = P(E_3) = \dfrac{17\choose 2}{20\choose 5}, P(C_4)=P(D_4)=\dfrac{16}{20\choose 5}, P(B_5) = \dfrac{1}{20\choose 5}$$
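The formula for $P(X_x)$ is easy to evaluate exactly with Python's `math.comb` and `fractions.Fraction`. This is a minimal sketch; the function name `p_all_removed` and its defaults `n=20, k=5` are illustrative choices. It returns 0 when $x > k$, since more than 5 specific items can never all be among 5 removed ones.

```python
from fractions import Fraction
from math import comb

def p_all_removed(x, n=20, k=5):
    """P(X_x): probability that x specific items are all among k items removed from n.

    The other k - x removed items come from the remaining n - x items.
    If x > k this is impossible, so return 0 (comb() rejects a negative k - x).
    """
    if x > k:
        return Fraction(0)
    return Fraction(comb(n - x, k - x), comb(n, k))

counts = {"A": 3, "B": 5, "C": 4, "D": 4, "E": 3, "F": 1}
single = {label: p_all_removed(c) for label, c in counts.items()}
for label, p in single.items():
    print(f"P({label}_{counts[label]}) = {p} ~ {float(p):.6f}")
```

For example, `single["F"]` is $\binom{19}{4}/\binom{20}{5}$, which reduces to $1/4$: each single item is removed with probability $5/20$.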
So for unions:
$$P(A_3\cup F_1) = P(A_3)+P(F_1)-P(A_3\cap F_1) = \dfrac{{17\choose 2}+{19\choose 4}-{16\choose 1}}{20\choose 5}$$
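The two-type union can be checked by evaluating it with the same formula and comparing against brute-force enumeration. This is a sketch under one assumption: the positions used for the A items (0, 1, 2) and the F item (19) are hypothetical labels, and any three positions plus one other position give the same count.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def p_all_removed(x, n=20, k=5):
    # Probability that x specific items are all among the k removed (0 if x > k).
    if x > k:
        return Fraction(0)
    return Fraction(comb(n - x, k - x), comb(n, k))

# Inclusion-exclusion for two types: P(A_3 u F_1) = P(A_3) + P(F_1) - P(A_3 n F_1),
# where A_3 n F_1 means that 3 + 1 = 4 specific items are all removed.
p_union = p_all_removed(3) + p_all_removed(1) - p_all_removed(3 + 1)

# Brute-force cross-check with hypothetical positions for the A items and the F item.
A = {0, 1, 2}
F = {19}
hits = sum(
    A <= set(removed) or F <= set(removed)
    for removed in combinations(range(20), 5)
)
print(p_union, Fraction(hits, comb(20, 5)))
```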
Et cetera...
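However, since only 5 items are removed, any intersection whose types contain more than 5 items in total has probability zero, so we only need to subtract the intersections of total size 5 or less. Here those are $A_3\cap F_1$ and $E_3\cap F_1$ (4 items each) and $C_4\cap F_1$ and $D_4\cap F_1$ (5 items each); every other pair, such as $A_3\cap E_3$ (6 items), and every triple is too large: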
$$P(A_3\cup B_5 \cup C_4 \cup D_4\cup E_3\cup F_1) \\ = P(F_1)+P(A_3)+P(E_3)+P(C_4)+P(D_4)+P(B_5)- P(A_3\cap F_1)-P(E_3\cap F_1)-P(C_4\cap F_1)-P(D_4\cap F_1) \\ = \frac{{19\choose 4}+2{17\choose 2}+2{16\choose 1}+{15\choose 0}-2{16\choose 1}-2{15\choose 0}}{20\choose 5} \\ = \frac{{19\choose 4}+2{17\choose 2}-{15\choose 0}}{20\choose 5} \\ =\frac {4147} {15504}$$
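The same pruning idea carries over to a large histogram: run inclusion-exclusion over sets of types, but only over sets whose combined size is at most $k$, since every larger set contributes zero. This is a minimal Python sketch of that idea; the function name `p_some_type_wiped_out` is illustrative. The number of surviving sets can still grow quickly when there are many small types and a large $k$, so this is a sketch rather than a guaranteed-fast method.

```python
from fractions import Fraction
from math import comb

def p_some_type_wiped_out(counts, k):
    """P(at least one value type has all its items among k items removed at random).

    Inclusion-exclusion over sets S of types: the term for S is
    (-1)^(|S|+1) * C(n - s, k - s) / C(n, k), where s is the total count in S.
    Sets with s > k contribute 0, so the search never extends a set past k items.
    """
    sizes = sorted(counts.values())
    n = sum(sizes)
    total = 0  # signed count of favourable k-subsets

    def extend(start, s, depth):
        nonlocal total
        for i in range(start, len(sizes)):
            s_new = s + sizes[i]
            if s_new > k:
                break  # sizes are sorted, so every later type is too large as well
            sign = 1 if depth % 2 == 0 else -1  # + for odd |S|, - for even |S|
            total += sign * comb(n - s_new, k - s_new)
            extend(i + 1, s_new, depth + 1)

    extend(0, 0, 0)
    return Fraction(total, comb(n, k))

example = {"A": 3, "B": 5, "C": 4, "D": 4, "E": 3, "F": 1}
print(p_some_type_wiped_out(example, 5))
```

On the example histogram this visits exactly the six single types and the four pairs with $F_1$ listed above, so it reproduces the same sum of binomial coefficients.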