probability of seeing a set of combinations

83 Views Asked by At

Say I have 3 combinations of letters picked at random from the alphabet, with duplications forbidden within a combination, but allowed between combinations (i.e. A, B, C and A, C, E are allowed). The combinations cannot be the same, and the order in which the letters are picked does not matter (A, B, C and B, C, A are equivalent - this means that I can't have A, B, C and B, C, A as two combinations).

I'm very much a beginner with stats, but I believe the probability of seeing three combinations of letters of length 3, 4 and 5 letters, from a 26-letter alphabet, would be something along the lines of sum([26 / (26!(26−i)! for i in [3,4,5]) - not sure how to get statistical notation in here so I wrote in python. This could be wrong, I am still refreshing my secondary school combinatorics.

However, I am also interested in seeing how the combinations distribute across the alphabet. So, whilst the combinations of [A, C, D], [D, C, A] and [C, D, A] are equivalent for me, I would like to find the probability of (for example) having the combinations of [A,C,D], [A, B, E, F],[A, B, F], as I think the likelihood of having three combinations all completely situated within the short subsequence [A, B, C, D, E, F] is less likely than having them spread across the alphabet. This applies to both the feature themselves (e.g. [A, B, C] is more clustered towards the front of the alphabet than [A, J, X]) and regions where the features are found (e.g. [A, B, C], [F, G, H, I], [X, Y, Z] is more spread out across the alphabet than the initial example, even though the features themselves are fairly clustered).

I am probably confusing concepts, but can anyone point me in the direction of how to approach this? Or point me in the direction of an area to read up on?

Many thanks!

1

There are 1 best solutions below

0
On

I'm not sure what you mean by saying that any of the 3 combinations cannot be the same, because you also said that their lengths are 3, 4 and 5

So how can any 2 of them be identical?

Anyways, I'm solving it firstly assuming their lengths are 3, 4 and 5 and secondly by assuming their lengths are same (=3)

Let's calculate the favourable outcomes for Case 1

The 1st 3-letter combination can be made by selecting 3 letters from 26. That is, in C(26,3) ways

For the 2nd 4-letter combination, we have C(26,4) possibilities

For the 3rd 5-letter combination, we have C(26,5) possibilities

Hence total number of ways C(26,3) × C(26,4) × C(26,5)

For Case 2,

The 1st 3-letter combination can be formed in C(26,3) ways

For the 2nd one also, all 26 alphabets are available but it cannot be identical to the 1st combination

So it can be made in C(26,3)-1 ways

For the 3rd, the same follows as we cannot repeat the 1st and 2nd ones. So the third one can be made in C(26,3)-2 ways

Hence the total number of ways C(26,3) × [C(26,3)-1] × [C(26,3)-2]

Now we calculate the total number of ways in which 3 combinations can be made from the 26 letters in the alphabet (when the letters are picked for each combination with replacement, as per your question)

The 1st combination made can have any number of alphabets

So each of the 26 alphabets have a choice, whether to become a part of it or not (Hence 2 possibilities). But for a combination to form, atleast 1 letter should be selected

Hence the possibilities for 1st combination are 2^26 - the case when none of the letters are present in the combination (completely empty selection)

Same applies to the 2nd and 3rd ones

So total possibilities = (2^26-1) × (2^26-1) × (2^26-1) = (2^26-1)³

Hence,

For Case 1, probability= C(26,3) × C(26,4) × C (26,5)/(2^26-1)³

For Case 2, probability= C(26,3) × [C(26,3)-1]× [C(26,3)-2]/(2^26-1)³

Edit: Please let me know in the comments if I'm misinterpreting your question