Number of combinations for specific problem


I'm currently writing a small script and have run into a combinatorial problem: I have 24 sequences, each 8 characters long and built from 4 different characters (A, C, G, T). The sequences need to be arranged into groups of 8 sequences each.

To make it more clear, here are two example groups:

Example 1: CAAGTCGT      Example 2: CAAGTCGT
           GTCTCATC                 GTCTCATC
           ACGTCGTT                 ACGTCGTT
           GTCCTGTT                 GTCCTGTT
           AGAAGCCT                 AGAAGCCT
           GAAGATCC                 GATGATCC
           TCGGATTC                 TCGGATTC
           CGGAGTAT                 CGGAGTAT

Each group contains eight sequences, and I'm looking for the number of possible groups that satisfy the following conditions:

  1. Reading the sequences from top to bottom, every character (A, C, G, T) must occur at least once at each position from the first to the sixth (left to right). In example 1, position 3 has no T, so this group would not match (you don't have to worry about this, because such groups are filtered out in my script). Example 2, however, has a T at position 3, so it would match. In both examples, position 8 has neither an A nor a G, but because position 8 does not matter, that is fine.

  2. The order of the sequences within a group doesn't matter. In example 1, swapping sequences 1 and 4 would give the same group. To reduce the computational load, I already filter out such duplicates in my script. Replacing just one sequence with one of the remaining sequences (I displayed 8, but there are 16 others) would give a new group, which would count.
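
To pin down criterion 1, here is a minimal Python sketch of the check as described above, applied to the two example groups. The names `covers_all_characters` and `checked_positions` are made up for illustration and are not taken from my actual script; the sketch assumes that only the first six positions are checked.

```python
ALPHABET = set("ACGT")

def covers_all_characters(group, checked_positions=6):
    """Return True if each of the first `checked_positions` columns
    of the group contains A, C, G and T at least once."""
    for pos in range(checked_positions):
        column = {seq[pos] for seq in group}  # characters seen at this position
        if not ALPHABET <= column:            # at least one character is missing
            return False
    return True

example_1 = ["CAAGTCGT", "GTCTCATC", "ACGTCGTT", "GTCCTGTT",
             "AGAAGCCT", "GAAGATCC", "TCGGATTC", "CGGAGTAT"]
example_2 = ["CAAGTCGT", "GTCTCATC", "ACGTCGTT", "GTCCTGTT",
             "AGAAGCCT", "GATGATCC", "TCGGATTC", "CGGAGTAT"]

print(covers_all_characters(example_1))  # False: no T at position 3
print(covers_all_characters(example_2))  # True
```

Because the check only looks at the set of characters in each column, it gives the same answer regardless of the order of the sequences inside a group, which matches criterion 2.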

Again, I need the number of groups that fulfil the above-mentioned criteria.
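
For reference, this is roughly the brute-force count I have in mind, as a hedged sketch rather than my real script. The function name `count_valid_groups` and its parameters are hypothetical, and it assumes that all 24 sequences are distinct and that only the first six positions are checked. Without criterion 1 there are C(24, 8) unordered groups, so the number I'm after can be at most that.

```python
from itertools import combinations
from math import comb

def count_valid_groups(sequences, group_size=8, checked_positions=6, alphabet="ACGT"):
    """Count unordered groups in which every checked position
    contains every character of the alphabet at least once."""
    required = set(alphabet)
    count = 0
    # combinations() yields each unordered group exactly once,
    # so reordering the sequences inside a group is never counted twice.
    for group in combinations(sequences, group_size):
        if all(required <= {seq[pos] for seq in group}
               for pos in range(checked_positions)):
            count += 1
    return count

# Upper bound: number of unordered groups of 8 out of 24 before filtering.
print(comb(24, 8))  # 735471

# Tiny toy input (not my real data): groups of 4, two checked positions.
toy = ["AA", "CC", "GG", "TT", "AC"]
print(count_valid_groups(toy, group_size=4, checked_positions=2))  # 1
```

With 735471 candidate groups, enumerating all of them is feasible, but I would prefer a formula or a smarter counting argument if one exists.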

I hope I have expressed my problem clearly. If not, please do not hesitate to ask questions.