Number of DNA sequences of length $10$ that contain TAGA as a substring


DNA sequences are often encoded with the four symbols [C T A G].

How many DNA sequences of length $10$ contain T A G A as a substring?

I'm confused as to what they mean by "substring" and how to approach it. I approached it by computing $\binom{4}{3}$, since there are $4$ spots and $2$ of the letters are repeated.
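To make the distinction concrete: a substring is a block of *consecutive* letters, whereas a subsequence only needs the letters to appear in the right order, possibly with gaps. A minimal Python sketch (the helper names `contains_substring` and `contains_subsequence` are hypothetical, chosen for illustration):

```python
def contains_substring(seq: str, pattern: str = "TAGA") -> bool:
    """True if `pattern` appears as a contiguous block inside `seq`."""
    # Python's `in` on two strings tests exactly for a contiguous substring.
    return pattern in seq


def contains_subsequence(seq: str, pattern: str = "TAGA") -> bool:
    """True if the letters of `pattern` appear in order in `seq`, gaps allowed."""
    it = iter(seq)
    # Each `ch in it` consumes the iterator up to (and including) the match,
    # so later letters must be found further to the right.
    return all(ch in it for ch in pattern)


print(contains_substring("CCTAGACGTC"))    # True: T, A, G, A sit side by side
print(contains_substring("CTCAGGATCC"))    # False: the letters are not adjacent
print(contains_subsequence("CTCAGGATCC"))  # True: T..A..G..A in order, with gaps
```

The question is about the first notion only, so TAGA must appear as four adjacent letters.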



Best answer

You have 10 places to fill, and each place can hold any of the 4 symbols [C T A G]. Four consecutive places will be occupied by TAGA, which leaves 6 other places to fill; there are $4^6$ ways to do that. Now treat TAGA as a single object: together with the 6 other places that makes 7 objects, so the TAGA block can go in any of 7 positions (equivalently, TAGA can start at position $1, 2, \dots, 7$). So the total is $7 \cdot 4^6$.
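The $7 \cdot 4^6$ figure really counts (sequence, starting position of TAGA) pairs, so a sequence is counted once per copy of TAGA it contains. A brute-force sketch in Python, assuming the 4-letter alphabet above (the helper names are hypothetical), checks that identity for small lengths $n$, where enumerating all $4^n$ sequences is cheap:

```python
from itertools import product

PATTERN = "TAGA"
ALPHABET = "CTAG"


def occurrences(seq: str, pattern: str = PATTERN) -> int:
    """Number of start positions at which `pattern` occurs in `seq`."""
    k = len(pattern)
    return sum(seq[i:i + k] == pattern for i in range(len(seq) - k + 1))


def total_occurrences(n: int) -> int:
    """Brute force: total number of TAGA occurrences summed over all 4**n sequences."""
    return sum(occurrences("".join(s)) for s in product(ALPHABET, repeat=n))


def block_count(n: int) -> int:
    """'TAGA as one block' count: (n - 3) start positions times 4**(n - 4) fillings."""
    return (n - 3) * 4 ** (n - 4)


# The block count equals the total number of occurrences, not the number of sequences.
for n in range(4, 9):
    assert total_occurrences(n) == block_count(n)

print(block_count(10))  # 28672, i.e. 7 * 4**6
```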

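But this number counts a sequence once for every copy of TAGA it contains, so a sequence with two copies of TAGA is counted twice. (TAGA cannot overlap with itself, since no proper prefix of TAGA equals a suffix of TAGA, so two copies need 8 separate places, and three copies would need $12 > 10$ places.) So we need to subtract, once, the number of sequences that contain TAGA twice.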
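Whether two copies of a pattern can overlap depends on its *borders*: the lengths $k$ for which the first $k$ letters equal the last $k$ letters. A small sketch (the helper `borders` is hypothetical, and `AGAG` is an arbitrary contrasting pattern, not part of the question):

```python
def borders(pattern: str) -> list[int]:
    """Lengths k (0 < k < len(pattern)) where the length-k prefix equals the length-k suffix."""
    return [k for k in range(1, len(pattern)) if pattern[:k] == pattern[-k:]]


print(borders("TAGA"))   # []: two copies of TAGA can never overlap
print(borders("AGAG"))   # [2]: e.g. "AGAGAG" contains AGAG twice, overlapping
print(10 // len("TAGA")) # 2: at most two disjoint copies fit in 10 places
```

Because TAGA has no borders, every sequence contains 0, 1 or 2 disjoint copies, so subtracting the two-copy count once is enough.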

If a sequence contains two TAGA substrings, they occupy 8 of the 10 places, leaving 2 single places. So, effectively, there are 4 objects to arrange (2 TAGA blocks and 2 single letters), and the positions of the two TAGA blocks can be chosen in $\binom{4}{2} = 6$ ways. The remaining 2 places can be filled in $4^2$ ways. So, the number of sequences of length 10 that contain at least one TAGA as a substring is
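The $\binom{4}{2} = 6$ count can be cross-checked by listing the pairs of non-overlapping starting positions directly. A sketch with 0-based positions (the variable names are illustrative):

```python
from itertools import combinations
from math import comb

N, K = 10, 4  # sequence length, pattern length

# Start positions (0-based) where a TAGA block can begin: 0..N-K, i.e. 7 of them.
starts = range(N - K + 1)

# Two copies fit together only if they don't overlap: second start >= first start + K.
pairs = [(i, j) for i, j in combinations(starts, 2) if j - i >= K]

print(pairs)                          # [(0, 4), (0, 5), (0, 6), (1, 5), (1, 6), (2, 6)]
print(len(pairs), comb(4, 2))         # 6 6
print(len(pairs) * 4 ** (N - 2 * K))  # 96, i.e. 6 * 4**2
```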

$$7 \cdot 4^6 - 6 \cdot 4^2 = 28672 - 96 = 28576.$$
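As a sanity check, length 10 is small enough to enumerate all $4^{10} = 1048576$ sequences. The following brute-force sketch (the helper `count_containing` is hypothetical) loops over about a million strings, so it is not instant, but it is fine at this size and should agree with the formula:

```python
from itertools import product


def count_containing(n: int, pattern: str = "TAGA", alphabet: str = "CTAG") -> int:
    """Brute force: how many length-n strings over `alphabet` contain `pattern` as a substring."""
    return sum(pattern in "".join(s) for s in product(alphabet, repeat=n))


brute = count_containing(10)        # walks all 4**10 sequences
formula = 7 * 4 ** 6 - 6 * 4 ** 2   # inclusion-exclusion from the answer
print(brute, formula)               # 28576 28576
```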