DNA sequences are often encoded with the four symbols [C T A G].
How many DNA sequences of length $10$ contain T A G A as a substring?
I'm confused about what they mean by "substring" and how to approach the problem. I tried $\binom{4}{3}$, since there are $4$ spots and $2$ of the letters are repeated.
You have 10 places to fill, and each place can hold any of the 4 symbols [C T A G]. A substring is a run of consecutive positions, so T A G A must occupy 4 adjacent places. That leaves 6 other places, which can be filled in $4^6$ ways. Now treat TAGA as a single block: together with the 6 free places there are 7 objects in a row, so the block can sit in any of 7 positions. That gives $7 \cdot 4^6$ arrangements.
But this number counts some sequences more than once: a sequence that contains TAGA twice is counted once for each occurrence. So we need to subtract the number of sequences in which TAGA appears twice.
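To make the overcount concrete, here is a small Python sketch. The helper name `occurrences` and the example sequence are only illustrative choices. It lists every position at which TAGA starts in a sequence; a sequence with two start positions is produced twice by the "place the block, then fill the other 6 places" count.

```python
PATTERN = "TAGA"

def occurrences(seq, pattern=PATTERN):
    """Return every 0-based index at which pattern starts in seq."""
    last_start = len(seq) - len(pattern)
    return [i for i in range(last_start + 1) if seq.startswith(pattern, i)]

# This sequence arises from placing the block at index 0 and also from
# placing it at index 4, so the block-placement count sees it twice.
example = "TAGATAGACC"
print(occurrences(example))
```

Any sequence for which `occurrences` returns two indices is one of the cases that has to be subtracted once.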
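Two copies of TAGA cannot overlap (no ending of TAGA matches its beginning), so together they occupy 8 of the 10 places. Treating each copy as a block, there are effectively 4 objects in a row: 2 blocks and 2 free places. The positions of the 2 blocks can be chosen in $\binom{4}{2} = 6$ ways, and the remaining 2 places can be filled in $4^2$ ways. Each of these $6 \cdot 4^2$ sequences was counted exactly twice above, so subtracting them once leaves every sequence counted once. Three copies would need 12 places, so no further correction is needed. The number of sequences of length 10 that contain TAGA as a substring (at least once) is therefore
$7 \cdot 4^6 - 6 \cdot 4^2 = 28672 - 96 = 28576$.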
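As a sanity check, the count can be compared against brute-force enumeration of all $4^{10}$ sequences. This is a minimal sketch; the function names are illustrative. It tests "contains TAGA at least once", which is the condition the formula counts.

```python
from itertools import product

ALPHABET = "CTAG"
PATTERN = "TAGA"
LENGTH = 10

def brute_force_count():
    """Count length-10 sequences over ALPHABET that contain PATTERN."""
    return sum(
        1
        for letters in product(ALPHABET, repeat=LENGTH)
        if PATTERN in "".join(letters)
    )

def formula_count():
    """Block placements minus the sequences that were counted twice."""
    return 7 * 4**6 - 6 * 4**2

if __name__ == "__main__":
    print(brute_force_count(), formula_count())
```

The enumeration visits about a million sequences, so it runs in a second or two.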