DNA sequence in MATLAB


I want to count how many synonymous and non-synonymous mutations appear in a DNA sequence, given the number of synonymous and non-synonymous mutations for each 3-letter codon. For example, given that AAA has 7 synonymous and 1 non-synonymous mutations, and CCC has 6 and 3 respectively, the sequence AAACCC would have 13 synonymous and 4 non-synonymous mutations. However, these sequences could have 10k+ letters, with a total of 64 different 3-letter combinations. How could I set up an M-file, using for / else if statements, to count the mutations? Thanks


There are 1 best solutions below


Assuming you have filtered out any data errors and the sequence splits cleanly into three-letter codons, here is one approach:

1) Make your data look like this:

AAA
CCC
ACA
CAC
...

2) Count how many times each of the 64 options occurs.

3) Multiply each count by the corresponding syn and non-syn mutation numbers for that codon, then sum over all codons.

That should be it!
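
The three steps can be sketched in MATLAB roughly as follows. This is only a minimal sketch: the variable names (`codons`, `synCounts`, `nonSynCounts`, `seq`) are made up for illustration, and the table holds only the two codons from the question, so in practice it would need all 64 entries. It also assumes the sequence length is a multiple of 3.

```matlab
% Hypothetical lookup table: only AAA and CCC are taken from the question.
% A real table would list all 64 codons with their counts.
codons       = {'AAA', 'CCC'};
synCounts    = [7, 6];   % synonymous mutations per codon
nonSynCounts = [1, 3];   % non-synonymous mutations per codon

seq = 'AAACCC';          % example sequence from the question

% Step 1: split the sequence into one three-letter codon per row.
% Assumes numel(seq) is a multiple of 3.
triplets = cellstr(reshape(seq, 3, []).');

% Step 2: count how many times each codon in the table occurs.
[found, idx] = ismember(triplets, codons);
assert(all(found), 'Sequence contains a codon that is not in the table.');
occurrences = accumarray(idx, 1, [numel(codons), 1]);

% Step 3: multiply the occurrences by the per-codon numbers and sum.
totalSyn    = synCounts * occurrences;
totalNonSyn = nonSynCounts * occurrences;

fprintf('Synonymous: %d, non-synonymous: %d\n', totalSyn, totalNonSyn);
```

Done this way, no for / else if chain over the 64 codons is needed: the lookup and counting are vectorised, so a sequence of 10k+ letters is handled in the same few lines.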


Note that steps 2 and 3 can easily be done in Excel as well. If you are not fluent in MATLAB, that will probably even be quicker.