Shannon entropy of DNA with 3 bases

Let's say we have a new life form with DNA based on only three bases, A, B and C, with 15 proteins: 5 with probability 0.1, 5 with probability 0.06 and 5 with probability 0.04. For the entropy of a sequence of 1000 bases in this case, can I just calculate the Shannon entropy for a single base and then multiply the result by 1000? Further, how can I encode a set of 20 amino acids with these 3 bases?

I still have some difficulty understanding these applications of coding and entropy to biology, so I would be thankful for help in figuring out this problem.

There are 1 best solutions below

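You can't simply calculate the entropy of one base and multiply by 1000; that works only if the bases are independent of one another and identically distributed, which need not hold when groups of bases encode peptides with unequal probabilities.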

Instead, you should calculate the entropy of an average peptide from the given probabilities, using the formula $$E = -\sum_i p_i \log_2 p_i.$$ The approximate entropy of a single base is then $\frac E3$ bits, so a sequence of 1000 bases carries roughly $\frac{1000E}{3}$ bits. (I assume that there are always three bases per peptide, although you didn't say.)
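To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The function names `shannon_entropy` and `min_codon_length` are my own, and the code assumes a fixed number of bases per peptide and treats successive peptides as independent. The helper also checks the three-bases-per-peptide assumption: two bases give only $3^2 = 9$ combinations, while three give $3^3 = 27$, which is enough for 15 peptides (or for 20 amino acids).

```python
from math import log2

# Peptide probabilities from the question: 5 x 0.1, 5 x 0.06, 5 x 0.04.
probs = [0.1] * 5 + [0.06] * 5 + [0.04] * 5
assert abs(sum(probs) - 1.0) < 1e-9  # the distribution must sum to 1


def shannon_entropy(dist):
    """Shannon entropy in bits: E = -sum(p_i * log2(p_i))."""
    return -sum(p * log2(p) for p in dist if p > 0)


def min_codon_length(n_symbols, n_bases=3):
    """Smallest length L such that n_bases ** L >= n_symbols."""
    length = 1
    while n_bases ** length < n_symbols:
        length += 1
    return length


E = shannon_entropy(probs)                # entropy of one peptide, in bits
codon_len = min_codon_length(len(probs))  # bases needed per peptide
per_base = E / codon_len                  # average entropy per base
total_1000 = 1000 * per_base              # estimate for 1000 bases

print(f"E = {E:.4f} bits per peptide")
print(f"bases per peptide = {codon_len}")
print(f"E / {codon_len} = {per_base:.4f} bits per base")
print(f"about {total_1000:.0f} bits for 1000 bases")
print(f"upper bound per base: log2(3) = {log2(3):.4f} bits")
```

With the probabilities in the question this gives about 3.81 bits per peptide, so about 1.27 bits per base, below the maximum of $\log_2 3 \approx 1.58$ bits that a uniformly random base would carry.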

If you knew the mapping from bases to peptides you could calculate the different entropies of the individual bases, but without this information or other information about the relative frequencies of the bases you cannot do this.
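As an illustration only, here is how that calculation could look once a mapping is available. The mapping below is entirely made up (peptide $k$ is assigned the $k$-th triplet in alphabetical order); it is not given in the question, and a different mapping would give different numbers.

```python
from itertools import product
from math import log2

BASES = "ABC"
probs = [0.1] * 5 + [0.06] * 5 + [0.04] * 5

# HYPOTHETICAL mapping, invented for illustration: the k-th peptide is
# encoded by the k-th of the 27 possible triplets in alphabetical order.
codons = ["".join(c) for c in product(BASES, repeat=3)][:len(probs)]


def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)


def positional_entropies(codons, probs):
    """Entropy of the base found at each of the three triplet positions."""
    result = []
    for pos in range(3):
        # Marginal distribution of the base at this position.
        marginal = {b: 0.0 for b in BASES}
        for codon, p in zip(codons, probs):
            marginal[codon[pos]] += p
        result.append(entropy(marginal.values()))
    return result


per_position = positional_entropies(codons, probs)
joint = entropy(probs)  # entropy of a whole peptide

for pos, h in enumerate(per_position, start=1):
    print(f"position {pos}: {h:.4f} bits")
print(f"sum of positions: {sum(per_position):.4f} bits")
print(f"peptide entropy:  {joint:.4f} bits")
```

The per-position entropies generally differ from one another, and their sum is never smaller than the peptide entropy $E$, because the three positions within a triplet are correlated. This is the same reason the single-base entropy cannot simply be multiplied up.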

(You said "15 proteins" but it seems clear you meant "15 peptides".)