Markov chain in Jukes-Cantor model

208 Views Asked by At

This question concerns the Jukes-Cantor model with respect to DNA nucleotides (ACGT). In any DNA string, consider a single nucleotide subject to mutation with a rate P for some time period. I'm tasked to derive the probability equation that the position will remain unchanged over a period of T time units:

P = 1/4(1+3e^-4PT)

taking into account that it is possible to mutate in between the initial and final time and return to the original nucleotide

Background:

Assume equal rates of change from any letter to any other. Thus we can construct a Markov transition matrix:

     A     C    G     T
A  (1-3P)  P    P     P 
C    P   (1-3P) P     P
G    P     P  (1-3P)  P
T    P     P    P  (1-3P)

We can see that the probability that a nucleotide will remain the same,

At time 1: P(1) = (1-3P) [no mutation]

At time 2: P(2) = (1-3P)(1-3P) + PP [no mutation 2 in a row or mutate and return]

or P(2)= (1-3P)P(1) + P*(1-P(1))

In general, P(t+1) = (1-3P)P(t) + P(1-P(t)) = (1-3P)^(t+1) + P(1-(1-3P)^t)

Please note the two uses of P here, one indication probability(at some time) - P(t) - and one indicating the rate of mutation - P. Using the constraint that P(0) = 1, that is, the probability of not mutating at time 0 is 1, it should be possible to rearrange/derive/use the slope formula/take the limit of this relationship and come up with the final equation.

1

There are 1 best solutions below

2
On

Your general equation is correct but your case of it for $P(2)$ is wrong because you have missed that $1-P(1)=3P$, so that $P(2)=(1-3P)^2+3P^2$.

Anyway, you get the general recursion:

$$P(t+1)=(1-3P)P(t)+P(1-P(t))=(1-4P)P(t)+P.$$

Moving to the other side:

$$P(t+1)+(4P-1)P(t)=P.$$

This is a linear recurrence relation with constant coefficients and a polynomial inhomogeneity. You can solve it by adding together the homogeneous solution and a particular solution, choosing the appropriate multiple of the homogeneous solution to suit $P(0)=1$. You can find a particular solution by guessing that a constant is a solution. (You can also infer a particular solution by just noting that the stationary distribution is a solution.)

When I do this I get something different from what you wrote (though it does tend to $1/4$ as $T \to \infty$). Are you sure that the model you're considering is discrete time rather than continuous time?