Convolution mixture of a probability generating function (population genetics)


I'm trying to work through an old population genetics paper (see here). The following model assumes an infinite number of nucleotide sites and no recombination between different sequences (so you can trace an ancestor through many generations). They are also assuming there are $N$ diploid individuals in every generation, and therefore $2N$ sequences in a population. For those interested, they are following the "Wright-Fisher" model.

I'm having a bit of trouble with the derivation starting at the bottom of page 263, for those who would rather follow along in the paper than read my attempt to explain it.

Choose $i$ gametes from generation $t$. The number of distinct parents that produced these gametes is a random variable $J$ with $1\le J\le i$. When $J=1$, all the gametes had the same parent; when $J=i$, all the gametes had different parents. Denote the conditional distribution of $J$ by: $$P(J=j\mid i)=G_{i,j},\qquad j=1,2,3,\dots,i$$ Also, let $K_i^{(t)}$ denote the number of sites that differ among the $i$ gametes chosen (also known as "segregating sites"). For example, the two sequences below: $$ABCDEFG\\ ABBDECG$$ have 2 segregating sites, so $K_2^{(t)}=2$.

Now, assuming that the sequences sampled in generation $t$ came from $J$ distinct parents in generation $t-1$, we can write $$K_i^{(t)} = K_J^{(t-1)}+X_i^{(t)}\:\:\:\:\:(1)$$ where $K_J^{(t-1)}$ is the number of segregating sites among the $J$ parents in generation $t-1$ and $X_i^{(t)}$ is the number of new segregating sites in the offspring due to mutation (remember the "infinite sites" assumption: no mutation occurs at the same nucleotide twice).
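As a concrete illustration of the segregating-sites count (my own sketch, not from the paper), here is a short function that counts the positions at which aligned sequences disagree, applied to the two example sequences above:

```python
# Count segregating sites between aligned sequences: a position is
# "segregating" if not all sequences carry the same character there.

def segregating_sites(sequences):
    """Return the number of columns in the alignment with more than one character."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)

# The two example sequences from the text differ at positions 3 and 6:
print(segregating_sites(["ABCDEFG", "ABBDECG"]))  # 2
```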

We model the number of new mutations as a Poisson process with mean $i\nu$ (where $i$ is the number of sequences sampled and $\nu$ is the mutation rate per sequence per generation). Using this information, we can write the probability generating function for $K_i^{(t)}$ as a convolution mixture: $$E[s^{K_i^{(t)}}] = \sum_{j=1}^iG_{i,j}E[s^{K_j^{(t-1)}}] e^{i\nu(s-1)}\:\:\: (2)$$
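To convince myself that $(2)$ is consistent, here is a small numerical check with toy inputs of my own choosing (the distributions $G_{i,j}$ and $K_j^{(t-1)}$ below are made up for illustration, not taken from the paper). It builds the exact distribution of $K_i^{(t)} = K_J^{(t-1)} + X_i^{(t)}$ directly and compares its pgf to the right-hand side of $(2)$:

```python
import math

# Toy setup: i = 2 sequences, mutation rate nu per sequence per generation.
i, nu = 2, 0.5
G = {1: 0.3, 2: 0.7}                 # G_{i,j} = P(J = j | i), made up
K_prev = {1: {0: 1.0},               # one parent: zero segregating sites
          2: {0: 0.4, 1: 0.6}}       # toy distribution for two parents

def pois(k, lam):
    """Poisson probability P(X = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Exact (truncated) distribution of K_i = K_J + X, X ~ Poisson(i * nu).
K = {}
for j, gj in G.items():
    for k0, p0 in K_prev[j].items():
        for x in range(150):  # truncation: Poisson(1) mass beyond 150 is negligible
            K[k0 + x] = K.get(k0 + x, 0.0) + gj * p0 * pois(x, i * nu)

# Compare the pgf of K_i to the convolution-mixture formula (2).
s = 0.8
lhs = sum(p * s**k for k, p in K.items())
rhs = sum(gj * sum(p * s**k for k, p in K_prev[j].items())
          for j, gj in G.items()) * math.exp(i * nu * (s - 1))
print(abs(lhs - rhs) < 1e-12)  # True
```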

The derivation of $(2)$ is giving me a bit of trouble. This is my reasoning so far:

I know that the probability generating function ($pgf$) of the sum of two independent random variables can be written as the product of their respective $pgf$s, i.e.: $$ Z=X+Y\\ E[s^X] = \sum_x P_X(X=x)s^x\\ E[s^Y] = \sum_y P_Y(Y=y)s^y\\ E[s^Z] = E[s^X]E[s^Y] = \Big(\sum_x P_X(X=x)s^x\Big) \Big(\sum_y P_Y(Y=y)s^y\Big) $$
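This product rule is easy to verify numerically for small distributions (my own sketch, with arbitrary toy probabilities): compute the distribution of $Z = X + Y$ by direct convolution and compare its pgf to the product of the individual pgfs at some point $s$:

```python
from itertools import product

def pgf(dist, s):
    """Evaluate E[s^X] for a distribution given as {value: probability}."""
    return sum(p * s**x for x, p in dist.items())

# Two small independent toy distributions.
X = {0: 0.5, 1: 0.3, 2: 0.2}
Y = {0: 0.6, 1: 0.4}

# Distribution of Z = X + Y by direct convolution.
Z = {}
for (x, px), (y, py) in product(X.items(), Y.items()):
    Z[x + y] = Z.get(x + y, 0.0) + px * py

s = 0.7
print(abs(pgf(Z, s) - pgf(X, s) * pgf(Y, s)) < 1e-12)  # True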

Also, the Poisson $pgf$ is $e^{\lambda (s-1)}$ (which takes care of one of the factors in eqn $(2)$ ).

This means the $pgf$ for $K_J^{(t-1)}$ is: $$ E[s^{K_J^{(t-1)}}] = \sum_{j=1}^iG_{i,j}E[s^{K_j^{(t-1)}}] = \sum_{j=1}^i P(J=j\mid i) E[s^{K_j^{(t-1)}}] $$

Which I don't understand. The paper states that they are assuming $J$ individuals from the previous generation, and yet the $pgf$ is written in terms of the probability of having any number of parents times the $pgf$ of the number of segregating sites among that many parents.

Apologies if this was poorly explained, I tried my best to make it as clear as I could. Please let me know if you need any clarification.

Any bit helps!


On BEST ANSWER

I think the confusion arises from the meaning of $J$. When the authors assume there are $J$ individuals from the previous generation, the $J$ is still a random variable as opposed to a fixed number. So when you calculate the expectation $E[s^{K_J^{(t-1)}}]$ you need to remember that the random variable $J$ can take any value from $1$ to $i$. Thus the expectation is a sum over all values of $j$ from $1$ to $i$, with weights $P(J=j\mid i)=:G_{i,j}$.
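Written out, the step being described is just the law of total expectation, conditioning on $J$:

$$E\big[s^{K_J^{(t-1)}}\big] = \sum_{j=1}^{i} P(J=j\mid i)\, E\big[s^{K_J^{(t-1)}} \,\big|\, J=j\big] = \sum_{j=1}^{i} G_{i,j}\, E\big[s^{K_j^{(t-1)}}\big]$$

Multiplying by the Poisson factor $e^{i\nu(s-1)}$ for the independent new mutations $X_i^{(t)}$ then gives equation $(2)$.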