Probability distribution of the number of heterozygous sites

We'll consider a stretch of DNA on a chromosome and look at specific sites that lie at given distances from one another. The distance between any two sites is expressed in centiMorgans (cM) (please start by reading the box below if you need more information on what the cM unit is).

Consider a genotype with $S=10$ sites. The distance between site $n$ and site $n+1$ is given by the $n^{th}$ element of the vector $R$:

R = [0.1, 0.05, 0.1, 0.01, 0.01, 0.03, 0.2, 0.2, 0.05]

The vector $R$ therefore has length $S-1$. Because all sites are placed linearly on the same chromosome, the distance between site $n$ and site $n+\delta$ is the sum of the elements of $R$ from position $n$ to position $n+\delta-1$ (inclusive).
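As a minimal sketch of the summation rule just described, the distance between two sites can be computed by slicing $R$. The function name `distance` is only illustrative, and sites are numbered from 1 as in the text, so the code shifts indices by one for Python's 0-based lists.

```python
R = [0.1, 0.05, 0.1, 0.01, 0.01, 0.03, 0.2, 0.2, 0.05]

def distance(n, delta, R=R):
    """Distance between site n and site n + delta (sites numbered from 1)."""
    if n < 1 or delta < 0 or n + delta > len(R) + 1:
        raise ValueError("sites must lie between 1 and S")
    # The n-th element of R (R[n - 1] in Python) is the distance between
    # site n and site n + 1, so we sum positions n .. n + delta - 1.
    return sum(R[n - 1 : n - 1 + delta])

print(distance(1, 1))  # distance between sites 1 and 2
print(distance(3, 3))  # distance between sites 3 and 6
```

For instance, the distance between sites 3 and 6 adds the third, fourth and fifth elements of $R$.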

Probability of being heterozygote at all sites

Throughout this post we will consider a case where both parents are clones and are heterozygous at all sites. Let's calculate the probability that an offspring of these two parents is heterozygous at all loci. The probability of being heterozygous at the first locus is $0.5$. Given that, the probability of also being heterozygous at the second locus is the probability that both parents recombine between the first and the second locus plus the probability that neither parent recombines, i.e. $R_1^2 + (1-R_1)^2$. We can keep going with the same logic. So the probability for an offspring to be heterozygous at all sites is $\frac{1}{2} \cdot \prod_{n=1}^{S-1} \left(R_n^2 + (1-R_n)^2\right)$, where $S$ is the number of sites and $R_n$ is the $n^{th}$ element of the vector $R$.
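The product above can be evaluated directly. The following is a small sketch, assuming each element of $R$ is used as a recombination probability (the convention of this post) and that the sum $R_n^2 + (1-R_n)^2$ is the full factor inside the product; `p_all_heterozygous` is an illustrative name.

```python
from math import prod

R = [0.1, 0.05, 0.1, 0.01, 0.01, 0.03, 0.2, 0.2, 0.05]

def p_all_heterozygous(R):
    """P(offspring heterozygous at every site) when both parents are
    heterozygous at every site."""
    # 1/2 for the first site, then one factor per interval:
    # both parents recombine (r**2) or neither does ((1 - r)**2).
    return 0.5 * prod(r**2 + (1 - r)**2 for r in R)

print(p_all_heterozygous(R))
```

With a single interval at $R_1 = 0.5$ (independent sites) this gives $0.5 \cdot 0.5 = 0.25$, as expected for two independent loci.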

Probability of being heterozygote at exactly $x$ sites

Assuming both parents are clones and are heterozygous at all sites, what is the probability that a given offspring is heterozygous at exactly $x$ sites?

As discussed above, I already know (if I haven't made a mistake) what this probability is in the special case $x = S$:

$$P(x=S) = \frac{1}{2} \cdot \prod_{n=1}^{S-1} \left(R_n^2 + (1-R_n)^2\right)$$


BOX - What is a centiMorgan?

If two sites are at 0.2 cM from each other, then the probability that a recombination event occurs between these two sites is 0.2. In particular, if two sites are at 0.5 cM from each other, they are passed on to the offspring independently. For example, consider two sites A and B, and an individual that is heterozygous (i.e. carries two different alleles at a given locus) at both loci: A_1A_2 and B_1B_2 (A_1 and B_1 are on the same chromosome, and A_2 and B_2 are on the other chromosome). If the individual passes the allele A_1 to a given offspring, then it also passes the allele B_1 with probability 0.5 if the two sites are at 0.5 cM (independent sites), and with probability 0.9 (that is, 1 - 0.1, the probability that no recombination occurs) if the two sites are at 0.1 cM.
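To make the box concrete, here is a small sketch that lists the four possible gametes of an A_1B_1 / A_2B_2 individual, assuming (as in this box) that the distance is used directly as the recombination probability $r$. The function names are illustrative. Since A_1 and B_1 sit on the same chromosome, they travel together whenever no recombination occurs, i.e. with probability $1 - r$.

```python
def gamete_probabilities(r):
    """Probabilities of the four gametes of an A_1B_1 / A_2B_2 parent,
    with recombination probability r between sites A and B."""
    return {
        ("A_1", "B_1"): (1 - r) / 2,  # parental combination, no recombination
        ("A_2", "B_2"): (1 - r) / 2,  # parental combination, no recombination
        ("A_1", "B_2"): r / 2,        # recombinant
        ("A_2", "B_1"): r / 2,        # recombinant
    }

def p_b1_given_a1(r):
    """P(B_1 is passed | A_1 is passed)."""
    g = gamete_probabilities(r)
    p_a1 = g[("A_1", "B_1")] + g[("A_1", "B_2")]
    return g[("A_1", "B_1")] / p_a1

print(p_b1_given_a1(0.5))  # independent sites
print(p_b1_given_a1(0.1))  # tightly linked sites
```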

There are 1 best solutions below

Fix $k$ sites $S^1,\ldots, S^k$. What is the probability that the individual is heterozygous at exactly these sites and homozygous at all the others?

First, each parent ($P_1$, $P_2$) passes on one array of genes, obtained after recombination, so we have two arrays to compare site by site.

Define the indicator functions $\chi_{=}(n)$ and $\chi_{\neq}(n)$. If site $j$ is to be heterozygous, then $\chi_{=}(j) = 0$ and $\chi_{\neq}(j) = 1$. If site $j$ is to be homozygous, then $\chi_{=}(j) = 1$ and $\chi_{\neq}(j) = 0$.

Suppose the parents passed different genes at site $j-1$. To obtain different genes at site $j$ as well, either both parents recombine between sites $j-1$ and $j$ or neither does: if parent $P_1$ recombines, then parent $P_2$ also has to recombine to maintain the difference, and if parent $P_1$ doesn't switch, then for the same reason parent $P_2$ can't switch either. This happens with probability $R_j^2 + (1-R_j)^2$.

Suppose instead the parents passed the same gene at site $j-1$. To obtain different genes at site $j$, exactly one parent must recombine between sites $j-1$ and $j$: if parent $P_1$ recombines, then parent $P_2$ can't recombine, and if parent $P_1$ doesn't switch, then parent $P_2$ must switch. This happens with probability $2R_j(1-R_j)$.
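The two cases above can be checked by enumerating, for one interval, whether each parent recombines. This is only a sketch under the assumption that the two parents recombine independently, each with probability $r$ in that interval; `transition_probs` is an illustrative name.

```python
from itertools import product

def transition_probs(r):
    """Return (P(state kept), P(state flipped)) between two adjacent sites,
    where the state is 'heterozygous' or 'homozygous'."""
    keep = flip = 0.0
    # rec1 / rec2: does parent P_1 / P_2 recombine in this interval?
    for rec1, rec2 in product([False, True], repeat=2):
        p = (r if rec1 else 1 - r) * (r if rec2 else 1 - r)
        if rec1 == rec2:
            keep += p  # both or neither recombine: the state is unchanged
        else:
            flip += p  # exactly one recombines: heterozygous <-> homozygous
    return keep, flip

keep, flip = transition_probs(0.1)
print(keep, flip)
```

The first value matches $r^2 + (1-r)^2$ and the second matches $2r(1-r)$, and the two always sum to 1.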

Let $\neq_{i_1},\ldots, \neq_{i_k}, =_{j_1},\ldots, =_{j_{S-k}}$ represent the event where the sites $S^{i_1}, \ldots, S^{i_k}$ are heterozygous and the sites $S^{j_1}, \ldots, S^{j_{S-k}}$ are homozygous.

\begin{align*}
&P(\neq_{i_1},\ldots, \neq_{i_k}, =_{j_1},\ldots, =_{j_{S-k}}) =\\
&\frac{1}{2} \cdot \prod_{n=1}^{S-1} \bigg[ \chi_{\neq}(n-1)\chi_{\neq}(n)\Big(R_n^2 + (1-R_n)^2\Big) + \chi_{=}(n-1)\chi_{\neq}(n)\Big(2(1-R_n)R_n\Big) \\
&\qquad + \chi_{\neq}(n-1)\chi_{=}(n)\Big(2(1-R_n)R_n\Big) + \chi_{=}(n-1)\chi_{=}(n)\Big(R_n^2 + (1-R_n)^2\Big) \bigg]
\end{align*}

Remark: I am counting the $S$ sites from $0$ to $S-1$, so $R_n$ is the distance between sites $n-1$ and $n$.
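The product formula can be sketched in code as follows. Sites are numbered from 0 to $S-1$ as in the remark, a pattern is given as a list of booleans (`True` for heterozygous), and `R[j - 1]` in Python plays the role of $R_j$. The name `pattern_probability` is an illustrative choice.

```python
def pattern_probability(het, R):
    """Probability of one specific heterozygous/homozygous pattern.

    het[j] is True if site j (0 .. S-1) is to be heterozygous.
    R[j - 1] is the recombination probability between sites j-1 and j.
    """
    if len(het) != len(R) + 1:
        raise ValueError("need exactly one more site than intervals")
    p = 0.5  # site 0 is heterozygous or homozygous with probability 1/2 each
    for j in range(1, len(het)):
        r = R[j - 1]
        if het[j] == het[j - 1]:
            p *= r**2 + (1 - r)**2   # state kept: both or neither recombine
        else:
            p *= 2 * r * (1 - r)     # state flipped: exactly one recombines
    return p

print(pattern_probability([True, True], [0.1]))   # heterozygous at both sites
print(pattern_probability([True, False], [0.1]))  # heterozygous then homozygous
```

Only one of the four indicator products in the formula is nonzero for each interval, which is why the code reduces to a single `if`.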

To find the probability of being heterozygous at exactly $k$ (or $x$) sites, you need to consider all the ${S\choose k}$ different events corresponding to the choice of which sites are heterozygous, and sum the (possibly different) probabilities of those events.
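The summation over all ${S\choose k}$ choices can be sketched by brute-force enumeration, which is cheap for $S = 10$ (at most 252 patterns per value of $k$). This assumes the $R$ vector from the question, used directly as recombination probabilities; the function names are illustrative, and the pattern probability is recomputed here so that the snippet is self-contained.

```python
from itertools import combinations

R = [0.1, 0.05, 0.1, 0.01, 0.01, 0.03, 0.2, 0.2, 0.05]

def pattern_probability(het_sites, R):
    """Probability that exactly the sites in het_sites (0-based) are heterozygous."""
    S = len(R) + 1
    het = [j in het_sites for j in range(S)]
    p = 0.5
    for j in range(1, S):
        r = R[j - 1]
        p *= (r**2 + (1 - r)**2) if het[j] == het[j - 1] else 2 * r * (1 - r)
    return p

def p_exactly(k, R):
    """P(offspring heterozygous at exactly k sites), summing over all choices."""
    S = len(R) + 1
    return sum(pattern_probability(set(c), R) for c in combinations(range(S), k))

# Full distribution of the number of heterozygous sites, k = 0 .. S.
dist = [p_exactly(k, R) for k in range(len(R) + 2)]
for k, p in enumerate(dist):
    print(k, p)
```

Two sanity checks are worth making on the result: the probabilities over $k = 0, \ldots, S$ should sum to 1, and the distribution should be symmetric in $k \leftrightarrow S-k$, because swapping heterozygous and homozygous at every site leaves each pattern's probability unchanged.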