Solving recurrence equation for lossy duplication

30 Views Asked by At

Suppose we have a word of length $L$ from a two-letter alphabet (say, $\mathcal{A} = \{A,B\}$), and we duplicate it. However, our duplication is fallible: each element of the result is incorrect ($A$ changed to $B$, or vice versa) with probability $\mu$. Now we have a set of two words: the original, and the lossy copy. We duplicate both of these words by the same procedure, and repeat for $n$ rounds so that we have a set of $2^N$ (not necessarily unique) words.

I'm interested in what the final set of words looks like after $N$ rounds. Ultimately, I'd like to find the joint probability distribution for the number of copies we have of each possible word in $\mathcal{A}^L$. But I decided to start with something easier by finding the expectation of this distribution.

Let $f_n(x)$ be the expected number of copies of word $x \in \mathcal{A}^L$ after the $n$th round of duplication, and let the distance $d(x,y)$ between two words $x, y \in \mathcal{A}^L$ be the number of letters at which they differ. Then we have the following recurrence relation:

$$f_{n+1}(x) = f_n(x) + \sum_{y \in \mathcal{A}^L} f_n(y) \mu^{d(x,y)} (1-\mu)^{L-d(x,y)}$$

I'd like to be able to turn this into an explicit formula for $f_n(x)$ in terms of $n$, $x$, and $L$. Unfortunately, if I ever learned how to solve problems of this sort I've long since forgotten, and I don't even know what field of mathematics I'm looking for. As such, and since I'm hoping to build on this solution to solve more complicated versions of this and related problems, I would be very grateful for answers which offer thorough explanation of the methods and/or references to where I could learn more.

1

There are 1 best solutions below

0
On

This is a comment, not an answer.

These types of sums can be made less complicated like this:

$\begin{array}\\ f_{n+1}(x) &= f_n(x) + \sum_{y \in \mathcal{A}^L} f_n(y) \mu^{d(x,y)} (1-\mu)^{L-d(x,y)}\\ &= f_n(x) + \sum_{y \in \mathcal{A}^L} f_n(y) \mu^{d(x,y)} (1-\mu)^{L}(1-\mu)^{-d(x,y)}\\ &= f_n(x) + (1-\mu)^{L}\sum_{y \in \mathcal{A}^L} f_n(y) (\frac{\mu}{1-\mu})^{d(x,y)} \\ &= f_n(x) + (1-\mu)^{L}\sum_{y \in \mathcal{A}^L} f_n(y) z^{d(x,y)} \quad\text{where } z = (\frac{\mu}{1-\mu})\\ \end{array} $

To actually work on the sum, you might convert it to a sum over the number of positions in the string.