the probability for a file to be completed under random distribution

79 Views Asked by At

Suppose there are $n$ files, for ease lets assume all are of the same size $S$. Further suppose that each second, you obtain some fraction of size $k<<S$ (packet size) that is a part of one of the $n$ files, A file is completed after $t$ seconds if all of its $S/k$ packets were obtained.
How do I calculate the probability that for one specific file (say file $i$) is completed after $T$ seconds. By one specific file I meant some file to clarify, The settings is similar to how torrent p2p network works.

1

There are 1 best solutions below

8
On BEST ANSWER

This can be solved using dynamic programming. I've taken liberty to redefine terminology a little. Suppose we have $n$ files, each of size $k$ [information units]. I assume that we source only absent pieces, so each second we receive a random new $1$ information unit on one of files, all pieces are equiprobable.

Since you're interested in one specific file, we can treat it as having file $1$ of size $k$ and file $2$ of size $(n-1)k$. Denote current state at time $t$ as $\{X_1, X_2\}$, where $X_1$ is size of file $1$ and $X_2$ is size of file $2$.

Then, at time $t + 1$ your state becomes $\{X_1 + 1, X_2\}$ with probability $$\frac{k - X_1}{nk-(X_1 + X_2)},$$

or $\{X_1, X_2 + 1\}$ with probability $$\frac{(n - 1)k - X_2}{nk - (x_1 + X_2)}.$$

You start at $\{0, 0\}$, and have to find $\mathrm{P}_{\{k - 1, T - k\}}\cdot\mathrm{P}_{\{k - 1, T - k\}\rightarrow\{k, T - k\}}$.

ADDITION: Turns out a closed form might be obtained. Probability of finishing file $1$ exactly at time $T$ is

$$ \binom{T - 1}{T - k}\Biggr/\binom{nk}{k} $$

ADDITION: If you're interested in time $T$ when any file completes (I assume you want the time of first occurence only) - this is more complex, as we can't simplify it to 2-files case.

The dynamic programming approach still works, though the calculations become much heavier. In any case, at discrete time moment $t$ you have state $\{x_1, x_2, \ldots, x_n\}, \sum_i x_i = t$, where $x_i$ - size of file $i$ at time $t$. In general, for each $x_i < k$ you can make transition

$$ \{x_1, \ldots, x_i, \ldots, x_n\} \rightarrow \{x_1, \ldots, x_i + 1, \ldots, x_n\} \qquad \mathrm{with}\:\mathrm{probability} \qquad \frac{k - x_i}{nk - t}. $$

At step $T - 1$, if you have states $S$ from the set $\mathbb{S}_{T - 1}$ of all possible states with at least one $x_i = k - 1$ - you can complete a file on the next step. Specifically, probability of completing file at step $T$ would be

$$ \sum_{S \in \mathbb{S}_{t - 1}: N(S) >= 1}\mathrm{Pr}(S)\cdot\frac{N(S)}{nk - (T - 1)}, $$

where $N(S)$ is how many files of size $k - 1$ is in state $S$.

This approach would demand reasonable time and memory space for small $n$ and $k$. On my potato computer it takes $2.7$ seconds to process case of $5$ files of $10$ pieces each, and $93$ seconds to process $5$ files of $20$ pieces each.

Now, there might be a chance to solve it analytically, but I haven't reached a closed form yet. Probability of being in a state $\{x_1, x_2, \ldots, x_n\}$ at time $t$ might be calculated as

$$ \frac{\biggl(k\cdot(k - 1)\cdot\ldots\cdot(k - x_1 + 1)\biggr)\cdot\ldots\cdot\biggl(k\cdot(k - 1)\cdot\ldots\cdot(k - x_n + 1)\biggr)}{\biggl((nk)\cdot(nk - 1)\cdot\ldots\cdot(nk - t + 1)\biggr)}\cdot\frac{t!}{x_1!\cdot\ldots\cdot x_n!} = \ \ldots = \frac{\displaystyle{\frac{k!}{(k - x_1)!}}\cdot\displaystyle{\frac{k!}{(k - x_2)!}}\cdot\ldots\cdot\displaystyle{\frac{k!}{(k - x_n)!}}}{\displaystyle{\frac{(nk)!}{(nk - t)!}}}\cdot\frac{t!}{x_1!\cdot\ldots\cdot x_n!} $$

Next, probably, a sum of all possible combinations of $\mathrm{Pr}(S) \in \mathbb{S}_{t - 1}$ where there is $1, 2, \ldots$ states with $x_i = k - 1$ simplifies to a closed form, but I didn't reach it. Hope this helps.