Probability of 3 streams of data on same Compute Node

50 Views Asked by At

Assume there are 10 compute nodes and 6 sources of data which randomly assign a destination within 1:10 compute nodes. What is the probability that three sources (or more) go to the same compute node at the same time?

I tried using the 'birthday' paradox Poisson concept and came up with the following: For [n=1] you have (10/10) or 100% probability that a single stream will be sent to a compute node

For [n=2] you have (10/10)*(1/10) or 10% probability that a two streams will be sent to the same compute node

For [n=3] you have (10/10)(1/10)(2/10) or 2% probability that three streams will be sent to the same compute node

Is this logic correct with a 2% probability?

2

There are 2 best solutions below

2
On BEST ANSWER

It's possible to approach this problem in more than one way, but I think the following will be useful for more complicated problems despite its somewhat tedious recitation of steps. At the end I've added an alternative solution that closely resembles the approach suggested in a Comment by the OP above.

The problem is equivalent to asking the chance that some (compute) node will get assigned at least three (data) sources. Alternatively, we could work with the probability of the complementary outcome, that no node gets more than two sources (where random uniform and independent assignments are understood). For the sake of clarity we will consider the former outcomes "success" and the latter outcomes "failure".

As you intuitively started, we can consider the six source assignments to nodes as if they were decided serially. In particular the first assignment will give us "with probability one" that some node has one source (and the rest of them have as yet none):

$$ 1+0+0+0+0+0+0+0+0+0 = 1 \tag{1}$$

where we sort the nodes by descending numbers of sources assigned.

Thinking of this as a state with probability one after the first assignment, we have only five more assignments to make. The second assignment could give us two different outcomes. If we hit the same node again, it will look like:

$$ 2+0+0+0+0+0+0+0+0+0 = 2 \tag{2a}$$

which happens $0.1$ of the time, but if we hit a different node, it will look like:

$$ 1+1+0+0+0+0+0+0+0+0 = 2 \tag{2b}$$

and that would occur $0.9$ of the time.

The third assignment would give any of the following results:

$$ 3+0+0+0+0+0+0+0+0+0 = 3 \tag{3a}$$

$$ 2+1+0+0+0+0+0+0+0+0 = 3 \tag{3b}$$

$$ 1+1+1+0+0+0+0+0+0+0 = 3 \tag{3c}$$

We can figure out the probabilities of these states by considering how likely the states $2a,2b$ are to result in each of them. The chance of going from $2a$ to $3a$ is $0.1$, and this is the only way to arrive at $3a$. Since the probability of state $2a$ was $0.1$, the probability of arriving at $3a$ is $0.01$. Moreover any further assignments cannot decrease the "at least three" number of sources at a node, so we can go ahead and lump all the "descendents" of $3a$ into the bucket of "successful" assignments:

$$ \mathbf{Pr}(3a) = 0.01 $$

On the other hand state $2a$ could transition to state $3b$ with probability $0.9$, while state $2b$ transitions to state $3b$ with probability $0.2$. Therefore the combined possibility to reach state $3b$ is:

$$ \mathbf{Pr}(3b) = 0.27 $$

Then it should be obvious that the probability of reaching state $3c$ is:

$$ \mathbf{Pr}(3c) = 0.72 $$

either by direct computation (of transition from $2b$ to $3c$) or taking the complementary event to the union of $3a$ and $3b$.

Since we've taken state $3a$ off the table, let's consider only the states that would arise from $3b,3c$. These are:

$$ 3+1+0+0+0+0+0+0+0+0 = 4 \tag{4a}$$

$$ 2+2+0+0+0+0+0+0+0+0 = 4 \tag{4b}$$

$$ 2+1+1+0+0+0+0+0+0+0 = 4 \tag{4c}$$

$$ 1+1+1+1+0+0+0+0+0+0 = 4 \tag{4d}$$

From state $3b$ we can reach state $4a$ with probability $0.1$ or state $4b$ with probability $0.1$ or state $4c$ with probability $0.8$. From state $3c$ we can reach state $4c$ with probability $0.3$ or state $4d$ with probability $0.7$. So (having excluded descending from $3a$), we have these probabilities of reaching the node/source assignments above:

$$ \mathbf{Pr}(4a) = 0.027 $$

$$ \mathbf{Pr}(4b) = 0.027 $$

$$ \mathbf{Pr}(4c) = 0.432 $$

$$ \mathbf{Pr}(4d) = 0.504 $$

Because we excluded state $3a$ from further consideration (which had probability $0.01$), the above descendent probabilities add up only to $0.99$. Likewise we will exclude descendents of $4a$ because it already has at least three sources for one node. But keeping track of our successful outcomes, we already have probability $0.01 + 0.027 = 0.037$ after assigning four sources.

The states that can be reached from $4b,4c,4d$ are these:

$$ 3+2+0+0+0+0+0+0+0+0 = 5 \tag{5a}$$

$$ 3+1+1+0+0+0+0+0+0+0 = 5 \tag{5b}$$

$$ 2+2+1+0+0+0+0+0+0+0 = 5 \tag{5c}$$

$$ 2+1+1+1+0+0+0+0+0+0 = 5 \tag{5d}$$

$$ 1+1+1+1+1+0+0+0+0+0 = 5 \tag{5e}$$

We summarize the transition probabilities from $4b,4c,4d$ to $5a,5b,5c,5d,5e$ in this table:

$$ \begin{array}{r|ccccc} \text{from\to} & 5a & 5b & 5c & 5d & 5e \\ \hline 4b & 0.2 & & 0.8 & & \\ 4c & & 0.1 & 0.2 & 0.7 & \\ 4d & & & & 0.4 & 0.6 \end{array} $$

After multiplying the probabilities of states $4b,4c,4d$ by the above matrix, we get the probabilities of states $5a,5b,5c,5d,5e$:

$$ \mathbf{Pr}(5a) = 0.0054 $$

$$ \mathbf{Pr}(5b) = 0.0432 $$

$$ \mathbf{Pr}(5c) = 0.1080 $$

$$ \mathbf{Pr}(5d) = 0.5040 $$

$$ \mathbf{Pr}(5e) = 0.3024 $$

These add up to $0.963$ as expected, and we will remove state $5a,5b$ from further calculations because they've already reached at least three sources assigned to a node. For those keeping track of our probability of success, it is $0.037 + 0.0486 = 0.0856$ after five sources are assigned.

Finally we can obtain the following from states $5c,5d,5e$:

$$ 3+2+1+0+0+0+0+0+0+0 = 6 \tag{6a}$$

$$ 3+1+1+1+0+0+0+0+0+0 = 6 \tag{6b}$$

$$ 2+2+2+0+0+0+0+0+0+0 = 6 \tag{6c}$$

$$ 2+2+1+1+0+0+0+0+0+0 = 6 \tag{6d}$$

$$ 2+1+1+1+1+0+0+0+0+0 = 6 \tag{6e}$$

$$ 1+1+1+1+1+1+0+0+0+0 = 6 \tag{6f}$$

As we've reached the final stage of assignments here, all we really need to compute are the probabilities to reach the states $6a,6b$ since those two are the only ones that achieve success (at least three sources on a node).

Checking the possible transitions, only state $5c$ can give us state $6a$ with probability $0.2$, and only state $5d$ can give use state $6b$ with probability $0.1$. From state $5e$ (all ones and zeros) there isn't a way to reach success.

$$ \mathbf{Pr}(6a) = 0.0216 $$

$$ \mathbf{Pr}(6b) = 0.0504 $$

Combining these with our previous running total, the overall chance that at least one of the ten nodes will have at least three of the six sources is $0.0856 + 0.072 = 0.1576$.


Alternative Solution

The probability state transition approach above can be adapted to many more complicated problems. See e.g. this previous Question. But the present problem, with $10$ distinguishable nodes and $6$ distinguishable sources, can be solved with a combinatorial approach like what the OP outlined in a Comment.

As there we consider all $10^6$ possible assignments of sources to nodes equally likely, so that it suffices to count the failure cases (those in which no nodes is assigned more than two sources).

In the end these failures are the cases we labeled as states $6c,6d,6e,6f$. The simplest to count is $6f$, where we have six nodes each with one source assigned to them. Keep in mind the sources are distinct, so there are $10$ ways to assign the first source, then $9$ ways to assign the second source, and so on:

$$ \textbf{count}(6f) = 10\cdot 9\cdot 8\cdot 7\cdot 6\cdot 5 = 151200 $$

For state $6e$ we choose one node to get two sources assigned, and then four of the remaining nodes to get one source apiece. Thus:

$$ \textbf{count}(6e) = 10\cdot \binom{6}{2} \cdot 9\cdot 8\cdot 7\cdot 6 = 453600 $$

State $6d$ has two nodes with two sources and two nodes with one source. Thus:

$$ \textbf{count}(6d) = \binom{10}{2}\cdot \binom{6}{2}\cdot \binom{4}{2}\cdot 8\cdot 7 = 226800 $$

Finally state $6c$ requires a choice of three nodes, each getting a pair of sources. The count is:

$$ \textbf{count}(6c) = \binom{10}{3}\cdot \binom{6}{2}\cdot \binom{4}{2} = 10800 $$

The total count of failure cases is:

$$ 10800 + 226800 + 453600 + 151200 = 842400 $$

Subtracting these from $10^6$ gives us $157600$ success cases, so the probability of success is $0.1576$ (in agreement with the earlier calculation).

0
On

The following solution uses an exponential generating function (EGF).

It may be simpler to solve the complementary problem. What is the probability that all nodes have no more than 2 of the 6 sources assigned to them? The sources may be assigned to the nodes in $10^6$ possible ways, all of which we assume are equally likely. We would like to count the number of ways in which each node has 0, 1, or 2 sources assigned. More generally, we might count the number of ways that $r$ sources can be assigned. The EGF for the number of ways sources can be assigned to a single node is $1+x+(1/2!)x^2$, so the EGF for a sequence of $10$ such assignments is $$f(x) = \left( 1 + x + \frac{1}{2!} x^2 \right)^{10}$$ The EGF gives the number of ways any number of sources can be assigned to the nodes, but all we really need is the number of ways 6 sources can be assigned. Expanding $f(x)$, we find the coefficient of $x^6$ is $1,170$, so the number of ways 6 sources can be assigned to the nodes subject to the given constraints is $6! \times 1170 = 842,400$, and the associated probability is $$\frac{842,400}{10^6} = 0.8424$$ (I used a computer algebra system for the expansion, but I think that with a little work one could compute the coefficient with pencil and paper.)

So the answer to the original problem, the probability that at least one node has three or more sources assigned to it, is $$1 - 0.8424 = \boxed{0.1576}$$