Suppose we have a list of distinctive elements: $$X_0=\{x_1,x_2,x_3,\cdots,x_n\}$$ Each element has mass 1. Suppose we take two elements at random and make a new element with appropriate mass. For example we take $x_1$ and $x_n$ and we make a new elements $(x_{n+1})$ of mass two, we put the new element back in the list so, $$X_1=\{x_1,x_2,x_3,\cdots,x_n,x_{n+1}\}$$ Now if we repeat this procedure $t$ times we will have $$X_t=\{x_1,x_2,x_3,\cdots,x_n,x_{n+1},x_{n+2},x_{n+3},\cdots,x_{n+t}\}$$ Surely as the process of taking two elements and making a new one is random the masses also will be distributed randomly. Now I was wondering can anyone use probability theory and compute the mass of particle $x_i$ at a given time?
Probability of finding a random mass
252 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail AtThere are 2 best solutions below
On
One could try to solve the problem recursively. There may be better ideas and there may be a simple expression. This is just a thought.
Define $x_i \hat{=} x_{n+i}$. Denote the expectation of the mass of an element $x_i$ by $\mathbb{E}[x_i]$. Then the mass of $x_i$ is generated by two elements $x_j$ with $0 \leq j \leq i-1$. Denote the event that $x_{i-1}$ is chosen to calculate the mass of $x_i$ by $A_i$. Then $$\mathbb{E}[x_i] = \mathbb{E}[x_i | A_i] \mathbb{P}(A_i) + \mathbb{E}[x_i | A_i^c] \mathbb{P}(A_i^c)$$ We already know that if $x_{i-1}$ is not used to calculate the new mass then the expectation should stay the same, thus $\mathbb{E}[x_i | A_i^c] = \mathbb{E}[x_{i-1}]$. The probability of $A_i^c$ is also easily computed by $\mathbb{P}(A_i^c) = \frac{n+i-2}{n+i-1}\frac{n+i-3}{n+i-2}$ and $\mathbb{P}(A_i) = 1 -\mathbb{P}(A_i^c)$. $\mathbb{E}[x_i | A_i]$ can be calculated by $$ \mathbb{E}[x_i | A_i] = \mathbb{E}(x_{i-1}) + \mathbb{E}(X_{i-2}) = \mathbb{E}(x_{i-1}) + \frac{n + \sum_{j=1}^{i-2}{\mathbb{E}(x_j)}}{n+(i-2)}$$ Putting it together: $$ \mathbb{E}[x_i] = \mathbb{E}[x_{i-1}] + \frac{2 \sum_{j=1}^{n+i-2}\mathbb{E}(x_j)}{(n+i-1)(n+i-2)} $$
If we have two discrete random variables $X_1,\, X_2$, with $P(X_1=m)=p_1(m),\;P(X_2=m)=p_2(m)$, where $0 \le m \in \mathbb Z$, and $p_j(m)$ is the "Probability Mass Function (pmf)" of each of them, and consider the "Probability Generating Function (pgf)" of each, i.e. $$ F_{\,j} (z) = \sum\limits_{0\, \le \,k} {p_{\,j} (k)\,z^{\,k} } $$ then the pgf of the sum will be the product of the single pgf's $$ F_{\,1 + 2} (z) = F_{\,1} (z)\,F_{\,2} (z) = \sum\limits_{0\, \le \,k} {\left( {\sum\limits_{0\, \le \,l\,\left( { \le \,k} \right)} {p_{\,1} (l)p_{\,2} (k - l)} } \right)\,z^{\,k} } $$
We will have $$ \mathop {\lim }\limits_{z\, \to \,1\, - } F_{\,\Sigma \,2} (z) = F_{\,\Sigma \,2} (1^{\, - } ) = 1 $$ and, if it exists, $$ E\left( {X_{\,1} + X_{\,2} } \right) = \mathop {\lim }\limits_{z\, \to \,1\, - } F_{\,\Sigma \,2} '(z) = F_{\,\,1} '(1^{\, - } )F_{\,\,2} (1^{\, - } ) + F_{\,\,1} (1^{\, - } )F_{\,\,2} '(1^{\, - } ) = E\left( {X_{\,1} } \right) + E\left( {X_{\,2} } \right) $$
So if we have $q$ variables, among which we choose the pair $(i,j)$ to sum uniformly at random, e.g. from an urn, there are two common schemes that we can adopt
a) with replacement, which means that the couples with same elements are included, that we have a prob. $1/q^2$ to choose a specific couple, and that the pgf of the sum of all the possible couples will be $$ \bbox[lightyellow] { F_{\,\Sigma \,q} (z) = {1 \over {q^2 }}\sum\limits_{1\, \le \,i,\,j\, \le \,q} {F_{\,i} (z)\,F_{\,j} (z)} } \tag{1.a}$$
b) without replacement, that means that only couples with different elements are included, that the probability to choose a couple is $1/(q^2-q)=1/(q(q-1))$, and that the pgf of the sum of all couples will be $$ \bbox[lightyellow] { \eqalign{ & G_{\,\Sigma \,q} (z) = {1 \over {q\left( {q - 1} \right)}}\sum\limits_{1\, \le \,i \ne \,j\, \le \,q} {G_{\,i} (z)\,G_{\,j} (z)} = \cr & = {2 \over {q\left( {q - 1} \right)}}\sum\limits_{1\, \le \,i\, < \,j\, \le \,q} {G_{\,i} (z)\,G_{\,j} (z)} = \cr & = {1 \over {q\left( {q - 1} \right)}}\left( {\sum\limits_{1\, \le \,i,\,j\, \le \,q} {G_{\,i} (z)\,G_{\,j} (z)} - \sum\limits_{1\, \le \,i\, \le \,q} {G_{\,i} (z)^{\,2} } } \right) \cr} } \tag{1.b}$$
In our case we are starting with $n$ discrete variables whose pmf is given by
$$ \delta (m - 1) = \left[ {1 = m} \right] $$ where $\delta$ denotes the Kronecker's Delta and $[P]$ denotes the Iverson bracket.
The pgf's are identical and equal to $z$.
Then, the first particle generated from the original $n$ will have a spread of values whose pmf $$ \bbox[lightyellow] { \eqalign{ & F_{\,n + 1} (z) = {1 \over {n^2 }}\sum\limits_{1\, \le \,i,\,j\, \le \,n} {F_{\,i} (z)\,F_{\,j} (z)} = \cr & = {1 \over {n^{\,2} }}\sum\limits_{1\, \le \,i\, \le \,n} {F_{\,i} (z)\sum\limits_{1\, \le \,i\, \le \,n} {\,F_{\,j} (z)} } = {1 \over {n^{\,2} }}\left( {n\,F_{\,1} (z)} \right)^{\,2} = \cr & = z^{\,2} \cr} } \tag{2.a}$$ and for case b) $$ \bbox[lightyellow] { \eqalign{ & G_{\,n + 1} (z) = {1 \over {n\left( {n - 1} \right)}}\left( {\sum\limits_{1\, \le \,i,\,j\, \le \,n} {G_{\,i} (z)\,G_{\,j} (z)} - \sum\limits_{1\, \le \,i\, \le \,n} {G_{\,i} (z)^{\,2} } } \right) = \cr & = {1 \over {n\left( {n - 1} \right)}}\left( {n^{\,2} z^{\,2} - n\,z^{\,2} } \right) = z^{\,2} \cr} } \tag{2.b}$$ which is obvious since the sum can only give $2$ in both cases, and proceeding with $$ \bbox[lightyellow] { \eqalign{ & F_{\,n + 2} (z) = {1 \over {\left( {n + 1} \right)^2 }}\sum\limits_{1\, \le \,i,\,j\, \le \,n + 1} {F_{\,i} (z)\,F_{\,j} (z)} = \cr & = {1 \over {\left( {n + 1} \right)^2 }}\sum\limits_{1\, \le \,i\, \le \,n + 1} {F_{\,i} (z)\sum\limits_{1\, \le \,\,j\, \le \,n + 1} {\,F_{\,j} (z)} } = \cr & = {1 \over {\left( {n + 1} \right)^2 }}\left( {n\,z + z^2 } \right)^2 \cr} } \tag{3.a}$$ and $$ \bbox[lightyellow] { \eqalign{ & G_{\,n + 2} (z) = {1 \over {n\left( {n - 1} \right)}}\left( {\sum\limits_{1\, \le \,i,\,j\, \le \,n + 1} {G_{\,i} (z)\,G_{\,j} (z)} - \sum\limits_{1\, \le \,i\, \le \,n + 1} {G_{\,i} (z)^{\,2} } } \right) = \cr & = {1 \over {n\left( {n - 1} \right)}}\left( {\left( {n\,z + z^2 } \right)^2 - \left( {n\,z^{\,2} + z^4 } \right)} \right) = \cr & = {1 \over {n\left( {n - 1} \right)}}\left( {n\left( {n - 1} \right)\,z^2 + 2n\,z^3 } \right) \cr} } \tag{3.b}$$
After that the recurrence is clear, and for case a), it is $$ \bbox[lightyellow] { \begin{array}{l} F_{\,n + \,t + 1} (z) = F_{\,\left[ {n,\,\,t + 1} \right]} (z)\quad \left| {\;0 \le t} \right.\quad = \\ = \sum\limits_{1\, \le \,i,\,j\, \le \,n + t} {\frac{1}{{\left( {n + t} \right)^{\,2} }}F_{\,i} (z)F_{\,j} (z)} = \\ = \frac{1}{{\left( {n + t} \right)^{\,2} }}\left( \begin{array}{l} \sum\limits_{\left\{ {\begin{array}{*{20}c} {\,1\, \le \,i \le \,n} \\ {1\, \le \,j \le \,n} \\ \end{array}} \right.} {F_{\,i} (z)F_{\,j} (z)} + \sum\limits_{\left\{ {\begin{array}{*{20}c} {\,n + 1\, \le \,i \le \,n + t} \\ {\,1\, \le \,j \le \,n} \\ \end{array}} \right.} {F_{\,i} (z)F_{\,j} (z)} + \\ + \sum\limits_{\,\left\{ {\begin{array}{*{20}c} {\,1\, \le \,i \le \,n} \\ {\,n + 1\, \le \,j \le \,n + t} \\ \end{array}} \right.} {F_{\,i} (z)F_{\,j} (z)} + \sum\limits_{\left\{ {\begin{array}{*{20}c} {\,n + 1\, \le \,i \le \,n + t} \\ {\,n + 1\, \le \,j \le \,n + t} \\ \end{array}} \right.} {F_{\,i} (z)F_{\,j} (z)} \\ \end{array} \right) = \\ = \frac{1}{{\left( {n + t} \right)^{\,2} }}\left( {n^{\,2} z^{\,2} + 2n\,z\sum\limits_{\,1\, \le \,k \le \,t} {F_{\,\left[ {n,k} \right]} (z)} + \sum\limits_{\,1\, \le \,k \le \,t} {F_{\,\left[ {n,k} \right]} (z)} \sum\limits_{\,1\, \le \,k \le \,t} {F_{\,\left[ {n,k} \right]} (z)} } \right) = \\ = \frac{1}{{\left( {n + t} \right)^{\,2} }}\left( {n^{\,2} z^{\,2} + 2n\,z\sum\limits_{\,1\, \le \,k \le \,t} {F_{\,\left[ {n,k} \right]} (z)} + \left( {\sum\limits_{\,1\, \le \,k \le \,t} {F_{\,\left[ {n,k} \right]} (z)} } \right)^{\,2} } \right) = \\ = \frac{1}{{\left( {n + t} \right)^{\,2} }}\left( {n\,z + \sum\limits_{\,1\, \le \,k \le \,t} {F_{\,\left[ {n,k} \right]} (z)} } \right)^{\,2} = \\ \;\left| {\;1 \le t} \right. \\ = \frac{1}{{\left( {n + t} \right)^{\,2} }}\left( {n\,z + \sum\limits_{\,1\, \le \,k \le \,t - 1} {F_{\,\left[ {n,\,k} \right]} (z)} + F_{\,\left[ {n,\,t} \right]} (z)} \right)^{\,2} = \\ = \frac{1}{{\left( {n + t} \right)^{\,2} }}\left( {\frac{{\left( {n + t - 1} \right)^{\,2} }} {{\left( {n + t - 1} \right)^{\,2} }}\left( {n\,z + \sum\limits_{\,1\, \le \,k \le \,t - 1} {F_{\,\left[ {n,\,k} \right]} (z)} } \right)^{\,2} + F_{\,\left[ {n,\,t} \right]} (z)^{\,2} + 2F_{\,\left[ {n,\,t} \right]} (z)\frac{{\left( {n + t - 1} \right)}} {{\left( {n + t - 1} \right)}}\left( {n\,z + \sum\limits_{\,1\, \le \,k \le \,t - 1} {F_{\,\left[ {n,k} \right]} (z)} } \right)} \right) = \\ = \frac{{F_{\,\left[ {n,\,t} \right]} (z)}}{{\left( {n + t} \right)^{\,2} }}\left( {\left( {n + t - 1} \right)^{\,2} + F_{\,\left[ {n,\,t} \right]} (z) + 2\left( {n + t - 1} \right)\sqrt {F_{\,\left[ {n,\,t} \right]} (z)} } \right) = \\ = \frac{{F_{\,\left[ {n,\,t} \right]} (z)}}{{\left( {n + t} \right)^{\,2} }}\left( {\left( {n + t - 1} \right) + \sqrt {F_{\,\left[ {n,\,t} \right]} (z)} } \right)^{\,2} \\ \end{array} } \tag{4}$$ i.e., the particle generated at step $t$ will have a spread of values, whose pmf corresponds to the pgf $$ \bbox[lightyellow] { F_{\,n + \,t} (z) = F_{\,\left[ {n,\,\,t} \right]} (z) = \left\{ {\begin{array}{*{20}c} z & {\left| {\;0 = t} \right.} \\ {z^{\,2} } & {\left| {\;1 = t} \right.} \\ \begin{array}{l} \frac{1}{{\left( {n + t - 1} \right)^{\,2} }}\left( {n\,z + \sum\limits_{\,1\, \le \,k \le \,t - 1} {F_{\,\left[ {n,k} \right]} (z)} } \right)^{\,2} = \\ = \frac{{F_{\,\left[ {n,\;t - 1} \right]} (z)}}{{\left( {n + t - 1} \right)^{\,2} }}\left( {\left( {n + t - 2} \right) + \sqrt {F_{\,\left[ {n,\;t - 1} \right]} (z)} } \right)^{\,2} \\ \end{array} & {\left| {\;2 \le t} \right.} \\ \end{array}} \right. } \tag{5.a}$$ where the particular notation on the index of $F$ is just to help and separate the original masses from those generated therefrom.
Similarly, it is not difficult to arrive to get $$ \bbox[lightyellow] { {G_{\,\left[ {n,\,\,t} \right]} (z) =} = \left\{ {\begin{array}{*{20}c} z & {\left| {\;0 = t} \right.} \\ {z^{\,2} } & {\left| {\;1 = t} \right.} \\ \begin{array}{l} \frac{1}{{\left( {n + t - 1} \right)\left( {n + t - 2} \right)}}\left( {\left( {n\,z + \sum\limits_{\,1\, \le \,k \le \,\,t - 1} {G_{\,\left[ {n,\;k} \right]} (z)} } \right)^{\,2} - nz^{\,2} - \sum\limits_{\,1\, \le \,k \le \,\,t - 1} {G_{\,\left[ {n,\,\,k} \right]} (z)^{\,2} } } \right) = \\ = \frac{{2\,G_{\,\left[ {n,\;t - 1} \right]} (z)}}{{\left( {n + t - 1} \right)\left( {n + t - 2} \right)}}\left( {\left( \begin{array}{c} n + t - 2 \\ 2 \\ \end{array} \right) + n\,z + \sum\limits_{\,1\, \le \,k \le \,\,t - 2} {G_{\,\left[ {n,\;k} \right]} (z)} } \right) \\ \end{array} & {\left| {\;2 \le t} \right.} \\ \end{array}} \right. } \tag{5.b}$$
Taking for example some first few values, we get $$ \eqalign{ & F_{[2,\, 2]} = 1/9 z^2 (z + 2)^2 \cr & F_{[2,\, 3]} = 1/1296 z^2 (z + 2)^2 (2 z + z^2 + 9)^2 \cr & = 1/1296 (z^8 + 8*z^7 + 42*z^6 + 140*z^5 + 313*z^4 + 468*z^3 + 324*z^2) \cr & F_{[2,\, 4]} = 1/41990400 z^2 (z + 2)^2 (18 z + 13 z^2 + 4 z^3 + z^4 + 144)^2 (2 z + z^2 + 9)^2 \cr } $$
$$ \eqalign{ & F_{[3,\, 2]} = 1/16 z^2 (z + 3)^2 \cr & F_{[3,\, 3]} = z^2 (z + 3)^2 (3 z + z^2 + 16)^2 \cr & = 1/6400 (z^8 + 12*z^7 + 86*z^6 + 396*z^5 + 1201*z^4 + 2400*z^3 + 2304*z^2) \cr & F_{[3,\, 4]} = 1/1474560000 z^2 (z + 3)^2 (3 z + z^2 + 16)^2 (48 z + 25 z^2 + 6 z^3 + z^4 + 400)^2 \cr } $$ and the following sketch of the pmf for $n=2$ and a few values of $t$. Note that the $x$ is scaled by $1/2^t$.
Finally, taking $F'(1)$ and $G'(1)$ we get for the expected value of particle $t$ the same recurrence and solution as follows $$ \bbox[lightyellow] { E_{\,\left[ {n,\,\,t} \right]} = \left\{ {\begin{array}{*{20}c} 1 & {\left| {\;0 = t} \right.} \\ 2 & {\left| {\;1 = t} \right.} \\ {\frac{2}{{\left( {n + t - 1} \right)}}\left( {n + \sum\limits_{\,1\, \le \,k \le \,\,t - 1} {E_{\,\left[ {n,\,\,k} \right]} } } \right)} & {\left| {\;2 \le t} \right.} \\ \end{array}} \right.\quad = \quad \left[ {0 = t} \right] + 2\left[ {1 \le t} \right]\left( {1 + \frac{{t - 1}}{{n + 1}}} \right) } \tag{6}$$ Upon some thought, the recursion comes out to be obvious. I was not expecting instead that the average value was increasing linearly with $t$