Randomic distribution

Question

Randomic distribution

231 Views Asked by GionJh At 30 Jun 2025 - 9:11

Suppose I have X elements [A,B,C,D,...] and Y subsets of X of max length 4.

Suppose the subsets are build up randomly, is there a way to compute the average number of subsets an item of X is an element of ?

EDIT:

I have a file like this: where X are the numbers between 0 and 99
with 100 000 rows, which represent the subsets and I get that singletons are present in approximately 2500 rows
i.e. E(W) = 100000 * 0.025
while according to the formula it should be : E(W) = 1000000 * 0,040

82 73 62 
21 49 11 
28 
54 5 10 
42 
31 35 
89 5 
42 88 9 
62 
15 77 72 83 
33 83 
0 
73 88 
84 10 
61 98 78 
22 13 6 
10 66 
76 34 8 0 
92 39 
52 97 14 
95 7 46 6 
79 30 
74 29 73 49 
23 83 
63 
99 18 68 10 
59 64 8 52 
21 85

code snippet:

        int j = 0; // row counter
        while(j < 100000)
        {
            List<String> list = new ArrayList<String>();
            int max = r.nextInt(4) + 1; // gives me an int between 1 and 4
            for(int jj = 0;jj <max;jj++) // cycle to generate subset
            {
                int rand;

                // produce random number between 0 and 99
                do
                {   
                 rand = r.nextInt(100);
                }
                while(!notPresent(""+rand,list)); // retry if it's been already inserted in the list  

                list.add(""+rand);
                writer.print(rand+" "); // write set element
            }

     writer.println();  // new line character
     j++;   
    }

Original Q&A

There are 1 best solutions below

**André Nicolas** · Accepted Answer

We change notation a little. Let $A$ be an $N$-element set. We choose at random a multiset $C$ of $m$ subsets of $A$, each of cardinality $\le 4$, with all choices equally likely.

Let $a$ be a specific element of $A$, and let random variable $W$ be the number of sets in $C$ that contain $a$. Let $U_i=1$ if on the $i$-th choice of a set, our chosen set contained $a$, and let $U_i=0$ otherwise. Then $W=U_1+\cdots+U_m$. By the linearity of expectation we have $$E(W)=E(U_1)+E(U_2)+\cdots +E(U_m)=mE(U_1).$$

So now all we need is to find $E(U_1)$, that is, the probability that $U_1=1$.

There are $\binom{N}{4}$ possible sets of $4$, $\binom{N}{3}$ sets of $3$, and so on, down to the empty set if you wish, but I will (arbitrarily) not allow that as a choice.

The probability than a $4$-set contains $a$ is $4/N$, and the probability that a $3$-set contains $a$ is $3/N$, and so on. Thus $$E(U_1)=\frac{(4/N)\binom{N}{4}+(3/N)\binom{N}{3}+(2/N)\binom{N}{2}+(1/N)\binom{N}{1}}{\binom{N}{4}+\binom{N}{3}+\binom{N}{2}+\binom{N}{1}}.$$ The numerator simplifies to $\binom{N}{3}+\binom{N}{2}+\binom{N}{1}+1$. Putting things together we find that $$E(W)=m\cdot \frac{ \binom{N}{3}+\binom{N}{2}+\binom{N}{1}+1}{\binom{N}{4}+\binom{N}{3}+\binom{N}{2}+\binom{N}{1}}.$$

Remark: You might wish to change the model in various ways. The basic technique detailed above probably can, with small changes, still be used.

In particular, it may be that for your application you are interested not in subsets, but rather in sequences of length $\le 4$. The basic analysis will be much the same, with small differences of detail, powers instead of binomial coefficients.

Added: If the size of the set is randomly chosen to be between $1$ and $4$, then $E(U_1)$ changes. We use a conditional expectation argument.

If the size is $1$, then the expectation is $\frac{1}{N}$, if the size is $2$ the expectation is $\frac{2}{N}$ and so on, so $$E(U_1)=\frac{10}{4N}.$$

Randomic distribution

There are 1 best solutions below

Related Questions in ELEMENTARY-SET-THEORY

Related Questions in PROBABILITY-DISTRIBUTIONS

Related Questions in RANDOM

Trending Questions

Popular # Hahtags

Popular Questions