Suppose there are 1000 elements, we randomly pick two sets of 50 elements. What's the expectation of number of elements in intersection

156 Views Asked by At

I'm able to write the distribution of the number of elements in the intersection. But is there any faster way to get their average/expectation?

2

There are 2 best solutions below

0
On BEST ANSWER

We can do it by linearity of expectation.

For each element $x$ the probability $x$ is in the intersection is simply the probability it is in both sets, which is $\big(\frac{50}{1000}\big)^2$.

Therefore the expected number of such elements is $1000\times \big(\frac{50}{1000}\big)^2=2.5$.

0
On

Probability to get one element in intersection is $ [\binom{50}{1}*\binom{950}{49}] /\binom{1000}{50}$

Similarly for two elements it will be $ [\binom{50}{2}*\binom{950}{48}] /\binom{1000}{50}$

But is there any faster way to get their average/expectation?

If we can code it, that would be pretty fast. For a logic that is simpler and faster, I would welcome answers.

I have used python 3.5 for the code.

from math import factorial
import numpy as np

def nCr(n,r):
    # calculate the N choose r ways.
    f = factorial
    return f(n)/(f(r)*f(n-r))


def ExpectedElements(initialDraws, totalElements):
    # 

    sampleSpacesize = initialDraws

    remElements = totalElements - initialDraws
    listExpectation = [] # an emppty list that stores P(x)*X

    for i in range(1, sampleSpacesize + 1):
    # Expected number of elements = sum(probability of getting that no. of 
    # elements*number of elements)

        listExpectation.append(i*(nCr(sampleSpacesize, i)*nCr(remElements, 
             initialDraws - i)/nCr(totalElements, initialDraws)))


    listExpectation = np.array(listExpectation)   
    print(listExpectation.sum())
    print(listExpectation[0])
    return(listExpectation.sum())



ExpectedElements(50, 1000)  
ExpectedElements(100, 1000)       
ExpectedElements(200, 1000)             
ExpectedElements(500, 1000)