Expected Value of Subset

751 Views Asked by At

Say we have $n>1000$ a number that can be divided by 10. We choose randomly a subset $S$ of the numbers ${1,2...n}$ of size $n/10$.

$X$ will be the amount of numbers is $S$ that can be divided by 5.

What is $E[X]$?

I'd be grateful for a hint'

Thanks in advance,

Yaron.

2

There are 2 best solutions below

0
On

Let $n=10q.$ By the Polya Enumeration Theorem we have $$\mathrm{E}[X] = {10q\choose q}^{-1}\times \left(\left. \frac{\partial}{\partial u} Z(P_q)\left(8q+2qu\right) \right|_{u=1}\right)$$ where $Z(P_q)$ is the cycle index of the set operator $\mathfrak{P}_{=q}.$

Now suppose we have a term $$a_1^{m_1} a_2^{m_2} a_3^{m_3}\cdots$$ with some leading coefficient in the cycle index $Z(P_q).$ Substituted this becomes $$(8q+2qu)^{m_1} (8q+2qu^2)^{m_2} (8q+2qu^3)^{m_3}\cdots$$ Differentiate to get $$(8q+2qu)^{m_1} (8q+2qu^2)^{m_2} (8q+2qu^3)^{m_3}\cdots \times \sum_{j} \frac{m_j(8q+2qu^j)^{m_j-1} 2qju^{j-1}}{(8q+2qu^j)^{m_j}}.$$

Finally put $u=1$ to obtain $$(10q)^{\sum_j m_j} \sum_j \frac{m_j(10q)^{m_j-1} 2qj}{(10q)^{m_j}} = \frac{1}{5} (10q)^{\sum_j m_j} \sum_j m_j j.$$

Note that $\sum_j m_j$ gives the number of cycles and $\sum_j m_j j$ is simply $q,$ so this is $$\frac{q}{5} (10q)^{\sum_j m_j}.$$

Now the OGF of the cycle index of the set operator $\mathfrak{P}_{=q}$ with the cycle count marked is $$G(z) = \exp\left(a_1 v z - a_2 v\frac{z^2}{2} + a_3 v\frac{z^3}{3} - a_4 v\frac{z^4}{4} \cdots\right) \\= \exp\left(\sum_{m\ge 1} (-1)^{m-1}\times v\times a_m\times \frac{z^m}{m}\right).$$

We seek $$[z^q] \exp\left(\sum_{m\ge 1} (-1)^{m-1} \times 10q\times \frac{z^m}{m}\right) \\ = [z^q] \exp\left(10q\times\log(1+z)\right) = [z^q] (1+z)^{10q} = {10q\choose q}.$$

Putting it all together we obtain $${10q\choose q}^{-1} \frac{q}{5} {10q\choose q} = \frac{q}{5}.$$

I do think this is remarkably pretty. Since we are selecting $q$ values from the $10q$ possibles and the presence of divisibles by five is $1/5$ this answer confirms the intuitive argument pointed out in the comments.

The following Maple code was used to verify the above computation. (The brute force method in the first function should not be used on $q>5.$)

with(combinat);

ex :=
proc(q)
    option remember;
    local gf, p, f, comb;

    gf := 0;

    p := firstcomb(q*10, q);
    while type(p, set) do
        f := nops(select(t -> t mod 5 = 0, p));
        gf := gf + u^f;

        p := nextcomb(p, q*10);
    od;

    gf;
end;


pet_cycleind_set :=
proc(n)
        local p, s;
        option remember;

        if n=0 then return 1; fi;

        expand(1/n*add((-1)^(l-1)*
        a[l]*pet_cycleind_set(n-l), l=1..n));
end;


pet_varinto_cind :=
proc(poly, ind)
local subs1, subs2, polyvars, indvars, v, pot, res;

    res := ind;

    polyvars := indets(poly);
    indvars := indets(ind);

    for v in indvars do
        pot := op(1, v);

        subs1 :=
        [seq(polyvars[k]=polyvars[k]^pot,
             k=1..nops(polyvars))];

        subs2 := [v=subs(subs1, poly)];

        res := subs(subs2, res);
    od;

    res;
end;

ex2 :=
proc(q)
    option remember;
    local bgf;

    bgf := 8*q+2*q*u;

    pet_varinto_cind(bgf, pet_cycleind_set(q));
end;

ex3 :=
proc(q)
    option remember;
    local term, cf, res, c, s, k, deg;

    if q=1 then return 2 fi;

    res := 0;
    for term in pet_cycleind_set(q) do
        c := subs([seq(a[k] = 10*q, k=1..q)], term);

        s := 0;
        for k to q do
            deg := degree(term, a[k]);
            if deg > 0 then
                s := s + deg*k;
            fi;
        od;

        res := res + c*s/5;
    od;

    res;
end;

There is some elementary material on $Z(P_q)$ at this MSE link.

0
On

The two key ideas here are Linearity of Expectation (i.e. that expectations of sums behave really nicely) and Indicator Variables (to turn your problem into one which can be described in terms of expectations).

Indicator Variables: Let $X$ be the number of elements in your set which are divisible by $5$. We can think of $X$ as $X_1+\dots+X_{n/5}$, where for each $i$ we have $$X_i=\left\{\begin{array}{ccl} 1 &\textrm{if}& 5i \textrm{ is in the set } \\0 &\textrm{if}& 5i \textrm{ is not in the set} \end{array}\right.$$ The idea here is that the distribution of the $X_i$ are easy to understand -- since the set has size $n/10$, each individual number is in the set with probability $1/10$. So we have $$P(X_i=1)=\frac{1}{10}, \, \, \, \, \, \, \, \, P(X_i=0)=\frac{9}{10}.$$ Now, the $X_i$ are not independent (e.g. if $X_1=1$, then it's slightly less likely that $X_2=1$, as $5$ being in the set makes it so there's less room for $10$ to be in the set). At first glance this makes things difficult, because it makes the distribution of $X$ rather more complicated than the way. But it turns out to be nicer than you might think, because of

Linearity of Expectation: This refers to the fact that the expectation of a sum is the sum of the expectations, i.e. $$E(X_1+\dots+X_k) = E(X_1)+E(X_2)+\dots+E(X_k)$$ even if the variables are not independent. So as long as we only care about the expectations, independence doesn't matter! In your case you should be able to compute $E(X_1)$ directly from the distribution of $X_1$, and then use the above fact to compute $E(X)$ as well.