Computing Variance and Expected Value With Random Variables

181 Views Asked by At

Consider tossing an unbiased coin until we see exactly k heads. Let Y be the random variable corresponding to the total number of coin tosses required.

My instructor has given the following hint: Let $Y = X1+X2+. . .+Xk$ where $Xi$ is the number of extra coin tosses required after the (i − 1)th head is observed until the ith head is observed.

  • Compute $E(Y)$

    I'm thinking of using a summation of sorts to do here. Perhaps $\sum{Xi}\frac{1}{2}$? I'm trying to figure out how to imply the hint my professor gave into this. I also notice that each $Xi$ is the same type of random variable. I'm thinking to use something around the Linearity of Expectation to determine the answer.

  • Compute var(Y)

    If I can get E(Y), this is incredibly straight-forward.

  • Define Z = Y /k: Compute E(Z) and var(Z)

    I'm assuming that it would be the same process as E(Y), except I would replace a variable in the summation with Y/k.

  • Use the Chebyshev bound to prove a bound on P(|Z − E(Z)| ≥ 2) in terms of k.

    I currently have that $P(|Z-\mu|≥k\sigma) \,≤\,\frac {1}{k^2}$, however, I am stuck with obtaining the proper variance for this. This means that $\sigma = E(Z)$ but I cannot figure out the other variables.

Everything I've attempted so far is incorrect merely because I am incorrectly interpreting different statistical rules. Work would be highly appreciated in assistance to figuring these out.

1

There are 1 best solutions below

0
On BEST ANSWER

Here are some pointers that may help the OP to get started. We have from first principles that the probability of needing $T$ samples that it is given by

$$P[T=m] = \frac{1}{2^m} {m-1\choose k-1}.$$

Let us verify that this is a probability distribution:

$$\sum_{m\ge k} P[T=m] = \sum_{m\ge k} \frac{1}{2^m} {m-1\choose k-1} = \frac{1}{2^k} \sum_{m\ge 0} \frac{1}{2^m} {m+k-1\choose k-1} \\ = \frac{1}{2^k} \sum_{m\ge 0} \frac{1}{2^m} [z^m] \frac{1}{(1-z)^k} = \frac{1}{2^k} \frac{1}{(1/2)^k} = 1$$

and the sanity check goes through. We get for the factorial moments

$$\mathrm{E}[T(T-1)\ldots (T-(\ell-1))] = \sum_{m\ge k} {m\choose \ell} \ell! P[T=m] \\ = \ell! \sum_{m\ge k} {m\choose \ell} \frac{1}{2^m} {m-1\choose k-1} = \ell! \frac{1}{2^k} \sum_{m\ge 0} \frac{1}{2^m} {m+k\choose \ell} {m+k-1\choose k-1} \\ = \ell! \frac{1}{2^k} \sum_{m\ge 0} \frac{1}{2^m} {m+k-1\choose k-1} [z^\ell] (1+z)^{m+k} \\ = \ell! [z^\ell] \frac{(1+z)^k}{2^k} \sum_{m\ge 0} \frac{1}{2^m} {m+k-1\choose k-1} (1+z)^{m} \\ = \ell! [z^\ell] \frac{(1+z)^k}{2^k} \frac{1}{(1-(1+z)/2)^k} = \ell! [z^\ell] \frac{(1+z)^k}{(1-z)^k}.$$

This is

$$\ell! \sum_{r=0}^\ell {k\choose r} {\ell-r+k-1\choose k-1}.$$

We thus obtain

$$\mathrm{E}[T] = 1\times {k\choose k-1} + k\times {k-1\choose k-1} = 2k$$

Furthermore

$$\mathrm{E}[T(T-1)] = 2\times 1\times {k+1\choose k-1} + 2\times k\times {k\choose k-1} + 2\times {k\choose 2}\times {k-1\choose k-1} \\ = (k+1)k + 2k^2 + k(k-1) = 4k^2.$$

This yields for the variance

$$\mathrm{Var}[T] = \mathrm{E}[T(T-1)] + \mathrm{E}[T] - \mathrm{E}[T]^2 = 4 k^2 + 2k - (2k)^2 = 2k.$$

Observe that when we consult e.g. Wikipedia on the negative binomial distribution we find that most entries like the German one and the English one present two varieties of this distribution, one of them where only the number of successes is counted i.e. the non-constant statistic, and another where we count the total number of trials. It appears this question asks for the latter, which is what we computed above.

The following extremely simple Perl script can be used to verify the factorial moments that were obtained and documents the interpretation of the question that was used.

#! /usr/bin/perl -w
#

MAIN: {
    my $k = shift || 5;
    my $l = shift || 1;
    my $trials = shift || 1000;

    print "$k $l $trials\n";

    my $data = 0;

    for(my $tind = 0; $tind < $trials; $tind++){
        my $seen = 0; my $steps = 0;

        while($seen < $k){
            my $result = int(rand(2));
            $steps++;

            $seen++ if $result == 1;
        }

        my $moment = 1;
        for(my $r = 0; $r < $l; $r++){
            $moment *= ($steps - $r);
        }

        $data += $moment;
    }

    print $data/$trials;
    print "\n";

    1;
}