We have $20$ students and $15$ lessons. In every lesson, one student is randomly picked and asked a question. Find expected value and variance.


There are $20$ students and $15$ lessons. In each lesson, one student is randomly picked and asked a question by the teacher. Find the expected value of the number of students asked a question during the $15$ lessons, and find its variance.

If $X$ is the random variable representing the number of students asked, then I guess that to find $\mathbf{E}X$ we have to write something like $X=X_1+X_2+\dots$, but I cannot find a smart way to define those $X_i$, so a hint would be greatly appreciated.


There are 3 best solutions below

14
On BEST ANSWER

Associate to each student $i\in\{1, ..., 20\}$ a random variable $X_i$ that counts the number of times student $i$ is asked a question during the $15$ lessons. Each $X_i$ follows a binomial distribution with parameters $n=15$, $p=1/20$.

Then the random variable counting the number of different students who were asked at least one question is $X=1_{X_1\ge 1} + ... + 1_{X_{20}\ge 1}$

i.e., by linearity of expectation, $\mathbb{E}(X)=P(X_1\ge 1) + ... + P(X_{20}\ge 1) = 20(1-(19/20)^{15}) \approx 10.73$.
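As a sanity check of the formula above, here is a minimal Python sketch. The function names `expected_distinct` and `brute_force_expected` are illustrative and not part of the original answer. The brute-force version enumerates every equally likely sequence of picks, so it is only feasible for very small classes.

```python
from fractions import Fraction
from itertools import product


def expected_distinct(students: int, lessons: int) -> Fraction:
    """Closed form: students * (1 - (1 - 1/students)**lessons)."""
    # Probability that one given student is never asked.
    miss = Fraction(students - 1, students) ** lessons
    return students * (1 - miss)


def brute_force_expected(students: int, lessons: int) -> Fraction:
    """Average number of distinct students over all equally likely pick sequences."""
    total = sum(len(set(seq)) for seq in product(range(students), repeat=lessons))
    return Fraction(total, students ** lessons)


# Small case: the indicator formula agrees with exhaustive enumeration,
# even though the indicators are not independent.
assert brute_force_expected(3, 4) == expected_distinct(3, 4)

# The case from the question (20 students, 15 lessons).
print(float(expected_distinct(20, 15)))
```

Exact rationals are used so that the comparison with the enumeration is an equality rather than a floating-point approximation.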

  • EDIT 1: To compute the variance, the above decomposition is no longer enough on its own, because the $X_i$'s are not pairwise independent (we have $\sum_i X_i=15$), so the variance of the sum is not the sum of the variances.

Let's now compute the probabilities $P(X=k)$, for $k\in\{1, ..., 15\}$. We have $\left(20\atop{k}\right)$ ways to choose $k$ students from the class. For such a choice, let's count the number of ways in which these $k$ students (and only these students) are each asked a question at least once. This amounts to counting the surjections from $\{1, ..., 15\}$ onto a set of $k$ elements, of which there are (after searching on Google) $S_{15,k}=\sum_{i=0}^{k} (-1)^{k-i}\left(k\atop{i}\right)i^{15}$.

As we have $(20)^{15}$ possible choices in total, this gives $$P(X=k)=\frac{1}{20^{15}}\left(20\atop{k}\right)\sum_{i=0}^{k} (-1)^{k-i}\left(k\atop{i}\right)i^{15}$$

We can check that it works trivially for $k=1$, and it is in fact not such a surprising formula. To count the surjections, we count all the functions (i.e., $k^{15}$) and remove those that take values in a strictly smaller subset (for each $1\le i<k$, there are $\left(k\atop{i}\right)$ such subsets of size $i$), which amounts to subtracting $\left(k\atop{i}\right)i^{15}$ functions; but we have to correct for the sub-subsets that would otherwise be subtracted more than once, hence the alternating sign $(-1)^{k-i}$ ...
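The inclusion-exclusion count of surjections can be checked against direct enumeration for small sets. The sketch below is only an illustration; the names `surjections_formula` and `surjections_brute` are made up for this example.

```python
from itertools import product
from math import comb


def surjections_formula(n: int, k: int) -> int:
    """Inclusion-exclusion count of surjections from an n-set onto a k-set."""
    return sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1))


def surjections_brute(n: int, k: int) -> int:
    """Count functions {0..n-1} -> {0..k-1} whose image is all of {0..k-1}."""
    return sum(1 for f in product(range(k), repeat=n) if len(set(f)) == k)


# Compare both counts on a few small cases.
for n, k in [(4, 1), (4, 2), (5, 3), (6, 4)]:
    assert surjections_formula(n, k) == surjections_brute(n, k)
```

The brute-force count grows like $k^n$, so it is meant only as a check on tiny inputs, not as a way to handle $n=15$.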

Given this formula, we can now compute the variance (to be continued ...)

  • EDIT 2: Let's try to prove, using the explicit formula for $P(X=k)$, that $\mathbf{E}(X)=20(1-(\frac{19}{20})^{15})$. We have $$\sum_{k=1}^{15}kP(X=k)= \frac{1}{20^{15}}\sum_{k=1}^{15}k\left(20\atop{k}\right)\sum_{i=0}^{k} (-1)^{k-i}\left(k\atop{i}\right)i^{15}$$ $$=\sum_{i=1}^{15}(\frac{i}{20})^{15}\sum_{k=i}^{15}k\left(20\atop{k}\right) (-1)^{k-i}\left(k\atop{i}\right) $$ I believe that there must be some way to prove that this sum is equal to $20(1-(\frac{19}{20})^{15})$, which is just $$\sum_{k=0}^{14}(\frac{19}{20})^k=\sum_{k=0}^{14}(1-\frac{1}{20})^k$$ $$=\sum_{k=1}^{15}\sum_{i=1}^k (-1)^{k-i}(\frac{1}{20})^{k-i}\left(k-1\atop{i-1}\right)$$ $$=\sum_{k=1}^{15}\sum_{i=1}^k (-1)^{k-i}(\frac{1}{20})^{k-i}\left(k\atop{i}\right)\frac{i}{k}$$

Wolfram shows the result is the same ($=\frac{17586872970125201701}{1638400000000000000}\approx 10.7342$), and I believe that proving the formal equality will help us to compute the variance in an elegant way.

For the value of the variance, once again using Wolfram and the exact value for $P(X=k)$, we get $Var(X)=\frac{4426364363247399877754480594066706599}{20^{28}}\approx 1.64895$. But I still hope for something less horrible!
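The same numbers can be reproduced without Wolfram. The following is a sketch using Python's exact rational arithmetic; the constant and function names (`STUDENTS`, `LESSONS`, `pmf`) are illustrative choices for this example.

```python
from fractions import Fraction
from math import comb

STUDENTS, LESSONS = 20, 15


def pmf(k: int) -> Fraction:
    """P(X = k): exactly k distinct students are asked over all lessons."""
    # Number of surjections from the lessons onto a fixed set of k students.
    surjections = sum((-1) ** (k - i) * comb(k, i) * i ** LESSONS for i in range(k + 1))
    return Fraction(comb(STUDENTS, k) * surjections, STUDENTS ** LESSONS)


probs = {k: pmf(k) for k in range(1, min(STUDENTS, LESSONS) + 1)}
mean = sum(k * p for k, p in probs.items())
variance = sum(k * k * p for k, p in probs.items()) - mean ** 2

# The probabilities sum to 1 and the mean matches the indicator formula exactly.
assert sum(probs.values()) == 1
assert mean == 20 * (1 - Fraction(19, 20) ** 15)
print(float(mean), float(variance))
```

Because every quantity is a `Fraction`, the equality between the sum $\sum_k kP(X=k)$ and $20(1-(19/20)^{15})$ is verified exactly for these particular numbers, although this is of course not a proof of the general identity.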

4
On

Concerning how to define $X_i$, I can see two "bad" ways to do it:

Bad way 1

$$X_i=\begin{cases} 1 & \text{if at least $i$ different students are asked a question}\\ 0 & \text{otherwise} \end{cases}$$ Then you'd have to compute the expected value of a sum of non-independent random variables, and I don't recommend it.

Bad way 2

$$X_i=\begin{cases} 1 & \text{if the $i$-th student was asked a question}\\ 0 & \text{if the $i$-th student was not asked a question} \end{cases}$$ and once again, you end up with non-independent random variables.

Right now I don't see other ways, but I'm not convinced there's a good way to do this with $X=\sum X_i$. Instead consider the following.

Another way (detailed because it turns out I'm bad at this)

If we can compute the probabilities $P(X=k)$, then $\mathbb EX=\sum_{k=1}^{15} kP(X=k)$. To get $P(X=k)$, go see the edit section of Yoël's answer.

EDIT:

Corrected the "alternative way" computation after SekstusEmpiryk's remark
New version after Yoël pointed out another error
Removed old and false computation of $P(X=k)$

1
On

The trick to a lot of problems about discrete random variables is to write them as a sum of indicator random variables and exploit symmetries. This is an example of such a problem. Since you're interested in second moments, we just need to understand what happens to any two students at a time.

There's nothing special about the numbers 20 and 15 here. Let's say there are $s$ students and $\ell$ lessons. Later on we'll plug in $s = 20, \ell = 15$.

Let $X_i$ be the number of times that student $i$ is asked a question. Let $Y_i = 1_{X_i \ge 1}$ be 1 if student $i$ is asked a question at least once, and 0 otherwise. Let $Y = Y_1 + \cdots + Y_s$ be the total number of students asked a question; we're trying to find $E(Y)$.

Now $E(Y) = E(Y_1) + \cdots + E(Y_s)$ since expectation is linear, and then $E(Y) = s E(Y_1)$ by symmetry - all the students are the same. $E(Y_1)$ is the probability that student 1 is asked at least one question; this is $1-(1-1/s)^\ell$. Therefore $E(Y) = s (1-(1-1/s)^\ell)$. With the particular numbers you asked about, this is $20 \times (1 - (19/20)^{15}) \approx 10.73$.
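A quick simulation gives an independent check on the formula $E(Y) = s(1-(1-1/s)^\ell)$. This is a rough sketch: the function name `simulate_mean`, the seed, and the number of trials are arbitrary choices for illustration, and the estimate only agrees with the exact value up to sampling noise.

```python
import random
from statistics import fmean


def simulate_mean(students: int, lessons: int, trials: int, seed: int = 0) -> float:
    """Estimate E(Y) by drawing one uniformly random student per lesson."""
    rng = random.Random(seed)
    counts = [
        # Number of distinct students picked in one simulated run of lessons.
        len({rng.randrange(students) for _ in range(lessons)})
        for _ in range(trials)
    ]
    return fmean(counts)


s, ell = 20, 15
exact = s * (1 - (1 - 1 / s) ** ell)
estimate = simulate_mean(s, ell, trials=20_000)
print(exact, estimate)
```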

Finding the variance is a bit trickier. Of course $Var(Y) = E(Y^2) - E(Y)^2$. We know $E(Y)$ and thus $E(Y)^2$. Now the thing to do here is to write $Y = Y_1 + \cdots + Y_s$, and thus we get

$$E(Y^2) = E( (Y_1 + \cdots + Y_s)^2 ) = E(Y_1^2 + \cdots + Y_s^2) + E\Big(\sum_{j \not = k} Y_j Y_k\Big)$$

where the first term has the squares of each of $Y_1$ through $Y_s$, and the second term has all $s(s-1)$ terms of the form $Y_j Y_k$ with $j \not = k$ (each unordered pair appears twice, once as $Y_j Y_k$ and once as $Y_k Y_j$). Now we can rewrite this as

$$ s E(Y_1^2) + s(s-1) E(Y_1 Y_2) $$

by symmetry. Every student is the same so $E(Y_j^2)$ doesn't depend on $j$, and every pair of students is the same so $E(Y_j Y_k)$ doesn't depend on $j, k$ as long as $j$ and $k$ aren't the same.

Let's turn to $E(Y_1^2)$. Since $Y_1$ is an indicator random variable it's always 0 or 1, so $Y_1 = Y_1^2$; thus $E(Y_1^2) = E(Y_1) = (1-(1-1/s)^\ell).$

Now, what is $E(Y_1 Y_2)$? $Y_1 Y_2$ is 1 if both students 1 and 2 get asked a question, and zero otherwise. This is the tricky part. We need the principle of inclusion-exclusion:

$$P(Y_1 = 1 \hbox{ or } Y_2 = 1) = P(Y_1 = 1) + P(Y_2 = 1) - P(Y_1 = 1 \hbox{ and } Y_2 = 1)$$

which we can rearrange to get

$$P(Y_1 = 1 \hbox{ and } Y_2 = 1) = P(Y_1 = 1) + P(Y_2 = 1) - P(Y_1 = 1 \hbox{ or } Y_2 = 1)$$

Of course $P(Y_1 = 1) = E(Y_1) = 1-(1-1/s)^\ell$ and $P(Y_2 = 1) = E(Y_2) = 1-(1-1/s)^\ell$. And we can see that $P(Y_1 = 1 \hbox{ or } Y_2 = 1) = 1-(1-2/s)^\ell$, since $(1-2/s)^\ell$ is the probability that neither $1$ nor $2$ ever gets asked a question. Thus we have

$$E(Y_1 Y_2) = P(Y_1 = 1 \hbox{ and } Y_2 = 1) = 2 (1-(1-1/s)^\ell) - (1-(1-2/s)^\ell)$$

and with the particular numbers you asked about, this is $2 (1 - (19/20)^{15}) - (1 - (18/20)^{15}) \approx 0.2793$.
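The inclusion-exclusion step for $E(Y_1 Y_2)$ can be checked by enumerating all outcomes for a small class. The sketch below is illustrative only; `pair_formula` and `pair_brute` are hypothetical names, and students are numbered from 0 in the code.

```python
from fractions import Fraction
from itertools import product


def pair_formula(s: int, ell: int) -> Fraction:
    """2(1 - (1 - 1/s)^ell) - (1 - (1 - 2/s)^ell), as derived above."""
    one_missed = Fraction(s - 1, s) ** ell   # a given student is never asked
    both_missed = Fraction(s - 2, s) ** ell  # neither of two given students is asked
    return 2 * (1 - one_missed) - (1 - both_missed)


def pair_brute(s: int, ell: int) -> Fraction:
    """P(students 0 and 1 are both asked), by enumerating all s**ell sequences."""
    hits = sum(1 for seq in product(range(s), repeat=ell) if 0 in seq and 1 in seq)
    return Fraction(hits, s ** ell)


# Small case: formula and enumeration agree exactly.
assert pair_formula(4, 5) == pair_brute(4, 5)

# The case from the question.
print(float(pair_formula(20, 15)))
```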

So we can finally put everything together to get

$$E(Y^2) = s (1-(1-1/s)^\ell) + s(s-1) [2 (1-(1-1/s)^\ell) - (1-(1-2/s)^\ell)] $$

and thus

$$Var(Y) = s (1-(1-1/s)^\ell) + s(s-1) [2 (1-(1-1/s)^\ell) - (1-(1-2/s)^\ell)] - (s (1-(1-1/s)^\ell))^2 $$

(which can probably be simplified a bit). If you plug in $s = 20, \ell = 15$ you get about $1.64895$, the same numerical answer as Yoël.
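One possible simplification, writing $p = (1-1/s)^\ell$ and $q = (1-2/s)^\ell$, is $Var(Y) = sp + s(s-1)q - s^2p^2$: after expanding, the constant terms cancel and the terms linear in $p$ collapse to $sp$. The Python sketch below checks this candidate form against the unsimplified expression using exact rationals; the function names are illustrative.

```python
from fractions import Fraction


def variance_original(s: int, ell: int) -> Fraction:
    """Var(Y) exactly as written above: E(Y^2) - E(Y)^2."""
    a = 1 - Fraction(s - 1, s) ** ell            # E(Y_1)
    b = 2 * a - (1 - Fraction(s - 2, s) ** ell)  # E(Y_1 Y_2)
    return s * a + s * (s - 1) * b - (s * a) ** 2


def variance_simplified(s: int, ell: int) -> Fraction:
    """Candidate simplification: s*p + s*(s-1)*q - s^2 * p^2."""
    p = Fraction(s - 1, s) ** ell
    q = Fraction(s - 2, s) ** ell
    return s * p + s * (s - 1) * q - s * s * p * p


# Both forms agree exactly on a few parameter choices.
for s, ell in [(2, 1), (5, 3), (20, 15)]:
    assert variance_original(s, ell) == variance_simplified(s, ell)

print(float(variance_simplified(20, 15)))
```

With a single lesson exactly one student is asked, so the variance should be zero; both forms return $0$ for $\ell = 1$, which is a useful sanity check.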