Problem: A box contains a yellow ball, an orange ball, a green ball, and a blue ball. Billy randomly selects 4 balls from the box (with replacement). What is the expected value for the number of distinct colored balls Billy will select?
This is the answer given in the site where this question was originally asked
I can't make sense of the last step. Am I missing something obvious? Why does summing the expectations for each individual color give the expected number of distinct colors picked?
Can someone explain the connection?
Linearity of expectation tells you that the mean of the sum of two random variables (even dependent ones) is the sum of their respective means.
To understand, take a look at the following example:
You have two six-sided dice. Each face has a 1/6 chance of coming up on a throw. If you consider the mean value of one die, it is 3.5: since the distribution is uniform on 1 to 6, the mean sits in the "middle", (1 + 6)/2.
Now if you throw both dice and sum the results, the distribution of the random variable X is no longer uniform on [1+1, 6+6] = [2, 12]. It is symmetric around 7, which is also the most likely value (more pairs of faces add up to 7 than to any other total), so the mean is 7, which is 3.5 + 3.5. So you see that the mean of the sum is the sum of the means.
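If you want to check the dice example by hand, here is a minimal Python sketch that enumerates all outcomes exactly. The variable names are just mine, and the "dependent" case (second value forced to be 7 minus the first) is an extra illustration I'm adding to show that independence isn't needed.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die

# Mean of a single die: average of 1..6.
mean_one = Fraction(sum(faces), 6)

# Mean of the sum of two independent dice: average over all 36 pairs.
mean_sum = Fraction(sum(a + b for a, b in product(faces, repeat=2)), 36)

# Fully dependent case: the second value is always 7 - a.
# The sum is constant, and its mean is still mean_one + mean_one.
mean_dependent = Fraction(sum(a + (7 - a) for a in faces), 6)

print(mean_one, mean_sum, mean_dependent)  # 7/2 7 7
```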
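The same thing happens here. Each color has an indicator RV that equals 1 if that color has been drawn at least once in the $n$ trials and 0 otherwise. Each indicator has its own expectation, which is just the probability that its color gets drawn at least once. The sum of the 4 indicators counts how many colors have been seen, so it is exactly the number of distinct colors drawn (not an indicator that all of them have been seen). By linearity, the expectation of that sum is the sum of the 4 individual expectations, even though the indicators are dependent.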
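To make the connection concrete for the ball problem, here is a rough sketch that compares a brute-force count with the linearity shortcut. It assumes every draw is uniform over the four colors, so a given color is missed on all 4 draws with probability $(3/4)^4$ and its indicator has expectation $1 - (3/4)^4$. The names `expected_brute` and `expected_linear` are mine, not from the original answer.

```python
from fractions import Fraction
from itertools import product

colors = ["yellow", "orange", "green", "blue"]
n_draws = 4  # Billy draws 4 times with replacement

# Brute force: list all 4^4 = 256 equally likely draw sequences and
# average the number of distinct colors in each one.
outcomes = list(product(colors, repeat=n_draws))
expected_brute = Fraction(sum(len(set(o)) for o in outcomes), len(outcomes))

# Linearity: each color's indicator has expectation
# P(color seen at least once) = 1 - (3/4)^4, and there are 4 colors.
p_seen = 1 - Fraction(len(colors) - 1, len(colors)) ** n_draws
expected_linear = len(colors) * p_seen

print(expected_brute, expected_linear)  # 175/64 175/64
```

Both come out to 175/64, about 2.73 distinct colors on average, without ever having to work out the full distribution of the count.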