I understand this has been asked before but I still cannot understand the intuition behind it. A lot of the answers I've seen seem to just state that it's due to linearity of expectation and that it doesn't care about dependence, but I still cannot understand what this actually means and how it can be related to this problem.
The question is as follows:
For a group of n people find the expected number of days of the year which are birthdays of exactly k people. (Assume 365 days and that all arrangements are equally probable).
The answer quoted in the book is $${n \choose k}\frac{364^{n-k}}{365^{n-1}}$$. I understand that this is calculated using $$\mathbb{E}(X) = \sum^{365}_{i=1}\mathbb{E}(Y_i)$$ where $Y_i$ is 1 if exactly $k$ of the $n$ people were born on day $i$ and 0 otherwise. However, what I can't understand at all is the intuition behind why the probability of exactly $k$ birthdays is the same for every day: if $k$ people have already been born on one day, then we have one fewer day out of 365 and fewer ways to select the next $k$, because we are now choosing from $n-k$.
Please could someone explain why we can seemingly use independent assumptions when there is a clear dependence between days? It would also be really helpful to see why this answer is the same as if we did the long calculation with dependency taken into account.
Consider $n$ cards, with $k$ of them being red and $n-k$ of them being black. If we select $m$ cards at random, what is the expected number of red cards selected?
If we are sampling with replacement, then each selection is independent and the indicator argument should be easy to accept: each selection has expectation (probability of being red) $k/n$, and thus the expected number of red cards is $mk/n$.
What if we are sampling without replacement? One may naturally assign a chronological order for sampling the cards, and think that when we are sampling the second card after observing the result of the first selection, the probability of red can no longer remain at $k/n$. This argument is correct, but it is actually referring to the conditional probability of the second selection being red, conditional on the result of the first selection. The marginal probability in this case is still $k/n$, as calculated by the Law of Total Probability:
$$ \begin{align} \Pr\{C_2 = R\} &= \Pr\{C_2 = R|C_1 = R\}\Pr\{C_1 = R\} + \Pr\{C_2 = R|C_1 = B\}\Pr\{C_1 = B\} \\ &= \frac {k - 1} {n - 1} \times \frac {k} {n} + \frac {k} {n - 1} \times \frac {n - k} {n} \\ &= \frac {k^2 - k + nk - k^2} {n(n-1)} \\ &= \frac {(n - 1)k} {n(n - 1)} \\ & = \frac {k} {n} \end{align}$$
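As a quick sanity check on the algebra above, here is a minimal sketch that evaluates the same law-of-total-probability sum with exact fractions. The function name `second_draw_red` is only an illustrative choice, and it assumes $n \ge 2$.

```python
from fractions import Fraction

def second_draw_red(n, k):
    """Marginal Pr{C_2 = R} without replacement, via the law of total probability."""
    p_first_red = Fraction(k, n)
    p_first_black = Fraction(n - k, n)
    # Conditional probabilities of a red second card given the first card's color.
    p_red_given_red = Fraction(k - 1, n - 1)
    p_red_given_black = Fraction(k, n - 1)
    return p_red_given_red * p_first_red + p_red_given_black * p_first_black

# The marginal probability matches k/n for a few arbitrary (n, k) pairs.
for n, k in [(10, 3), (52, 26), (7, 7), (5, 1)]:
    assert second_draw_red(n, k) == Fraction(k, n)
```

Using `Fraction` rather than floats keeps the comparison exact, so the equality is a genuine identity check and not a rounding coincidence.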
In fact, if you calculate the marginal probability of selecting a red card for every selection, from the first to the last, you will obtain the same $k/n$ for all of them. Here we emphasize the marginal probability to distinguish it from the conditional probability: we do not condition on the results of the other selections, i.e. we do not observe them or take them into account.
Consider $m$ people lined up in a queue. The first person comes, selects the first card, and hides it immediately so that nobody else learns its color. The others do the same in turn. What is the probability that the second person selects a red card? Does this experiment produce the same overall result as the previous setting?
If this is still hard to accept, let's not assign a chronological order to the selections at all: say you shuffle the cards and lay them out in a straight line, such that each permutation is equally likely. What is the probability that the card placed in the $i$-th position is red? Does the (selection) order matter here?
And from the example calculation above, we see that $\Pr\{C_2 = R|C_1 = R\} = (k - 1)/(n - 1) < k/n = \Pr\{C_2 = R\}$ and $\Pr\{C_2 = R|C_1 = B\} = k/(n - 1) > k/n = \Pr\{C_2 = R\}$, as expected. So conditioning on the result of other selections does change the probability; it can become smaller or larger, depending on the result. But by the law of total probability, the scenarios average out once each is weighted by its path probability, resulting in the same marginal probability (consider a probability tree diagram with the selections placed as different layers). Hope this helps.
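To make the shuffled-line picture concrete, the following sketch enumerates every red/black pattern for a small deck and computes the exact probability that each position holds a red card. The function name `marginal_red_by_position` and the deck size are illustrative assumptions. It also relies on the fact that under a uniformly random permutation, every choice of which $k$ positions are red is equally likely.

```python
from fractions import Fraction
from itertools import combinations

def marginal_red_by_position(n, k):
    """Exact Pr{card at position i is red} for each i, over all equally likely patterns."""
    counts = [0] * n
    total = 0
    # A color pattern is determined by which k of the n positions hold red cards.
    for red_positions in combinations(range(n), k):
        total += 1
        for i in red_positions:
            counts[i] += 1
    return [Fraction(c, total) for c in counts]

n, k, m = 6, 2, 4
probs = marginal_red_by_position(n, k)
print(probs)  # each entry is Fraction(1, 3), i.e. k/n

# Linearity of expectation: expected number of red cards among the first m positions.
expected_red = sum(probs[:m])
assert expected_red == Fraction(m * k, n)
```

No position is special, so every entry comes out as $k/n$, and summing the first $m$ entries gives $mk/n$ even though the positions are clearly dependent.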
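Returning to the birthday question, the same idea can be checked by brute force. The sketch below is only feasible for a hypothetical small "year" (the `days` parameter is my assumption, standing in for 365): it enumerates every equally likely birthday assignment, counts the days with exactly $k$ birthdays, dependence included, and compares the average with the book's formula generalized to $\binom{n}{k}\frac{(d-1)^{n-k}}{d^{n-1}}$ for $d$ days.

```python
from fractions import Fraction
from itertools import product
from math import comb

def expected_days_brute_force(n, k, days):
    """Average, over all days**n assignments, of the number of days with exactly k birthdays."""
    total = 0
    for birthdays in product(range(days), repeat=n):
        total += sum(1 for d in range(days) if birthdays.count(d) == k)
    return Fraction(total, days ** n)

def expected_days_formula(n, k, days):
    """Indicator-variable answer: days * C(n, k) * (1/days)**k * ((days-1)/days)**(n-k)."""
    return Fraction(comb(n, k) * (days - 1) ** (n - k), days ** (n - 1))

# Small cases only; the enumeration grows as days**n.
for n, k, days in [(4, 2, 5), (5, 1, 4), (3, 3, 6)]:
    assert expected_days_brute_force(n, k, days) == expected_days_formula(n, k, days)
```

The brute-force count never assumes the days are independent, yet it agrees with the sum of the per-day marginal probabilities, which is exactly what linearity of expectation promises.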