Consider a bag containing some fixed, large number of unfair coins. Each coin has a different probability $p$ of heads. The task is to estimate the mean probability of heads $$\mathbb{E}[p_{heads}]$$ over the entire bag of coins. This can be done by randomly drawing individual coins from the bag with replacement and tossing each drawn coin a few times to estimate its individual probability of heads.
More specifically, let's say we draw $s$ coins from the bag and toss each of them $m$ times to obtain $X_{i,j}\in \{0,1\}$ where $i\in \{1,\dots,s\}$ and $j\in \{1,\dots ,m\}$. An unbiased estimator for the mean probability of heads of the bag is then $$\hat{p}_{heads} =\frac{1}{s}\frac{1}{m}\sum_{i=1}^s \sum_{j=1}^m X_{i,j}.$$
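For concreteness, the sampling scheme and estimator can be sketched in a few lines of NumPy (the Beta(2, 5) bag distribution and the values of $s$ and $m$ here are purely illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bag: the coins' heads-probabilities follow a Beta(2, 5)
# distribution (an illustrative assumption, not part of the problem).
s, m = 50, 20
p = rng.beta(2, 5, size=s)                    # draw s coins from the bag
X = rng.binomial(1, p[:, None], size=(s, m))  # toss each coin m times

p_hat = X.mean()  # equals (1/(s*m)) * sum over all X_{i,j}
print(p_hat)
```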
My question is: What is the variance of this estimator as a function of $s$ and $m$?
Motivation: In particular, I am trying to understand whether the variance improves at all when increasing $m$ while keeping $s$ fixed, i.e. whether there is any merit to estimating individual coins more precisely. This question is motivated by a setting where drawing new coins from the bag is much more costly than tossing a coin that has already been drawn many times. One might hope for a trade-off: either draw many coins and estimate each of them roughly, or draw few coins and estimate each very accurately before averaging.
We can write $\hat{p} = \frac{1}{s}\frac{1}{m} \sum_{i = 1}^s X_i$, where $X_i | p_i \sim B(m, p_i)$ is a binomial random variable counting the number of heads thrown on the $i$-th coin, whose probability of success is itself a random variable (that we don't know much about). Then each of the $X_i$ is independent of each other (since the coin draws are being made with replacement), so we know from standard variance properties that:
$$Var(\hat{p}) = \frac{1}{s^2} \frac{1}{m^2} \sum_{i = 1}^s Var(X_i)$$
Then, since each $X_i$ involves two separate random processes, we can invoke the law of total variance to better understand it, noting that if $p_i$ were fixed, then $E(X_i \mid p_i) = m p_i$ and $Var(X_i \mid p_i) = m p_i (1 - p_i)$ per standard properties of the binomial distribution:
$$\begin{eqnarray}Var(X_i) & = & E_{p_i}[Var(X_i | p_i)] + Var_{p_i}[E(X_i | p_i)] \\ & = & E[m p_i (1 - p_i)] + Var(m p_i) \\ & = & m (E(p_i) - E(p_i^2)) + m^2 Var(p_i) \\ & = & m E(p_i) - m(Var(p_i) + E(p_i)^2) + m^2 Var(p_i) \\ & = & m E(p_i) (1 - E(p_i)) + m(m - 1) Var(p_i) \\ & = & m E(p) (1 - E(p)) + m(m - 1) Var(p) \end{eqnarray}$$
The last step holds because each $p_i$ is just an i.i.d. draw from the coins in the bag, whose probability of heads is distributed per the random variable $p$.
Putting this back into the original variance calculation gives us:
$$\begin{eqnarray}Var(\hat{p}) & = & \frac{1}{s^2} \frac{1}{m^2} \sum_{i = 1}^s Var(X_i) \\ & = & \frac{1}{s^2} \frac{1}{m^2} s \left[m E(p) (1 - E(p)) + m(m - 1) Var(p) \right] \\ & = & \frac{1}{s m} \left[ E(p) (1 - E(p)) + (m - 1) Var(p) \right] \end{eqnarray}$$
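This formula is easy to sanity-check by simulation. The sketch below assumes (only for illustration) that the bag's heads-probabilities follow a Beta(2, 2) distribution, so $E(p) = 1/2$ and $Var(p) = 1/20$, and compares the empirical variance of $\hat{p}$ over many replications of the whole experiment against the closed-form expression:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed bag distribution for illustration: p ~ Beta(2, 2).
a, b = 2.0, 2.0
Ep = a / (a + b)
Varp = a * b / ((a + b) ** 2 * (a + b + 1))

s, m, reps = 30, 10, 200_000

# Each row is one full replication: draw s coins, toss each m times.
p_draws = rng.beta(a, b, size=(reps, s))
heads = rng.binomial(m, p_draws)          # X_i | p_i ~ Binomial(m, p_i)
p_hat = heads.sum(axis=1) / (s * m)

empirical = p_hat.var()
theoretical = (Ep * (1 - Ep) + (m - 1) * Varp) / (s * m)
print(empirical, theoretical)  # the two should agree closely
```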
So, to answer your question - is it better to draw more coins, or to flip the same coins more times? Notice that as $m$ increases, the expression tends asymptotically to $\frac{1}{s} Var(p)$, meaning that more flips of the same coins give diminishing returns - you become very certain of how those particular coins behave, but you still have only a finite amount of information about the bag as a whole. On the other hand, increasing $s$ makes the variance tend to 0, so drawing more coins always improves accuracy.
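To make the diminishing returns concrete, here is a small sketch evaluating the closed-form variance for growing $m$ at fixed $s$ (the values $E(p) = 0.5$ and $Var(p) = 0.05$ are illustrative assumptions):

```python
def var_phat(s, m, Ep=0.5, Varp=0.05):
    """Variance of the estimator: [E(p)(1 - E(p)) + (m - 1) Var(p)] / (s m)."""
    return (Ep * (1 - Ep) + (m - 1) * Varp) / (s * m)

s = 20
for m in (1, 10, 100, 10_000):
    print(m, var_phat(s, m))

# As m grows, var_phat(s, m) approaches Varp / s = 0.0025: extra tosses
# of the same coins cannot push the variance below that floor.
```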
However, if you have some information about $E(p)$ and $Var(p)$, as well as the relative "cost" of drawing more coins versus flipping the same coins, then you can properly optimise the process. For example, if you know that the coins are probably fairly similar (i.e. $Var(p)$ is small-ish), then the overall variance is essentially $\frac{K}{s m}$ and you can try to minimise that value subject to a total fixed cost of $C = c_1 s + c_2 m$ via, say, Lagrange multipliers to find that you should perform them in inverse relation to their cost (i.e. $\frac{s}{m} = \frac{c_2}{c_1}$).
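Under that small-$Var(p)$ approximation, the Lagrange-multiplier conclusion can be checked numerically with a simple grid search (the costs $c_1 = 10$, $c_2 = 1$ and budget $C = 1000$ are made-up numbers for illustration):

```python
# Illustrative assumptions: drawing a coin costs c1, a single toss costs c2,
# and the total budget is C. Minimise K/(s*m) subject to c1*s + c2*m = C.
c1, c2, C = 10.0, 1.0, 1000.0
K = 0.25  # stand-in for E(p)(1 - E(p)) when Var(p) is negligible

# For each candidate s, the budget fixes m = (C - c1*s)/c2; pick the best s.
best_s = min(range(1, int(C // c1)), key=lambda s: K / (s * (C - c1 * s) / c2))
best_m = (C - c1 * best_s) / c2

print(best_s, best_m, best_s / best_m, c2 / c1)
# The optimal ratio s/m matches c2/c1, as the Lagrange-multiplier
# argument predicts.
```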