I have a simple, general question about calculating a statistic over $M$ runs of the same experiment. Suppose I would like to calculate the mean of the values returned by some test. Each run of the test generates $\langle x_1, \dots, x_n \rangle$, possibly of different length. Let's say the statistic is the mean. Which approach would be better, and why:
- Sum all values from the $M$ runs, and then divide by the total number of values
- For each run calculate the average, and then average across all the averages
I believe one of the above might be under- or overestimating the mean slightly, but I don't know which. Thanks for your answers.
$\def\E{{\rm E}}\def\V{{\rm Var}}$Say you have $M$ runs of lengths $n_1,\dots,n_M$. Denote the $j$th value in the $i$th run by $X^i_j$, and let the $X^i_j$ be independent and identically distributed, with mean $\mu$ and variance $\sigma^2$.
In your first approach you calculate
$$\mu_1 = \frac{1}{n_1+\cdots+n_M} \sum_{i=1}^M \sum_{j=1}^{n_i} X^i_j$$
and in your second approach you calculate
$$\mu_2 = \frac{1}{M} \sum_{i=1}^M \left( \frac{1}{n_i} \sum_{j=1}^{n_i} X^i_j\right)$$
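In code, the two estimators look like this (a minimal Python sketch; the `runs` data is hypothetical, purely for illustration):

```python
# M runs of unequal length; the values here are made up for illustration.
runs = [[2.0, 4.0], [1.0], [3.0, 5.0, 7.0]]

# Approach 1: pool all values, divide by the total count.
mu1 = sum(x for run in runs for x in run) / sum(len(run) for run in runs)

# Approach 2: average each run, then average the per-run averages.
mu2 = sum(sum(run) / len(run) for run in runs) / len(runs)

print(mu1, mu2)  # 3.666..., 3.0 — they differ when run lengths differ
```

Note that the two give different numbers on the same data whenever the run lengths are unequal, because approach 2 weights every run equally rather than every value.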
You can compute their expectations:
$$\E(\mu_1) = \frac{1}{n_1+\cdots+n_M} \sum_{i=1}^M \sum_{j=1}^{n_i} \mu = \frac{(n_1+\cdots+n_M)\mu}{n_1+\cdots+n_M} = \mu$$
vs
$$\E(\mu_2) = \frac{1}{M} \sum_{i=1}^M \left( \frac{1}{n_i} \sum_{j=1}^{n_i}\mu \right) = \frac{1}{M} ( M\mu ) = \mu$$
so the estimator is unbiased in both cases. However, if you calculate the variances you will find that
$$\V(\mu_1) = \frac{\sigma^2}{n_1+\cdots+n_M}$$
and
$$\V(\mu_2) = \frac{1}{M^2} \left( \sum_{i=1}^M \frac{1}{n_i} \right) \sigma^2$$
With a little effort (the Cauchy–Schwarz inequality gives $(n_1+\cdots+n_M)\sum_{i=1}^M \frac{1}{n_i} \geq M^2$), you can show that
$$\V(\mu_1)\leq \V(\mu_2)$$
where the inequality is strict except when $n_1=n_2=\cdots=n_M$, i.e. when all of the runs produce the same amount of output. If you need to be convinced of this, work through the details in the case $M=2$, $n_1=1$ and $n_2=N >1$.
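To make the gap concrete, here is a quick numerical check of the two variance formulas for exactly that case, $M=2$, $n_1=1$, $n_2=N$ (Python, taking $\sigma^2 = 1$):

```python
# Var(mu_1) = sigma^2 / (n_1 + ... + n_M)  versus
# Var(mu_2) = (1/M^2) * sum(1/n_i) * sigma^2,  for M = 2, n_1 = 1, n_2 = N.
sigma2 = 1.0
for N in [2, 5, 100]:
    ns = [1, N]
    var1 = sigma2 / sum(ns)
    var2 = sigma2 * sum(1.0 / n for n in ns) / len(ns) ** 2
    print(N, var1, var2)
```

As $N$ grows, $\V(\mu_1)$ shrinks towards zero, while $\V(\mu_2)$ stays above $1/4$: the single short run keeps dragging the second estimator's variance up, no matter how much data the long run contributes.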
Therefore it is better to take your first approach, of summing up the output of all runs and dividing by the total length of the output. The expectation is the same in either case, but the variance is lower with the first approach.
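As a sanity check, a short Monte Carlo simulation bears this out (a sketch using only Python's standard library; the choice of $\mu$, $\sigma$, and the run-length pattern is arbitrary):

```python
import random

random.seed(0)
mu, sigma = 10.0, 2.0
lengths = [1, 2, 10, 50]   # deliberately unequal run lengths
trials = 20000

est1, est2 = [], []
for _ in range(trials):
    runs = [[random.gauss(mu, sigma) for _ in range(n)] for n in lengths]
    # Approach 1: pooled mean over all values.
    est1.append(sum(x for r in runs for x in r) / sum(lengths))
    # Approach 2: mean of the per-run means.
    est2.append(sum(sum(r) / len(r) for r in runs) / len(runs))

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(mean(est1), mean(est2))  # both close to mu = 10
print(var(est1), var(est2))    # var(est1) clearly smaller
```

Both empirical means land near $\mu = 10$, confirming unbiasedness, while the empirical variance of the second estimator is several times that of the first, matching the formulas above.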