Mean of means and standard deviation


I have a set of data with their means $\mu_1,\mu_2,\ldots\mu_n$ and standard deviations $\sigma_1,\sigma_2,\ldots,\sigma_n$. These values refer to repetitions of the same experiment.
How do I calculate the overall "mean of means" and an overall standard deviation $\mu_{tot},\sigma_{tot}$ summarizing all the experiments?

Basically I have $X_1\sim N(\mu_1,\sigma^2_1), X_2\sim N(\mu_2,\sigma^2_2),\ldots,X_n\sim N(\mu_n,\sigma^2_n)$
and $Z=\frac{X_1+X_2+\ldots+X_n}{n}$. The question is: $Z\sim N(?,?)$


Best answer:

If you have the raw data themselves, you can compute both the mean and the standard deviation directly.

If you have the sizes of the populations, say $m_1,\dots,m_n$, then the common mean is straightforward to compute: $$\mu=\frac{m_1\mu_1+\dots+m_n\mu_n}{m_1+\dots+m_n},$$ as the numerator is the total over all populations.

Now for the common variance. A variance is a mean of squares minus the square of the mean, so the idea is to recreate the sums of squares. For example, $\sigma_1^2+\mu_1^2$ is the mean of squares of the 1st population, and multiplying by $m_1$ gives its sum of squares. Doing this for each population and adding gives the joint sum of squares; dividing by $m_1+\dots+m_n$ gives the joint mean of squares. Finally, subtract $\mu^2$ and we are done. :)
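The recipe above can be sketched in code. This is a minimal illustration (the function name and the test data are mine, not from the question), which cross-checks the pooled statistics against the statistics of the concatenated raw data:

```python
import numpy as np

def pooled_mean_var(m, mu, sigma):
    """Pooled mean and variance from per-population sizes, means, and SDs.

    Uses: variance = (mean of squares) - (square of the mean), where the
    per-population mean of squares is sigma_i^2 + mu_i^2.
    """
    m, mu, sigma = map(np.asarray, (m, mu, sigma))
    total = m.sum()
    pooled_mu = (m * mu).sum() / total            # weighted mean
    sum_sq = (m * (sigma**2 + mu**2)).sum()       # joint sum of squares
    pooled_var = sum_sq / total - pooled_mu**2    # subtract mu^2
    return pooled_mu, pooled_var

# Cross-check against raw data (two synthetic populations)
rng = np.random.default_rng(0)
groups = [rng.normal(loc, 2.0, size=n) for loc, n in [(1.0, 50), (3.0, 80)]]
m = [len(g) for g in groups]
mu = [g.mean() for g in groups]
sigma = [g.std() for g in groups]          # population SD (ddof=0)
all_data = np.concatenate(groups)

pm, pv = pooled_mean_var(m, mu, sigma)
print(np.isclose(pm, all_data.mean()), np.isclose(pv, all_data.var()))
```

Note that the identity is exact (up to floating point) when the per-population standard deviations are the population form (`ddof=0`); with sample standard deviations the reconstruction would need a small correction.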

If $m_1=\dots=m_n=m$ (you write about the same experiment), then $$\mu=\frac{\mu_1+\dots+\mu_n}{n}.$$ The sum of squares in $i$-th experiment is $m(\sigma_i^2+\mu_i^2)$. Hence the total variance is $$\sigma^2=\frac{m(\sigma_1^2+\mu_1^2+\dots+\sigma_n^2+\mu_n^2)}{nm}-\mu^2=\frac{(\sigma_1^2+\mu_1^2+\dots+\sigma_n^2+\mu_n^2)}{n}-\mu^2.$$

About the (edited) last fragment of your question: the mean of $Z$ is $$\mu=\frac{\mu_1+\dots+\mu_n}{n},$$ while the standard deviation is $$\sigma=\frac{\sqrt{\sigma_1^2+\dots+\sigma_n^2}}{n},$$ since $\operatorname{Var}(Z)=\frac1{n^2}\left(\sigma_1^2+\dots+\sigma_n^2\right)$, provided that $X_1,\dots,X_n$ are independent.
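A quick Monte Carlo sanity check of the distribution of $Z$ (the parameter values below are hypothetical; recall $\operatorname{Var}(Z)=\frac1{n^2}\sum_i\sigma_i^2$ for independent $X_i$):

```python
import numpy as np

# Hypothetical per-experiment parameters
mus = np.array([1.0, 2.0, 4.0])
sigmas = np.array([0.5, 1.0, 2.0])
n = len(mus)

# Z = (X_1 + ... + X_n) / n, with the X_i independent
mu_z = mus.mean()                          # mean of the means
sigma_z = np.sqrt((sigmas**2).sum()) / n   # Var(Z) = (sum sigma_i^2) / n^2

rng = np.random.default_rng(1)
samples = rng.normal(mus, sigmas, size=(200_000, n)).mean(axis=1)
print(np.isclose(samples.mean(), mu_z, atol=0.01),
      np.isclose(samples.std(), sigma_z, atol=0.01))
```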

Second answer:

Lemma:

$$\int_{-\infty}^\infty e^{-(ax^2-2bx+c)/2}dx=\frac{\sqrt{2\pi}e^{(b^2-ac)/2a}}{\sqrt a}.$$

This is established by completing the square; the exponent becomes

$$-\frac a2\left(x-\frac ba\right)^2+\frac{b^2-ac}{2a}.$$

The first term generates a Gaussian curve, whose integral is known, and the second a constant factor.
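The lemma is easy to check numerically; here is a sketch (assuming SciPy is available, with arbitrary test values for $a,b,c$):

```python
import numpy as np
from scipy.integrate import quad

# Check: ∫ exp(-(a x^2 - 2 b x + c)/2) dx
#      = sqrt(2π) · exp((b² - ac)/(2a)) / sqrt(a),  valid for a > 0
a, b, c = 1.7, 0.4, 2.3   # arbitrary values with a > 0

numeric, _ = quad(lambda x: np.exp(-(a*x**2 - 2*b*x + c)/2),
                  -np.inf, np.inf)
closed_form = np.sqrt(2*np.pi) * np.exp((b**2 - a*c)/(2*a)) / np.sqrt(a)
print(np.isclose(numeric, closed_form))
```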


Now we prove that the sum of two independent Gaussians is a Gaussian. Let $z=x+y$, or $y=z-x$, and assume WLOG that $y$ is standardized, i.e. $\mu_y=0,\ \sigma_y=1$.

The $\text{pdf}$ of $z$ is obtained as the integral

$$\text{pdf}_{x+y}(z)=\frac1{\sqrt{2\pi}\sigma_x\sqrt{2\pi}}\int_{-\infty}^\infty e^{-(x-\mu_x)^2/2\sigma_x^2}e^{-(z-x)^2/2}dx.$$

The quadratic coefficients of the exponent are

  • $a=\dfrac1{\sigma_x^2}+1$,

  • $b=\dfrac{\mu_x}{\sigma_x^2}+z,$

  • $c=\dfrac{\mu_x^2}{\sigma_x^2}+z^2$.

Now

$$b^2-ac=-\frac{z^2-2\mu_xz+\mu_x^2}{\sigma_x^2},$$

$$\frac{b^2-ac}a=-\frac{(z-\mu_x)^2}{\sigma_x^2+1},$$

and by identification (the exponent in the lemma's result is $(b^2-ac)/2a$, here $-\frac{(z-\mu_x)^2}{2(\sigma_x^2+1)}$) we have a Gaussian law of mean $\mu_z=\mu_x+0$ and variance $\sigma_z^2=\sigma_x^2+1$.
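The algebra for $b^2-ac$ can be verified symbolically; a sketch (assuming SymPy is available):

```python
import sympy as sp

z, mu, sigma = sp.symbols('z mu sigma', real=True)

# The quadratic coefficients of the exponent, as above
a = 1/sigma**2 + 1
b = mu/sigma**2 + z
c = mu**2/sigma**2 + z**2

# b² - ac should equal -(z - mu)²/sigma², so the difference expands to 0
diff = sp.expand(b**2 - a*c + (z - mu)**2 / sigma**2)
print(diff)  # prints 0
```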

To revert to the general case $(\mu_y,\sigma_y)$, we can scale the variables by $\sigma_y$ and translate by $\mu_y$, which gives

$$\sigma_z^2=\sigma_x^2+\sigma^2_y$$

and

$$\mu_z=\mu_x+\mu_y.$$


The generalization to three or more variables is immediate: we have established the additivity of the mean and of the variance, while the distribution remains Gaussian.
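An empirical check of the additivity result, with hypothetical parameters:

```python
import numpy as np

# X + Y for independent Gaussians: mean and variance should add
mu_x, sigma_x = 1.5, 0.8
mu_y, sigma_y = -0.5, 1.2

rng = np.random.default_rng(2)
z = rng.normal(mu_x, sigma_x, 500_000) + rng.normal(mu_y, sigma_y, 500_000)

print(np.isclose(z.mean(), mu_x + mu_y, atol=0.01),
      np.isclose(z.var(), sigma_x**2 + sigma_y**2, atol=0.02))
```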