How does the formula for the combined standard deviation work?


A question gives the mean and standard deviation of two groups and asks for the combined mean and standard deviation. I could not understand how the formula for the combined standard deviation connects with the general sample/population standard deviation formula.


There is 1 best solution below


Outline: For sample A, rearrange the formula $S^2_A = \frac{1}{n_A-1} \left[\sum_A X_i^2 - n_A\bar X^2_A\right]$ and use $n_A,$ $\bar X_A,$ and $S^2_A$ to find $\sum_A X_i^2 = (n_A-1)S^2_A + n_A\bar X^2_A.$ Do the same for sample B. For the combined sample C, $\sum_C X_i^2 = \sum_A X^2_i +\sum_B X^2_i.$ Finally, use $S^2_C = \frac{1}{n_C-1} \left[\sum_C X_i^2 - n_C\bar X^2_C\right],$ where $n_C = n_A + n_B$ and $\bar X_C = \frac{1}{n_C} (n_A \bar X_A + n_B \bar X_B).$ The combined standard deviation is $S_C = \sqrt{S^2_C}.$
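
To connect this with the defining formula for the sample variance, it may help to expand the squared deviations. What follows is a sketch of the standard algebra for a generic sample of size $n$ with mean $\bar X,$ using the fact that $\sum X_i = n\bar X$:

$$\sum_{i=1}^{n} (X_i - \bar X)^2 = \sum_{i=1}^{n} X_i^2 - 2\bar X \sum_{i=1}^{n} X_i + n\bar X^2 = \sum_{i=1}^{n} X_i^2 - n\bar X^2.$$

Dividing by $n-1$ gives the shortcut form of $S^2$ used in the outline. Applying it to A, B, and C and adding the sums of squares for A and B gives the combined formula in terms of the group summaries alone:

$$(n_C-1)S^2_C = (n_A-1)S^2_A + (n_B-1)S^2_B + n_A\bar X^2_A + n_B\bar X^2_B - n_C\bar X^2_C.$$

If the question uses population standard deviations (divisor $n$ instead of $n-1$), the same argument should go through with each $n-1$ replaced by the corresponding $n.$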

In case it helps, here are two samples generated in R statistical software, along with all of the quantities used above:

# simulate two normal samples, rounded to integers
set.seed(4618);  x.a = round(rnorm(10, 50, 3));  x.b = round(rnorm(20, 52, 4))

x.a; length(x.a);  mean(x.a);  var(x.a);  sum(x.a^2)
## 55 47 57 53 49 52 52 51 51 52
## 10        # size of first sample
## 51.9      # mean of first sample
## 7.877778  # variance of first sample
## 27007     # sum of squares of first sample

9*var(x.a) + 10*mean(x.a)^2   # recover sum of squares from size, mean, and variance
## 27007


x.b; length(x.b);  mean(x.b);  var(x.b);  sum(x.b^2)
## 57 46 52 56 53 58 55 50 57 45 48 53 50 61 54 48 52 50 51 52
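## 20        # size of second sample
## 52.4      # mean of second sample
## 17.09474  # variance of second sample
## 55240     # sum of squares of second sample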

x.c = c(x.a, x.b)   # combine samples
x.c; length(x.c);  mean(x.c);  var(x.c);  sum(x.c^2)
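## 55 47 57 53 49 52 52 51 51 52 57 46 52 56 53 58 55 50 57 45
## 48 53 50 61 54 48 52 50 51 52
## 30        # size of combined sample
## 52.23333  # mean of combined sample
## 13.7023   # variance of combined sample
## 82247     # sum of squares of combined sample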
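
The last step of the outline can also be carried out from the summary statistics alone, without access to the raw data. The following is a minimal sketch, assuming only the sizes, means, and variances printed above; the variable names (`n.a`, `m.a`, `v.a`, `ss.a`, and so on) are introduced here for illustration and are not part of the original session.

```r
# Summary statistics copied from the output above (rounded as printed)
n.a = 10;  m.a = 51.9;  v.a = 7.877778
n.b = 20;  m.b = 52.4;  v.b = 17.09474

# Recover each sum of squares: sum(X^2) = (n - 1)*S^2 + n*mean^2
ss.a = (n.a - 1)*v.a + n.a*m.a^2
ss.b = (n.b - 1)*v.b + n.b*m.b^2

# Combine: sizes and sums of squares add; the mean is a weighted average
n.c  = n.a + n.b
m.c  = (n.a*m.a + n.b*m.b)/n.c
ss.c = ss.a + ss.b

# Combined sample variance and standard deviation
v.c  = (ss.c - n.c*m.c^2)/(n.c - 1)
sd.c = sqrt(v.c)

m.c;  ss.c;  v.c;  sd.c
```

Up to rounding in the printed inputs, `m.c`, `ss.c`, and `v.c` should agree with `mean(x.c)`, `sum(x.c^2)`, and `var(x.c)` from the combined sample.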

[Figure: boxplots of the three samples; the variable box widths reflect the different sample sizes.]