Standard deviation of the concatenation of two vectors whose standard deviations are known


Let $v_1$, $v_2$ be two vectors of real numbers, of lengths $m$ and $n$.

Let $\sigma_1$, $\sigma_2$ be their standard deviations, and assume they are already calculated.

You may also assume that the means of the vectors are known.

Is there an $O(1)$ formula to calculate the standard deviation of the concatenation of those two vectors?

Thanks


Best answer:

Let $v_1=(x_1,x_2,\dots,x_m)$ have mean $\mu_1$ and standard deviation $\sigma_1$, and $v_2=(y_1,y_2,\dots,y_n)$ have mean $\mu_2$ and standard deviation $\sigma_2$. We can write $$\mu_1=\frac 1m \sum_{i=1}^m x_i\\\sigma_1^2=\frac 1m\sum_{i=1}^m(x_i-\mu_1)^2$$ We rewrite the second equation as $$m\sigma_1^2=\sum_{i=1}^m(x_i^2-2x_i\mu_1+\mu_1^2)=\sum_{i=1}^mx_i^2-m\mu_1^2$$ or $$\sum_{i=1}^mx_i^2=m\mu_1^2+m\sigma_1^2$$ Similarly $$\sum_{i=1}^ny_i^2=n\mu_2^2+n\sigma_2^2$$ We can now write $$\mu=\frac{\sum_{i=1}^mx_i+\sum_{i=1}^ny_i}{m+n}=\frac{m\mu_1+n\mu_2}{m+n}$$ and for the standard deviation: $$\sigma^2=\frac1{m+n}\left(\sum_{i=1}^m(x_i-\mu)^2+\sum_{i=1}^n(y_i-\mu)^2\right)$$ Expanding the squares we get $$\sigma^2=\frac{1}{m+n}\left(\sum_{i=1}^mx_i^2-2\mu\sum_{i=1}^mx_i+m\mu^2+\sum_{i=1}^n y_i^2-2\mu\sum_{i=1}^n y_i+n\mu^2\right)\\=\frac{1}{m+n}(m\mu_1^2+m\sigma_1^2-2\mu m\mu_1+m\mu^2+n\mu_2^2+n\sigma_2^2-2\mu n\mu_2+n\mu^2)$$ Grouping terms, this simplifies to $$\sigma^2=\frac{m\left(\sigma_1^2+(\mu_1-\mu)^2\right)+n\left(\sigma_2^2+(\mu_2-\mu)^2\right)}{m+n}$$
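As a quick sketch, the derivation above translates directly into a few lines of Python (the function name `combined_std` and the sample data are my own, chosen for illustration); it assumes the uncorrected "population" definition of standard deviation, matching the answer:

```python
import math
import statistics

def combined_std(m, mu1, sigma1, n, mu2, sigma2):
    """O(1) std of the concatenation, given lengths, means and
    (population) stds of the two vectors."""
    # Mean of the concatenation: weighted average of the two means
    mu = (m * mu1 + n * mu2) / (m + n)
    # sigma^2 = (m(sigma1^2 + (mu1-mu)^2) + n(sigma2^2 + (mu2-mu)^2)) / (m+n)
    var = (m * (sigma1**2 + (mu1 - mu)**2)
           + n * (sigma2**2 + (mu2 - mu)**2)) / (m + n)
    return math.sqrt(var)

# Sanity check against a direct O(m+n) computation on sample data:
v1 = [1.0, 2.0, 4.0]
v2 = [3.0, 5.0, 8.0, 9.0]
direct = statistics.pstdev(v1 + v2)
fast = combined_std(len(v1), statistics.fmean(v1), statistics.pstdev(v1),
                    len(v2), statistics.fmean(v2), statistics.pstdev(v2))
print(abs(direct - fast) < 1e-12)  # True
```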

Second answer:

I will assume the standard deviation values you're using are the "uncorrected sample deviation" or "standard deviation of the sample", defined as shown in the Uncorrected sample standard deviation section of Wikipedia's "Standard deviation" article as

$$s_{N}={\sqrt {{\frac {1}{N}}\sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2}}} \tag{1}\label{eq1A}$$

If you're using the corrected sample standard deviation, where the fraction is $\frac{1}{N-1}$ instead, you can adjust the following solution accordingly. Consider the summation part to get

$$\begin{equation}\begin{aligned} \sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2} & = \sum _{i=1}^{N}(x_{i}^2 - 2x_i\bar{x} + \bar{x}^2) \\ & = \sum _{i=1}^{N}x_{i}^2 + \sum _{i=1}^{N}(- 2x_i\bar{x}) + \sum _{i=1}^{N}\bar{x}^2 \\ & = \sum _{i=1}^{N}x_{i}^2 - 2\bar{x}\sum _{i=1}^{N}x_i + N\bar{x}^2 \\ & = \sum _{i=1}^{N}x_{i}^2 - 2\bar{x}(N\bar{x}) + N\bar{x}^2 \\ & = \sum _{i=1}^{N}x_{i}^2 - N\bar{x}^2 \end{aligned}\end{equation}\tag{2}\label{eq2A}$$

Note I used that $\bar{x} = \left(\frac{1}{N}\right)\sum _{i=1}^{N}x_{i} \implies \sum _{i=1}^{N}x_{i} = N\bar{x}$. Thus, squaring both sides of \eqref{eq1A} and using \eqref{eq2A} in it gives

$$\begin{equation}\begin{aligned} (s_{N})^2 & = \frac{1}{N}\sum _{i=1}^{N}(x_{i}-{\bar {x}})^{2} \\ N(s_{N})^2 & = \sum _{i=1}^{N}x_{i}^2 - N\bar{x}^2 \end{aligned}\end{equation}\tag{3}\label{eq3A}$$
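Identity \eqref{eq3A} is easy to spot-check numerically; here is a small Python check (the sample vector is my own, for illustration), again using the uncorrected definition \eqref{eq1A}:

```python
import statistics

# Spot-check identity (3): N*s_N^2 == sum(x_i^2) - N*xbar^2,
# where s_N is the uncorrected (population) standard deviation.
x = [2.0, 3.0, 7.0, 10.0]
N = len(x)
xbar = statistics.fmean(x)
sN = statistics.pstdev(x)
lhs = N * sN**2
rhs = sum(xi**2 for xi in x) - N * xbar**2
print(abs(lhs - rhs) < 1e-9)  # True
```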

For $v_1$, let $s_x$ be its standard deviation and $x_i$ be its $m$ values. For $v_2$, let $s_y$ be its standard deviation and $y_i$ be its $n$ values. For the concatenation of the $2$ vectors, let $s_z$ be its standard deviation and $z_i$ be the $m + n$ values. For the concatenated vector, \eqref{eq3A} gives

$$\begin{equation}\begin{aligned} (m + n)(s_{z})^2 & = \sum _{i=1}^{m + n}z_{i}^2 - (m + n)\bar{z}^2 \\ (m + n)(s_{z})^2 & = \sum _{i=1}^{m}x_{i}^2 + \sum _{i=1}^{n}y_{i}^2 - (m + n)\bar{z}^2 \\ (m + n)(s_{z})^2 & = (m(s_{x})^2 + m\bar{x}^2) + (n(s_{y})^2 + n\bar{y}^2) - (m + n)\bar{z}^2 \\ s_{z} & = \sqrt{\frac{(m(s_{x})^2 + m\bar{x}^2) + (n(s_{y})^2 + n\bar{y}^2) - (m + n)\bar{z}^2}{m + n}} \end{aligned}\end{equation}\tag{4}\label{eq4A}$$

Note that

$$\begin{equation}\begin{aligned} \bar{z} & = \frac{\sum_{i=1}^{m+n}z_i}{m + n} \\ & = \frac{\sum_{i=1}^{m}x_i + \sum_{i=1}^{n}y_i}{m + n} \\ & = \frac{m\bar{x} + n\bar{y}}{m + n} \end{aligned}\end{equation}\tag{5}\label{eq5A}$$

You can now plug in the known values into \eqref{eq5A} to get $\bar{z}$ and then plug that into \eqref{eq4A} to get $s_z$.
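For concreteness, here is a short Python sketch of that two-step plug-in (the function name `concat_std` is my own; it assumes the uncorrected definition \eqref{eq1A}):

```python
import math

def concat_std(m, x_bar, s_x, n, y_bar, s_y):
    """O(1) standard deviation of the concatenation, per (4) and (5).

    m, n       -- lengths of v1, v2
    x_bar, y_bar -- means of v1, v2
    s_x, s_y   -- uncorrected (population) stds of v1, v2
    """
    # Step 1: mean of the concatenation, equation (5)
    z_bar = (m * x_bar + n * y_bar) / (m + n)
    # Step 2: numerator of equation (4)
    num = (m * (s_x**2 + x_bar**2)
           + n * (s_y**2 + y_bar**2)
           - (m + n) * z_bar**2)
    return math.sqrt(num / (m + n))

# Example: v1 = [1, 2], v2 = [3, 4]; their concatenation [1, 2, 3, 4]
# has population variance 1.25, so concat_std should return sqrt(1.25).
print(concat_std(2, 1.5, 0.5, 2, 3.5, 0.5))
```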