How to merge two Gaussians


I have two multivariate Gaussians, each defined by a mean vector and a (diagonal) covariance matrix. I want to merge them into a single Gaussian; i.e. I assume there is really only one Gaussian, but I randomly separated the observations into two groups and fitted two Gaussians, which are therefore not too different from each other.

Since I know the number of observations behind each of the two Gaussians, the combined mean estimate is straightforward: $\frac{n_1\mu_1 + n_2\mu_2}{n_1+n_2}$

But what about the covariance matrix?

Thanks

EDIT:

The question was confusing in the original post, especially the "merging Gaussians" part. The following paragraph may state it better.

I have two sets of observations drawn from two multivariate Gaussians, each defined by a mean vector and a (diagonal) covariance matrix. I want to merge the observations into a single sample and fit another Gaussian to it (i.e. I assume there was initially only a single Gaussian, and the observations were separated into two groups, yielding two different Gaussians).


Best answer (by ahmethungari):

OK, I solved it :)

Since the covariance matrix is diagonal, we can treat each dimension as an independent univariate Gaussian. The combined mean and variance are then

$$\hat{\mu} = \frac{n_1\mu_1 + n_2\mu_2}{n_1+n_2}$$

$$\hat{\sigma}^2 = \frac{(\sigma_1^2 + \mu_1^2)n_1 + (\sigma_2^2 + \mu_2^2)n_2}{ (n_1+n_2)} - \hat{\mu}^2$$

Here I used the identity $\sigma^2 = E[x^2] - E[x]^2$.
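As a quick numerical sanity check of these formulas (a sketch with made-up data, not part of the original answer): the combined mean and variance computed from the group statistics agree exactly with the mean and population variance of the pooled observations.

```python
# Sanity check: combined mean/variance formulas vs. direct computation
# on the merged sample. Population variance (divide by n) throughout.
# The data values below are arbitrary.
def pop_mean(xs):
    return sum(xs) / len(xs)

def pop_var(xs):
    m = pop_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

a = [1.0, 2.0, 4.0]          # first group of observations
b = [3.0, 5.0, 6.0, 10.0]    # second group of observations
n1, n2 = len(a), len(b)
mu1, mu2 = pop_mean(a), pop_mean(b)
var1, var2 = pop_var(a), pop_var(b)

# Formulas from the answer above
mu_hat = (n1 * mu1 + n2 * mu2) / (n1 + n2)
var_hat = ((var1 + mu1**2) * n1 + (var2 + mu2**2) * n2) / (n1 + n2) - mu_hat**2

merged = a + b
assert abs(mu_hat - pop_mean(merged)) < 1e-12
assert abs(var_hat - pop_var(merged)) < 1e-12
```

Since these are just moment identities, the agreement is exact, not approximate.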

thanks again

Answer by JulienD:

I might be wrong or may have misinterpreted the question, but when trying to reproduce the result in the accepted and upvoted answer, I get a different result:

Let $x \sim N(\mu, \sigma^2)$. From the definition of the variance, $$ \sigma^2 = E[x^2] - E[x]^2 = E[x^2] - \mu^2, $$ so $$ E[x^2] = \sigma^2 + \mu^2. $$

Now let $x$ be the random variable defined as the weighted average $x = \frac{n_1x_1 + n_2x_2}{n_1 + n_2}$, where $x_1 \sim N(\mu_1, \sigma_1^2)$ and $x_2 \sim N(\mu_2, \sigma_2^2)$ are independent.

We immediately have $$ E[x] = \frac{n_1\mu_1 + n_2\mu_2}{n_1 + n_2} := \mu $$

By the above formula for $E[x_1^2]$ and $E[x_2^2]$, and since $x_1$ and $x_2$ are independent (so $E[x_1x_2] = E[x_1]E[x_2]$), we have

\begin{align} E[x^2] &= E[(\frac{n_1x_1 + n_2x_2}{n_1 + n_2})^2] \\ &= \frac{1}{(n_1 + n_2)^2} E[n_1^2 x_1^2 + n_2^2 x_2^2 + 2n_1n_2x_1x_2] \\ &= \frac{1}{(n_1 + n_2)^2} (n_1^2 E[x_1^2] + n_2^2 E[x_2^2] + 2n_1n_2E[x_1]E[x_2]) \\ &= \frac{(\sigma_1^2 + \mu_1^2)n_1^2 + (\sigma_2^2 + \mu_2^2)n_2^2 + 2n_1n_2\mu_1\mu_2}{(n_1+n_2)^2} \\ &= \frac{\sigma_1^2 n_1^2 + \sigma_2^2 n_2^2 + (n_1\mu_1 + n_2\mu_2)^2}{(n_1+n_2)^2} \end{align}

We can use this to calculate the pooled variance: \begin{align} \sigma^2 &= E[x^2] - E[x]^2 \\ &= \frac{\sigma_1^2 n_1^2 + \sigma_2^2 n_2^2 + (n_1\mu_1 + n_2\mu_2)^2}{(n_1+n_2)^2} - \mu^2 \\ &= \frac{\sigma_1^2 n_1^2 + \sigma_2^2 n_2^2}{(n_1+n_2)^2} + \mu^2 - \mu^2 \\ &= \frac{n_1^2}{(n_1+n_2)^2}\sigma_1^2 + \frac{n_2^2}{(n_1+n_2)^2}\sigma_2^2 \end{align}
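This derivation is easy to check by simulation (a Monte Carlo sketch with made-up parameters, not from the original answer): draw many realizations of the weighted average of two independent Gaussians and compare the empirical variance with the formula.

```python
import random

# Monte Carlo check of Var[(n1*x1 + n2*x2)/(n1+n2)] for independent
# x1 ~ N(mu1, s1^2) and x2 ~ N(mu2, s2^2). All parameters are arbitrary.
random.seed(0)
n1, n2 = 30, 70
mu1, s1 = 1.0, 2.0     # mean and std of x1
mu2, s2 = -3.0, 0.5    # mean and std of x2
w = n1 + n2

samples = [(n1 * random.gauss(mu1, s1) + n2 * random.gauss(mu2, s2)) / w
           for _ in range(100_000)]
m = sum(samples) / len(samples)
v = sum((x - m) ** 2 for x in samples) / len(samples)

mu_formula = (n1 * mu1 + n2 * mu2) / w                # weighted mean
var_formula = (n1**2 * s1**2 + n2**2 * s2**2) / w**2  # formula derived above
```

With 100k samples, `m` and `v` match `mu_formula` and `var_formula` to within a few thousandths.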


In the multivariate case, if the covariance matrix is diagonal, we can apply this formula to each dimension separately. Otherwise, let $X \sim N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^n$ and $\Sigma \in \mathbb{R}^{n \times n}$.

Again, start with the general definition of the covariance matrix, which gives

$$ \Sigma = E[(X-\mu)(X-\mu)^T] = E[XX^T] - \mu \mu^T, $$ so $$ E[XX^T] = \Sigma + \mu\mu^T. $$

Let $X$ be the random variable defined as the weighted average $X = \frac{n_1X_1 + n_2X_2}{n_1 + n_2}$, where $X_1 \sim N(\mu_1, \Sigma_1)$ and $X_2 \sim N(\mu_2, \Sigma_2)$ are independent.

To simplify notation, let $D = \frac{1}{(n_1 + n_2)^2}$. We then have

\begin{align*} E[XX^T] &= D \cdot E[(n_1X_1 + n_2X_2)(n_1X_1 + n_2X_2)^T] \\ &= D \cdot E[n_1^2 X_1X_1^T + n_2^2 X_2X_2^T + 2n_1n_2X_1X_2^T] \\ &= D \cdot (n_1^2 E[X_1X_1^T] + n_2^2 E[X_2X_2^T] + 2n_1n_2E[X_1]E[X_2^T]) \\ &= D \cdot (n_1^2 (\Sigma_1 + \mu_1\mu_1^T) + n_2^2 (\Sigma_2 + \mu_2\mu_2^T) + 2n_1n_2\mu_1\mu_2^T) \\ &= D \cdot (n_1^2 \Sigma_1 + n_2^2 \Sigma_2 + (n_1\mu_1 + n_2\mu_2)(n_1\mu_1 + n_2\mu_2)^T) \end{align*}

Finally the pooled covariance matrix is: \begin{align*} \Sigma &= E[XX^T] - E[X]E[X]^T \\ &= \frac{n_1^2 \Sigma_1 + n_2^2 \Sigma_2 + (n_1\mu_1 + n_2\mu_2)(n_1\mu_1 + n_2\mu_2)^T}{(n_1 + n_2)^2} - \mu\mu^T \\ &= \frac{n_1^2 \Sigma_1 + n_2^2 \Sigma_2}{(n_1 + n_2)^2} + \mu\mu^T - \mu\mu^T \\ &= \frac{n_1^2}{(n_1 + n_2)^2}\Sigma_1 + \frac{n_2^2}{(n_1 + n_2)^2}\Sigma_2 \end{align*}
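The same simulation check works in the multivariate case with a non-diagonal covariance (again a Monte Carlo sketch with arbitrary parameters, not from the original answer):

```python
import numpy as np

# Monte Carlo check of Cov[(n1*X1 + n2*X2)/(n1+n2)] for independent
# multivariate Gaussians X1, X2. All parameters are made up.
rng = np.random.default_rng(0)
n1, n2 = 30, 70
mu1 = np.array([1.0, -2.0])
mu2 = np.array([0.0, 3.0])
Sigma1 = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma2 = np.array([[0.5, -0.1], [-0.1, 1.5]])
w = n1 + n2

X1 = rng.multivariate_normal(mu1, Sigma1, size=200_000)
X2 = rng.multivariate_normal(mu2, Sigma2, size=200_000)
X = (n1 * X1 + n2 * X2) / w

Sigma_formula = (n1**2 * Sigma1 + n2**2 * Sigma2) / w**2  # derived above
Sigma_empirical = np.cov(X, rowvar=False)
```

All four entries of `Sigma_empirical` match `Sigma_formula` to within Monte Carlo error.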

Another answer:

The answer by JulienD is good, but I would like to rewrite the final formulas in a cleaner form, extend them to many random variables, and clarify that the accepted answer is likely not correct.

First, I rewrite the answer of JulienD with a few additional steps to give it a better form:

  • Single variate: \begin{align} \mu &= E[x] = \frac{n_1\mu_1 + n_2\mu_2}{n_1 + n_2} \end{align}

\begin{align} \sigma^2 & = E[x^2] - E[x]^2 \\ & = \frac{\sigma_1^2 n_1^2 + \sigma_2^2 n_2^2 + (n_1\mu_1 + n_2\mu_2)^2}{(n_1+n_2)^2} - \mu^2 \\ & = \frac{\sigma_1^2 n_1^2 + \sigma_2^2 n_2^2}{(n_1+n_2)^2} + \frac{(n_1\mu_1 + n_2\mu_2)^2}{(n_1+n_2)^2} - \mu^2 \\ & = \frac{\sigma_1^2 n_1^2 + \sigma_2^2 n_2^2}{(n_1+n_2)^2} + \mu^2 - \mu^2 \\ & = \frac{n_1^2}{(n_1+n_2)^2}\sigma_1^2 + \frac{n_2^2}{(n_1+n_2)^2}\sigma_2^2 \end{align}

  • Multivariate: \begin{align} \Sigma &= E[XX^T] - E[X]E[X]^T \\ & = \frac{n_1^2 \Sigma_1 + n_2^2 \Sigma_2 + (n_1\mu_1 + n_2\mu_2)(n_1\mu_1 + n_2\mu_2)^T}{(n_1 + n_2)^2} - \mu\mu^T \\ &= \frac{n_1^2 \Sigma_1 + n_2^2 \Sigma_2}{(n_1 + n_2)^2} + \frac{(n_1\mu_1 + n_2\mu_2)(n_1\mu_1 + n_2\mu_2)^T}{(n_1 + n_2)^2} - \mu\mu^T \\ & = \frac{n_1^2 \Sigma_1 + n_2^2 \Sigma_2}{(n_1 + n_2)^2} + \mu\mu^T - \mu\mu^T \\ &= \frac{n_1^2}{(n_1 + n_2)^2} \Sigma_1 + \frac{n_2^2}{(n_1 + n_2)^2} \Sigma_2 \end{align}

Another approach to obtaining the variance (or, in the multivariate case, the covariance matrix) is the variance-of-a-sum theorem, which can be found in many statistics textbooks (e.g. Larry Wasserman, All of Statistics, Theorem 3.20, p. 52) or proved from the definition in the same way JulienD did: $$ \mathbb{V}(X+Y) = \mathbb{V}(X) + \mathbb{V}(Y) + 2 \textbf{Cov}(X,Y) $$ More generally, for random variables $X_1, X_2, \ldots, X_N$: $$ \mathbb{V}\left(\sum_{i=1}^{N}a_i X_i\right) = \sum_{i=1}^{N}a_i^2\mathbb{V}(X_i) + 2 \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} a_i a_j \textbf{Cov}(X_i,X_j) $$

Consider independent $X_i \sim \mathcal{N}(\mu_i, \Sigma_i)$, $i = 1, \dots, N$. Since they are independent, the covariances are zero, i.e. $\textbf{Cov}(X_i,X_j)=0$. The aggregate of these Gaussian distributions is defined as the weighted sum $$X=\sum_{i=1}^{N} a_i X_i=\sum_{i=1}^{N} \frac{n_i}{\sum_{l=1}^{N}n_l} X_i,$$ where $a_i=\frac{n_i}{\sum_{l=1}^{N}n_l}$ and $\sum_{i=1}^N a_i = 1$.

So, we get the mean and covariance of the aggregated distribution as follows: $$ \mu = \sum_{i=1}^{N} a_i \mu_i $$ \begin{align} \Sigma &=\mathbb{V}(X)= \mathbb{V}(\sum_{i=1}^{N} a_i X_i)\\ &=\sum_{i=1}^{N}a_i^2\mathbb{V}(X_i) + 2 \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} a_i a_j \textbf{Cov}(X_i,X_j) \\ & = \sum_{i=1}^{N}a_i^2 \Sigma_i + 0 \\ & = \sum_{i=1}^{N}a_i^2 \Sigma_i \end{align}
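For the diagonal-covariance setting of the original question, these $N$-component formulas reduce to per-dimension scalar updates. A minimal sketch (the helper name `aggregate` is hypothetical, not from the answers):

```python
def aggregate(ns, mus, variances):
    """Mean and variance of X = sum_i a_i * X_i with a_i = n_i / sum(n),
    for independent X_i. For a diagonal Sigma_i, apply this per dimension."""
    total = sum(ns)
    a = [n / total for n in ns]
    mu = sum(ai * mi for ai, mi in zip(a, mus))
    var = sum(ai**2 * vi for ai, vi in zip(a, variances))
    return mu, var

# For N = 2 this reduces exactly to JulienD's two-component formula:
n1, n2 = 3, 7
v1, v2 = 2.0, 0.5
mu, var = aggregate([n1, n2], [1.0, 4.0], [v1, v2])
assert abs(var - (n1**2 * v1 + n2**2 * v2) / (n1 + n2)**2) < 1e-12
```

Since the weights $a_i$ and the moments are explicit, the check against the two-component formula is exact.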

Looking back at the answer of ahmethungari, it seems that answer is not correct.