Combining Covariances of Two Sets


According to Wikipedia, the formula for combining the covariances of two sets is:

$$C_X=C_A+C_B+(\overline{x}_A-\overline{x}_B)(\overline{y}_A-\overline{y}_B) \cdot\frac{n_An_B}{n_X}$$

where:

  • $A$ and $B$ are the first and second sets.
  • $C$ is the covariance.
  • $n$ is the number of samples.
  • $n_X = n_A + n_B$.
  • $x$ and $y$ are the features.

To test this formula, I implemented it and split one dataset into two equal halves, yet the combined result is quite different from the covariance of the original dataset.

Now, let $M_{AB}$ be this part of the above formula:

$$(\overline{x}_A-\overline{x}_B)(\overline{y}_A-\overline{y}_B) \cdot\frac{n_An_B}{n_X}$$

Looking at this implementation, the author basically applied the following formula:

$$ C_X = \frac{(C_A \color{red}{\cdot n_A}) + (C_B \color{red}{\cdot n_B}) + M_{AB}}{\color{red}{n_X}} $$

which gives the correct combined covariance.
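
The two formulas can be compared numerically. The following is only a minimal sketch under two assumptions: that $C$ in the first formula is the co-moment $\sum_i (x_i-\overline{x})(y_i-\overline{y})$, i.e. not yet divided by $n$, and that $C$ in the second formula is the population covariance (co-moment divided by $n$). All helper names (`comoment`, `pop_cov`, `cross_term`, and so on) are hypothetical and are not taken from the linked implementation.

```python
import random


def mean(v):
    return sum(v) / len(v)


def comoment(x, y):
    """Sum of products of deviations from the means (not divided by n)."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))


def pop_cov(x, y):
    """Population covariance: co-moment divided by n (an assumption here)."""
    return comoment(x, y) / len(x)


def cross_term(xa, ya, xb, yb):
    """The M_AB term: product of mean differences times n_A * n_B / n_X."""
    na, nb = len(xa), len(xb)
    return (mean(xa) - mean(xb)) * (mean(ya) - mean(yb)) * na * nb / (na + nb)


def combine_comoments(xa, ya, xb, yb):
    """First formula, reading C as a co-moment."""
    return comoment(xa, ya) + comoment(xb, yb) + cross_term(xa, ya, xb, yb)


def combine_pop_covs(xa, ya, xb, yb):
    """Second formula, reading C as a population covariance."""
    na, nb = len(xa), len(xb)
    weighted = pop_cov(xa, ya) * na + pop_cov(xb, yb) * nb
    return (weighted + cross_term(xa, ya, xb, yb)) / (na + nb)


# Synthetic data, split into two equal halves as in the test described above.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(100)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]
xa, ya, xb, yb = x[:50], y[:50], x[50:], y[50:]

direct = pop_cov(x, y)
via_comoments = combine_comoments(xa, ya, xb, yb) / len(x)
via_weighted = combine_pop_covs(xa, ya, xb, yb)
# Plugging covariances straight into the first formula, for comparison.
naive = pop_cov(xa, ya) + pop_cov(xb, yb) + cross_term(xa, ya, xb, yb)

print(direct, via_comoments, via_weighted, naive)
```

Under these assumptions, `via_comoments` and `via_weighted` should both agree with `direct` up to floating-point error, while `naive` generally does not. That would be consistent with the extra $n_A$, $n_B$ and $n_X$ factors simply converting between covariances and co-moments.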


I cannot see how the latter formula is derived algebraically, or whether it is even equivalent to the former, because the second formula contains the extra factors $\color{red}{n_A}$ and $\color{red}{n_B}$ and a division by $\color{red}{n_X}$.

Your help is appreciated.