If the variance of a subset decreases, will the global variance also decrease, and vice versa?


While implementing one of our proposed algorithms, we are assuming that decreasing the variance of a subset will also decrease the global variance, provided the global mean remains the same, and vice versa.

For example, consider the list below:

{1, 1, 1, 1, 6, 5, 6, 5}; $\sigma^2_{Global_{Before}} = 5.1875$ and $\mu_{Global_{Before}} = 3.25$

Now if we consider the subset {1,6,5}; $\sigma^2_{Local_{Before}} = 4.67$ and $\mu_{Local_{Before}} = 4$

and replace its items as below, keeping the subset mean at 4:

{4,4,4}; $\sigma^2_{Local_{After}} = 0$ and $\mu_{Local_{After}} = 4$

Thus the global list is updated as follows:

{1, 1, 1, 4, 4, 4, 6, 5}; $\sigma^2_{Global_{After}} = 3.4375$ and $\mu_{Global_{After}} = 3.25$
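The numbers above can be double-checked with a few lines of code. This is a minimal sketch using Python's standard-library `statistics.fmean` and `statistics.pvariance`, assuming population variance (dividing by $n$), which is the convention that reproduces the quoted values; the helper name `summarize` is hypothetical.

```python
from statistics import fmean, pvariance

def summarize(values):
    """Return (mean, population variance) of a list of numbers."""
    return fmean(values), pvariance(values)

before = [1, 1, 1, 1, 6, 5, 6, 5]
after = [1, 1, 1, 4, 4, 4, 6, 5]  # the subset {1, 6, 5} replaced by {4, 4, 4}

print(summarize(before))     # (3.25, 5.1875)
print(summarize([1, 6, 5]))  # the subset before the change: mean 4, variance 14/3
print(summarize(after))      # (3.25, 3.4375)
```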

Is this something we need to prove ourselves, or has it already been proven? Where can I find a proof, or could anyone give me some guidance?


Given any finite set of variables $A = \{ a_1, a_2, \ldots, a_n \}$, let $|A|$, $\mu_A$ and $\sigma_A$ be the number of elements, the mean and the (population) standard deviation of the variables in $A$. We have

$$\begin{align} \sum\limits_{a \in A} a &= |A| \mu_A\\ \sum\limits_{a \in A} a^2 &= |A| \left(\sigma_A^2 + \mu_A^2\right) \end{align} $$

This means that if $A$ and $B$ are two disjoint finite sets of variables, we have

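$$\begin{align} |A\cup B|\mu_{A\cup B} &= |A| \mu_A + |B| \mu_B\\ |A\cup B|\left(\sigma_{A\cup B}^2 + \mu_{A\cup B}^2\right) &= |A| \left(\sigma_A^2 + \mu_A^2\right) + |B| \left(\sigma_B^2 + \mu_B^2\right) \end{align} $$

Solving for $\sigma_{A\cup B}^2$ and simplifying leads to

$$\sigma_{A\cup B}^2 = \frac{|A|\sigma_A^2 + |B|\sigma_B^2}{|A\cup B|} + \frac{|A||B|}{|A\cup B|^2}(\mu_A - \mu_B)^2 $$

As a consequence, if one reduces the variance $\sigma_A^2$ while keeping $|A|$, $|B|$, $\mu_A$, $\mu_B$ and $\sigma_B^2$ all fixed (which also keeps the global mean fixed), then the variance of the union, $\sigma_{A\cup B}^2$, also decreases. Conversely, increasing $\sigma_A^2$ under the same conditions increases $\sigma_{A\cup B}^2$, since the right-hand side is an increasing linear function of $\sigma_A^2$.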
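The combination formula can be spot-checked numerically against a direct computation. This is a minimal sketch, assuming population variances (dividing by the number of elements) as in the derivation; `combined_variance` is a hypothetical helper name, and the split into A = {1, 6, 5} and B = {1, 1, 1, 6, 5} is taken from the question's example.

```python
from statistics import fmean, pvariance

def combined_variance(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Population variance of the union of two disjoint groups, from their summaries.

    Implements: (|A| var_a + |B| var_b) / n + |A||B| / n^2 * (mean_a - mean_b)^2
    """
    n = n_a + n_b
    within = (n_a * var_a + n_b * var_b) / n                # weighted within-group part
    between = n_a * n_b / n**2 * (mean_a - mean_b) ** 2     # between-group part
    return within + between

# The question's example: A is the subset that gets modified, B is everything else.
a_before, a_after = [1, 6, 5], [4, 4, 4]
b = [1, 1, 1, 6, 5]

for a in (a_before, a_after):
    formula = combined_variance(len(a), fmean(a), pvariance(a),
                                len(b), fmean(b), pvariance(b))
    direct = pvariance(a + b)
    print(f"formula = {formula:.4f}, direct = {direct:.4f}")
```

Because $\mu_A = 4$ and all of $B$ stay the same, only the within-group term changes, which is why the global variance drops from 5.1875 to 3.4375 in the question's example.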


In general this will not work. Consider the list

1, 1, 1, 1, 6, 5, 6, 5

Now reduce the variance of the right-hand four terms by changing the list to

1, 1, 1, 1, 6, 6, 6, 6 

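The overall variance just got worse: it rose from 5.1875 to 6.25. The subset's variance dropped from 0.25 to 0, but its mean moved from 5.5 to 6, so the global mean moved from 3.25 to 3.5.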

You might argue that if you'd changed it to "5, 5, 5, 5" it would have gotten better, and that's correct. But reducing the variance of a subset is not enough on its own, as this example shows.
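Both variants of the counterexample can be recomputed side by side. This is a minimal sketch using Python's `statistics` module, assuming population variance (the convention used in the question); the dictionary labels are just illustrative names.

```python
from statistics import fmean, pvariance

lists = {
    "original":   [1, 1, 1, 1, 6, 5, 6, 5],
    "6, 6, 6, 6": [1, 1, 1, 1, 6, 6, 6, 6],  # subset variance 0, subset mean moves to 6
    "5, 5, 5, 5": [1, 1, 1, 1, 5, 5, 5, 5],  # subset variance 0, subset mean moves to 5
}

for name, values in lists.items():
    right = values[4:]  # the right-hand four terms
    print(f"{name:>10}: subset var = {pvariance(right):.4f}, "
          f"global mean = {fmean(values):.4f}, global var = {pvariance(values):.4f}")
```

Both changes drive the subset variance to 0, yet the global variance goes to 6.25 in one case and 4 in the other (against 5.1875 originally). In both cases the global mean moves away from 3.25, which is exactly the condition the question assumes away.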

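In general, if you replace each element of a subset by the subset's mean, the total variance will not increase (and it strictly decreases unless the subset's elements were already all equal). The total mean $m$ remains the same, and the sum of squared distances to $m$ for things outside the subset remains the same. For the $k$ things INSIDE the subset, with subset mean $\mu_S$, we have $\sum (x_i - m)^2 = \sum (x_i - \mu_S)^2 + k(\mu_S - m)^2$; replacing each $x_i$ by $\mu_S$ removes the first term, so that sum gets smaller.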
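The claim can also be probed with a randomized spot check; this illustrates it on random data but is not a proof. The sketch below assumes population variance; `replace_subset_with_mean`, the seed, the value range and the trial count are all hypothetical choices for illustration.

```python
import random
from statistics import fmean, pvariance

def replace_subset_with_mean(values, indices):
    """Return a copy of `values` where every position in `indices` holds the subset's mean."""
    subset_mean = fmean(values[i] for i in indices)
    out = list(values)
    for i in indices:
        out[i] = subset_mean
    return out

random.seed(0)
for _ in range(2_000):
    n = random.randint(2, 12)
    values = [random.uniform(-10, 10) for _ in range(n)]
    indices = random.sample(range(n), random.randint(1, n))
    new_values = replace_subset_with_mean(values, indices)
    # The global mean is unchanged (up to floating-point rounding) ...
    assert abs(fmean(new_values) - fmean(values)) < 1e-9
    # ... and the total variance never increases.
    assert pvariance(new_values) <= pvariance(values) + 1e-9
print("no counterexample found")
```

Applied to the question's list with the subset at positions 3, 4 and 5 (values 1, 6, 5), `replace_subset_with_mean` reproduces the question's "after" list, whose variance is 3.4375.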