Can the variances of two subsets of a set of observations of a random variable both be greater than the variance of the original complete set of observations?


Suppose I have a finite set $ S \subset \mathbb{R} $ that is partitioned into two subsets $ S_1, S_2 $ such that $ S = S_1 \cup S_2 $ and $ S_1 \cap S_2 = \emptyset $.

Can I form $ S_1, S_2 $ such that $ Var(S_1) > Var(S) $ and $ Var(S_2) > Var(S) $ where $ Var(X) = E( (X - E(X))^2 ) $?

There are obvious cases if we only require one subset to have greater variance, but I can't prove it (or find an example) for both subsets having greater variance. I even wrote a program to run through 1,000,000 random cases and did not find any.
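A search of this kind can be sketched in R as follows. This is a minimal sketch, not the original program: the helper names `pop_var` and `count_counterexamples`, the normal draws, the set size, and the small tolerance are all assumptions made for illustration. It uses the variance exactly as defined above (denominator $n$).

```r
# Population variance, matching Var(X) = E((X - E(X))^2) from the question.
pop_var <- function(x) mean((x - mean(x))^2)

# Hypothetical helper: count random trials in which both parts of a random
# two-way partition have a larger population variance than the whole set.
count_counterexamples <- function(trials = 10000, n = 8, tol = 1e-12) {
  hits <- 0
  for (i in seq_len(trials)) {
    s <- rnorm(n)                  # random finite set S (assumed normal draws)
    k <- sample(1:(n - 1), 1)      # size of S_1, so that S_2 is non-empty
    idx <- sample(n, k)            # indices assigned to S_1
    s1 <- s[idx]
    s2 <- s[-idx]
    v <- pop_var(s)
    # tol guards against floating-point noise in the strict comparison
    if (pop_var(s1) > v + tol && pop_var(s2) > v + tol) hits <- hits + 1
  }
  hits
}

set.seed(1)
count_counterexamples()
```

The returned count is the number of partitions found in which both subsets beat the pooled variance; raising `trials` or changing `n` widens the search.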

Revised question per discussion below accepted answer:

Per aiden's request, I believe a better formulation of his question is as follows:

If I have a set $S$ of observations of a real-valued random variable, with variance $\sigma^2_S$, is there a partition of the observations into two disjoint subsets, $S_1$ and $S_2$, where each subset has a larger variance than the original pooled set: $\sigma^2_{S_1} \geq \sigma^2_{S_2} > \sigma^2_S$?


There are 2 best solutions below

Best answer

I understand your question as asking whether such a partition exists, not whether every partition of a set behaves this way. We know that:

$$ \begin{align} Var(aX + bY) &= a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y)\\ &= a^2 Var(X) + b^2 Var(Y) + 2ab\rho_{X, Y}\sigma_X\sigma_Y \end{align} $$

Now $\rho_{X, Y}$ can be $-1$. So if $X$ is negatively correlated with $Y$, we can create cases where $Var(X \cup Y) \leq \min(Var(X), Var(Y))$.

For example, try the following in R:

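X <- seq(-4, 4)           # integers -4, -3, ..., 4
Y <- -X                   # mirror image of X
Z <- c(X, Y)              # pooled observations
var(X); var(Y); var(Z)    # var() is the sample variance (denominator n - 1)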

You get:

[1] 7.5
[1] 7.5
[1] 7.058824

Update

If you do not allow multiple entries of the same value, try this:

X <- c(-1, 2)
Y <- -X
Z <- c(X, Y)
var(X); var(Y); var(Z)

[1] 4.5
[1] 4.5
[1] 3.333333
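One caveat worth checking: R's `var()` divides by $n - 1$, whereas the question defines $Var(X) = E((X - E(X))^2)$, which divides by $n$. The following is a minimal sketch comparing the two on the example above; `pop_var` is a hypothetical helper introduced here, not part of base R.

```r
# Hypothetical helper: variance with denominator n, as defined in the question.
pop_var <- function(x) mean((x - mean(x))^2)

X <- c(-1, 2)
Y <- -X
Z <- c(X, Y)

# Sample variance (denominator n - 1), as computed by var()
c(var(X), var(Y), var(Z))              # 4.5, 4.5, 3.333333

# Population variance (denominator n)
c(pop_var(X), pop_var(Y), pop_var(Z))  # 2.25, 2.25, 2.5
```

In this example both subsets exceed the pooled set only under the $n - 1$ convention; with the denominator $n$ the pooled set has the larger variance.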
Second answer

It is assumed that each random variable is equiprobable (uniformly distributed) on the finite subset of $\mathbb{R}$ on which it is defined; $\mu(A)$ denotes the mean of the set $A$.

Suppose, for contradiction, that $Var(S_1) > Var(S)$ and $Var(S_2) > Var(S)$. Multiplying the first inequality by $|S_1|$ and the second by $|S_2|$, adding them, and using $|S_1| + |S_2| = |S|$ gives
$$ \begin{align} \sum_{z\in S_1}(z-\mu(S_1))^2+\sum_{z\in S_2}(z-\mu(S_2))^2 &= |S_1|\,Var(S_1)+|S_2|\,Var(S_2)\\ &> |S|\,Var(S) = \sum_{z\in S_1}(z-\mu(S))^2+\sum_{z\in S_2}(z-\mu(S))^2 \end{align} $$

We subtract the right-hand side from the left.
$$\sum_{z\in S_1}\left[(z-\mu(S_1))^2-(z-\mu(S))^2\right]+\sum_{z\in S_2}\left[(z-\mu(S_2))^2-(z-\mu(S))^2\right]>0$$

Now, each sum is nonpositive, because the mean $\mu(S_i)$ minimizes $c \mapsto \sum_{z\in S_i}(z-c)^2$. Hence the left-hand side is nonpositive, which is a contradiction.
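The same conclusion can be read off from the standard decomposition of a pooled variance into within-group and between-group parts. As a sketch under the equiprobable assumption above, write $n_i = |S_i|$, $n = |S|$ and $\mu_i = \mu(S_i)$ (shorthand introduced here for brevity):

$$ \begin{align} Var(S) &= \frac{n_1}{n}Var(S_1) + \frac{n_2}{n}Var(S_2) + \frac{n_1 n_2}{n^2}(\mu_1 - \mu_2)^2\\ &\geq \frac{n_1}{n}Var(S_1) + \frac{n_2}{n}Var(S_2) \geq \min(Var(S_1), Var(S_2)) \end{align} $$

For instance, with $S_1 = \{-1, 2\}$ and $S_2 = \{1, -2\}$ we have $Var(S_1) = Var(S_2) = 2.25$, $\mu_1 = 0.5$, $\mu_2 = -0.5$, so $Var(S) = 2.25 + \frac{4}{16}\cdot 1 = 2.5$.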