Find variance of a dataset when a new element is added, using mean and variance of old dataset (n-1)


Suppose we have the mean $\bar{x}_{n-1}$ and variance $\sigma^2_{n-1}$ of a dataset $\mathscr{D}_{n-1}$ with $n-1$ samples. What is the variance $\sigma^2_{n}$ after a new element $x_{*}$ is added to the dataset (assuming the new sample mean $\bar{x}_{n}$ has already been computed)?


From a previous exercise, I know that:

$$ \bar{x}_{n}=\bar{x}_{n-1} + \frac{1}{n}(x_{*} - \bar{x}_{n-1}) $$
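The running-mean update above can be sketched in Python as follows. The function name `update_mean` is hypothetical, and `n` is assumed to count the new element, so the first sample uses `n = 1`.

```python
def update_mean(mean_prev: float, n: int, x_new: float) -> float:
    """Return the mean of n samples given the mean of the first n-1 samples.

    Implements xbar_n = xbar_{n-1} + (x_* - xbar_{n-1}) / n.
    """
    return mean_prev + (x_new - mean_prev) / n


# Stream the values one at a time; the running mean ends at the plain average.
data = [0.0, 2.0, 4.0, 7.0]
mean = 0.0  # starting value is irrelevant: with n = 1 the update returns x itself
for n, x in enumerate(data, start=1):
    mean = update_mean(mean, n, x)
print(mean)  # 3.25
```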

but I'm having trouble plugging this into:

$$ \sigma^2_{n} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_{n})^2 $$

I think the following two equations are true (writing $x_n = x_{*}$ for the new element):

$$ \begin{split} \sigma^2_{n-1} & = \frac{1}{n-1}\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1})^2, \\ \sigma^2_{n} & = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}_{n-1} - \frac{1}{n}(x_{*} - \bar{x}_{n-1})\right)^2. \end{split} $$

Apparently the answer is:

$$ \sigma^2_{n} = \frac{n-1}{n}\sigma^2_{n-1} + \frac{1}{n-1}(x_{*}-\bar{x}_{n-1})(x_{*}-\bar{x}_n) $$

But I'm not sure how to get there. Would somebody mind helping me understand?

Best answer

For convenience I will let $x_1, \ldots, x_{n-1}$ denote the elements of $\mathscr{D}_{n-1}$, and let $x_n$ denote the new element $x_*$.

We have
\begin{align} \sigma_n^2 &= \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x}_n)^2 \\ &= \frac{1}{n}\sum_{i=1}^n \left(x_i - \bar{x}_{n-1} - \frac{1}{n}(x_n - \bar{x}_{n-1})\right)^2 & \text{substitute } \bar{x}_n \\ &= \frac{1}{n} \sum_{i=1}^n \left[ (x_i - \bar{x}_{n-1})^2 - \frac{2}{n}(x_n - \bar{x}_{n-1})(x_i - \bar{x}_{n-1}) + \frac{1}{n^2}(x_n - \bar{x}_{n-1})^2 \right] & \text{expand the square} \\ &= \frac{1}{n} \sum_{i=1}^n (x_i-\bar{x}_{n-1})^2 - \frac{2}{n^2}(x_n - \bar{x}_{n-1}) \sum_{i=1}^n (x_i - \bar{x}_{n-1}) + \frac{1}{n^2} (x_n - \bar{x}_{n-1})^2. \end{align}

Using the facts that $\sigma_{n-1}^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (x_i - \bar{x}_{n-1})^2$ and $\sum_{i=1}^{n-1}(x_i - \bar{x}_{n-1}) = 0$ (note that these sums run to $n-1$, not $n$), splitting off the $i = n$ term gives
$$\sum_{i=1}^n (x_i - \bar{x}_{n-1})^2 = (n-1)\sigma_{n-1}^2 + (x_n - \bar{x}_{n-1})^2, \qquad \sum_{i=1}^n (x_i - \bar{x}_{n-1}) = x_n - \bar{x}_{n-1},$$
so the above simplifies to
\begin{align} \sigma_n^2 &= \frac{n-1}{n} \sigma_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})^2 - \frac{2}{n^2}(x_n - \bar{x}_{n-1})^2 + \frac{1}{n^2} (x_n - \bar{x}_{n-1})^2 \\ &= \frac{n-1}{n} \sigma_{n-1}^2 + \frac{n-1}{n^2}(x_n - \bar{x}_{n-1})^2. \end{align}
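A minimal Python sketch of this update, assuming the population variance (divisor $n$) used throughout. The helper name `add_sample` is hypothetical; the result is compared with the standard library's `statistics.pvariance` at every step.

```python
import math
import statistics


def add_sample(mean_prev: float, var_prev: float, n: int, x_new: float):
    """Return (mean_n, var_n) after appending x_new as the n-th sample.

    mean_prev and var_prev are the mean and population variance (divisor n-1)
    of the first n-1 samples. Uses
        sigma_n^2 = (n-1)/n * sigma_{n-1}^2 + (n-1)/n^2 * (x_n - xbar_{n-1})^2.
    """
    delta = x_new - mean_prev              # x_n - xbar_{n-1}
    mean_n = mean_prev + delta / n         # running-mean update from the question
    var_n = (n - 1) / n * var_prev + (n - 1) / n**2 * delta**2
    return mean_n, var_n


# Stream a small dataset and compare with a direct computation at every step.
data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
mean, var = 0.0, 0.0  # with n = 1 the update returns (x, 0.0) regardless of these
for n, x in enumerate(data, start=1):
    mean, var = add_sample(mean, var, n, x)
    assert math.isclose(var, statistics.pvariance(data[:n]), abs_tol=1e-12)
print(mean, var)
```

Starting from `(0.0, 0.0)` is safe because for `n = 1` both coefficients $(n-1)/n$ and $(n-1)/n^2$ vanish.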

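To rewrite the second term in terms of $(x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)$, note that $x_n - \bar{x}_n = x_n - \bar{x}_{n-1} - \frac{1}{n}(x_n - \bar{x}_{n-1}) = \frac{n-1}{n}(x_n - \bar{x}_{n-1})$, so
\begin{align} (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n) &= (x_n - \bar{x}_{n-1})\left(x_n - \bar{x}_{n-1} - \frac{1}{n}(x_n - \bar{x}_{n-1})\right) \\ &= (x_n - \bar{x}_{n-1})^2 - \frac{1}{n}(x_n - \bar{x}_{n-1})^2 \\ &= \frac{n-1}{n} (x_n - \bar{x}_{n-1})^2. \end{align}
Hence $\frac{n-1}{n^2}(x_n - \bar{x}_{n-1})^2 = \frac{1}{n}(x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)$, and
$$ \sigma_n^2 = \frac{n-1}{n}\sigma_{n-1}^2 + \frac{1}{n}(x_n - \bar{x}_{n-1})(x_n - \bar{x}_n). $$
With this definition of variance (divisor $n$), the coefficient of the product term is $\frac{1}{n}$, not the $\frac{1}{n-1}$ in the formula quoted in the question. For example, $\{0, 2\}$ has $\bar{x}_2 = 1$ and $\sigma_2^2 = 1$; adding $x_3 = 4$ gives $\bar{x}_3 = 2$ and $\sigma_3^2 = \frac{8}{3} = \frac{2}{3}\cdot 1 + \frac{1}{3}(3)(2)$, whereas a coefficient of $\frac{1}{2}$ would give $\frac{2}{3} + 3$.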
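Since $x_n - \bar{x}_n = \frac{n-1}{n}(x_n - \bar{x}_{n-1})$, the update can equivalently be written as $n\sigma_n^2 = (n-1)\sigma_{n-1}^2 + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)$, which is the form used by Welford's online algorithm. The sketch below (the class and method names are hypothetical) accumulates $M_n = n\sigma_n^2$ with that product and divides by $n$ only when the variance is requested.

```python
import math
import statistics


class RunningVariance:
    """Welford-style accumulator: keeps n, the mean, and M = n * sigma_n^2."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m = 0.0  # sum of squared deviations from the current mean

    def add(self, x: float) -> None:
        self.n += 1
        delta_old = x - self.mean               # x_n - xbar_{n-1}
        self.mean += delta_old / self.n         # now xbar_n
        self.m += delta_old * (x - self.mean)   # (x_n - xbar_{n-1})(x_n - xbar_n)

    def pvariance(self) -> float:
        """Population variance M / n (divisor n); requires at least one sample."""
        return self.m / self.n


rv = RunningVariance()
for x in [0.0, 2.0, 4.0]:
    rv.add(x)
print(rv.mean, rv.pvariance())  # mean 2.0, variance 8/3 up to rounding
```

Tracking $M_n$ instead of $\sigma_n^2$ avoids rescaling by $\frac{n-1}{n}$ at every step; dividing $M_n$ by $n-1$ instead of $n$ would give the sample variance.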