So this is really a programming problem, but I thought it best to ask here.
I'm trying to calculate the standard deviation of a massive stream of numbers (I get a new number every second), but I'd rather not keep the whole list in memory and traverse it each time a new number arrives.
I've figured out a way to do this for the mean:
newMean = (currentMean + newNumber/currentCount) / (1 + 1/currentCount)
Basically I only need to store two things: the running average and the count of numbers seen so far. I don't have to keep the list itself, and can update using the new number alone.
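As a sketch, the update described above could look like this (function and variable names are mine, not from the question; a guard for the empty case is added):

```python
def update_mean(current_mean, current_count, new_number):
    """Fold new_number into a running mean, storing only mean and count."""
    if current_count == 0:
        # Hypothetical convention: the first number becomes the mean.
        return new_number, 1
    # Same recurrence as in the question:
    # newMean = (currentMean + newNumber/currentCount) / (1 + 1/currentCount)
    new_mean = (current_mean + new_number / current_count) / (1 + 1 / current_count)
    return new_mean, current_count + 1
```

Algebraically this is $(n \cdot m + x)/(n+1)$, the usual running-mean update.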
Is there something similar I can do for the standard deviation? It doesn't have to be 100% accurate; a rough approximation would do.
There is a standard estimator for the variance, traditionally called the sample variance, equal to $\frac{1}{n-1} \sum_{k=1}^n \left (x_k-\overline{x} \right )^2$. It is unbiased, among other nice properties. A naive implementation of this estimator stores all the $x_i$ and sums over them at the end, though it turns out the same quantity can in fact be maintained exactly in a running fashion (see the link below).
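For reference, the naive (store-everything) version of the sample variance is just a direct transcription of the formula; names here are illustrative:

```python
def sample_variance(xs):
    """Unbiased sample variance: sum((x - mean)^2) / (n - 1), n >= 2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)
```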
There is also a simpler "running variance" estimator that does not need all the $x_k$ at once. To compute it, you maintain the "running mean" $m_k$, which is exactly what you describe in the question: it satisfies the recurrence $m_{k+1}=\frac{km_k+x_{k+1}}{k+1}$ for $k \geq 0$, with initial condition $m_0=0$. Then you accumulate $(x_k-m_k)^2$ and divide by $n-1$ whenever you stop. This estimator is worse behaved than the sample variance: in particular it is biased, and the bias is not negligible (if I remember correctly, its expectation deviates from the true variance by something on the order of $\log(n)/n$). It is still consistent, however. See https://www.johndcook.com/blog/standard_deviation/ (the first thing I found on a quick Google search), which presents Welford's algorithm, a running method that recovers the exact sample variance without storing the data.
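A sketch of the exact running computation from the linked John D. Cook post (Welford's algorithm); the class and method names here are my own:

```python
class RunningVariance:
    """Welford's online algorithm: exact running mean and sample variance
    in O(1) memory, one number at a time."""

    def __init__(self):
        self.n = 0        # count of numbers seen
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n    # update the mean
        self.m2 += delta * (x - self.mean)  # uses old *and* new mean

    def variance(self):
        # Matches the (n - 1)-denominator sample variance exactly.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

Unlike the biased accumulation of $(x_k - m_k)^2$ described above, this update multiplies the deviation from the old mean by the deviation from the new mean, which is what makes the result agree exactly with the batch formula.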