Update average of a series, one element at the time

100 Views Asked by At

I apologize in advance if I don't use the correct terms, I am not a mathematician but I'm learning.

I have a list of n values x1 ... xn ranging from 0 to 255

I want to calculate the average of x.

BUT I have one restriction: I need to do it in sequential steps because I can only use one xi at any given time.

So my approach is the following (I show it also in this spreadsheet)

  1. the first 'average' is just the first number: A1 = x1
  2. the second average is A2 = (A1 + x2)/2
  1. the third average is A3 = (A2 + x3)/2
  2. and so on A4 = (A3 + x4)/2
  3. and so on A5 = (A4 + x5)/2

Then I calculate the 'total average', i.e., A = average of A1 ...An

This gives me values that are very close to the 'real' average (average of [x1 ...xn]) and the differences are just in the decimal part, but it is always very very close, for example 126.03 ('real') vs 126.02 ('rolling average'). Sometimes, depending on the random numbers I am doing the test with, I get exactly the same value (up to 2 decimals, which is all I care for now). But most of the time there is a slight difference. Why?


My questions are

  1. what is the name of what I am doing? can you give me a couple of keywords so I can research a bit more?
  2. why the differences are dependent on the specific numbers in x? (I try with a simple spreadsheet, you can see it here, can be opened with Excel and LibreOffice as well)
  3. can you help me obtain a formula that expresses this procedure in a more compact way?

Edit as others have commented, there are other near-duplicates which are much more elegantly written and are also very nicely answered. If admins consider this question should be erased, please do so. Thanks to everybody that answered!

1

There are 1 best solutions below

1
On BEST ANSWER

We can definitely get a nice formula for this procedure. First, we can find an explicit formula for each $A_j$ in terms of the $x_i$. $$ A_j = \frac{x_1}{2^j} + \sum_{i=1}^j \frac{x_i}{2^{j + 1 - i}} $$ Then we sum this over all $j$ to get $n A$: \begin{eqnarray} nA = \sum_{j=1}^nA_j &=& \sum_{j=1}^n\frac{x_1}{2^j} + \sum_{j=1}^n\sum_{i=1}^j\frac{x_i}{2^{j + 1 - i}} = \left(1-\frac{1}{2^n}\right)x_1 + \sum_{i=1}^n\sum_{j=i}^n\frac{x_i}{2^{j + 1 - i}} \\ &=& \left(1-\frac{1}{2^n}\right)x_1 + \sum_{i=1}^{n}\left(1-\frac{1}{2^{n+1-i}}\right)x_i \\&=& \sum_{i=1}^n x_i + x_1 - A_n. \end{eqnarray} where we have used the identity $\sum_{m=a}^b2^{-n} = 2^{1-a}-2^{-b}$ a few times. This lets us find the error of this rolling average from the true mean $\mu$: $$ A - \mu = \frac{x_1 - A_n}{n}. $$ Since $x_n$ has the largest influence on $A_n$, this roughly says that the rolling average will differ from the true average most when $x_1$ and $x_n$ are very different. We also see that $A$ converges to $\mu$ in the limit of large $n$, though the $O(n^{-1})$ convergence is pretty slow.

As an aside, the outsize influence of $x_1$ on the result suggests maybe we should throw out $A_1$ from our average. Call this $A'$. We have $$ A' = \frac{1}{n-1}\sum_{i=1}^n x_i - \frac{A_n}{n-1} = \mu +\frac{\mu-A_n}{n-1} $$ Since $A_n$ is a weighted average of all the $x_i$, it's more likely to be close to $\mu$ than it is to $x_1$, meaning this error term should usually be smaller than that of $A$. $A'$ clearly converges to $\mu$ in the limit of large $n$, but I'm not sure if it converges faster than $A$ does.