I am trying to understand why a specific series converges to the following result, in the context of an optimization problem.
We are given $N$ samples of the variable $y$. We want to find an $x$ that minimizes the mean square error. Thus, the recursive equation becomes:
$$ x_t = x_{t-1} - \lambda_t (x_{t-1} - y_{i_{t-1}}) $$
where $y_{i_{t-1}}$ is the sample of $y$, with index $i_{t-1}$, chosen at time instant $t-1$ during this optimization.
At any rate, I am told that if we let $\lambda_{t} = \frac{1}{t}$, then the final solution $\hat{x}$ converges to:
$$ \hat{x} = \frac{1}{N}\sum_{j=1}^{N}y_j $$
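I can verify this claim numerically. Below is a minimal sketch (the sample values, initial guess, and seed are my own arbitrary choices), assuming each sample is visited once, in order:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, size=50)  # N = 50 made-up samples of y

x = 0.0  # arbitrary initial guess
for t, y_t in enumerate(y, start=1):
    lam = 1.0 / t
    x = x - lam * (x - y_t)  # x_t = x_{t-1} - lambda_t (x_{t-1} - y_{i_{t-1}})

print(x - y.mean())  # zero up to floating-point error
```

Indeed the recursion lands exactly on the sample mean (note that at $t=1$, $\lambda_1 = 1$, so the initial guess is discarded entirely).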
What I cannot figure out is how we arrive at this conclusion. Why is it necessary that $\lambda_{t} = \frac{1}{t}$ for this solution to emerge?
Thanks.
There are a number of ways to show this; one is the following. Assume that $$y_n = x + \epsilon_n$$ where the $\epsilon_n\sim \mathcal{N}(0,1)$ are IID.
If you observe $N$ samples of the above, you can write the log-likelihood function you're trying to maximize (up to an additive constant) as:
$$ L(x^*) = -\frac{1}{2}\sum_{n=1}^N (y_n - x^*)^2$$
which implies that $$L'(x^*) = \sum_{n=1}^N (y_n - x^*) = 0 \Rightarrow \frac{1}{N}\sum_{n=1}^N y_n = x^*$$ and $$L''(x^*) = -\sum_{n=1}^N 1 = -N $$
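These two derivative identities can be checked with finite differences on simulated data (the seed, sample size, and step size $h$ below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
y = rng.normal(size=N)
x_star = y.mean()  # the MLE derived above

def L(x):
    # log-likelihood up to an additive constant
    return -0.5 * np.sum((y - x) ** 2)

h = 1e-4
# central first difference: the score vanishes at the MLE
L_prime = (L(x_star + h) - L(x_star - h)) / (2 * h)
# central second difference: the curvature is -N everywhere
L_second = (L(x_star + h) - 2 * L(x_star) + L(x_star - h)) / h**2

print(L_prime, L_second)  # approximately 0 and -N
```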
Now, expanding $L'$ to first order around $x_{N-1}$ and evaluating at $x_N$, where $x_N$ denotes the optimal $x$ at time $N$, we get:
$$L'(x_N) = L'(x_{N-1}) + L''(x_{N-1})(x_N - x_{N-1})$$
Since $x_N$ is optimal at time $N$, the LHS equals zero and we can re-arrange to show:
$$x_N = x_{N-1} - [L''(x_{N-1})]^{-1}L'(x_{N-1})$$
However, $x_{N-1}$ maximizes the log-likelihood of the first $N-1$ samples, so $\sum_{n=1}^{N-1} (y_n - x_{N-1}) = 0$, and hence $$L'(x_{N-1}) = \sum_{n=1}^N (y_n - x_{N-1}) = \sum_{n=1}^{N-1} (y_n - x_{N-1}) + (y_N - x_{N-1}) = y_N - x_{N-1}$$ while $$[L''(x_{N-1})]^{-1} = -\frac{1}{N}$$
we see that $$x_N = x_{N-1} + \frac{1}{N}(y_N - x_{N-1}) = x_{N-1} - \frac{1}{N}(x_{N-1} - y_N)$$ which is exactly your recursion with $\lambda_t = \frac{1}{t}$.
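As a sanity check, this update, written with the sign that matches your recursion, $x_N = x_{N-1} + \frac{1}{N}(y_N - x_{N-1})$, reproduces the ML estimate not just in the limit but exactly at every step (sample values and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=30)  # made-up samples

x = y[0]  # x_1: with one sample, the ML estimate is that sample
for N in range(2, len(y) + 1):
    x = x + (1.0 / N) * (y[N - 1] - x)  # the recursive ML update
    assert np.isclose(x, y[:N].mean())  # x_N is the MLE given N samples
print("x_N equals the running sample mean for every N")
```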
All we've done is take the maximum-likelihood solution and rearrange it into recursive form. So $\lambda_t = t^{-1}$, in your notation, is an outcome of the MLE process: the factor $N = -L''(x_{N-1})$ is the (observed) Fisher information, which tells us how much more information we gain about the true value of $x^*$ when we observe a new data point.