Recalculating exponentially smoothed value with different smoothing parameters.

50 Views Asked by At

I have a time series and I cannot keep all the data.

I want to do exponential smoothing to get a single smoothed result: $$y_{a,t} = \sum_{i=0}^t {x_{t-i}*a^i}$$

In future I might need $y_{a,t}$ with different values of the smoothing factor $a$.

I want to calculate some fixed number of values that would allow me to approximate $y_{a}$ for any $a \in [0, 1]$ with least amount of error.

What's the best way to accomplish this?

Should I just calculate $y_{a_k}$ for several values of $a_k$ (e.g. for $a_k = k/n$ where $n=10, k=0..n$) and do linear/quadratic interpolation?

1

There are 1 best solutions below

3
On

Here is an error-analysis of a nonuniform sampling scheme, which also shows how many samples are needed to ensure an error of at most $\delta>0$.

Main result

Assume data is bounded so that there is a constant $c>0$ such that $|x_i|\leq c$ for all $i \in \{0, 1, 2, ...\}$. Fix $\rho \in [0,1)$. Assume $a \in [0, \rho]$. Fix $\delta>0$. We show that error can be bounded by $\delta$ (for all iterations $t$) using a number of samples $r$ given by: $$ \boxed{r = \left\lceil \frac{(c/\delta)\rho}{1-\rho}\right\rceil} $$

Discretization

Let $\{a_k\}_{k=0}^{\infty}$ be a sequence such that
$$0=a_0 < a_1 < a_2 < ... < 1 $$ We shall keep exponential averages for $a_0, ..., a_r$, where $\rho\leq a_r<1$. (We can efficiently implement this using the recursive scheme in my first comment above). We shall choose the $a_k$ values appropriately (with nonuniform spacing) to ensure the error is always at most $\delta$, then see how many samples $r$ are needed to ensure $a_r\geq \rho$ (so we can handle the entire interval $[0,\rho]$).

Error for $a \in [a_{k-1}, a_k)$

Suppose $a \in [a_{k-1}, a_k)$ for some $k \in \{1, 2, ...\}$. The error between using weight $a_{k-1}$ and weight $a$ over $t$ iterations is: \begin{align} \left|\sum_{i=0}^t a^i x_{t-i} - \sum_{i=0}^t a_{k-1}^i x_{t-i} \right| &=\left|\sum_{i=0}^t (a^i-a_{k-1}^i)x_{t-i}\right| \\ &\leq \sum_{i=0}^t |x_{t-i}|(a^i - a_{k-1}^i) \\ &\leq c\sum_{i=0}^{\infty} (a_k^i - a_{k-1}^i)\\ &= \frac{c}{1-a_k} - \frac{c}{1-a_{k-1}} \\ &= \frac{c(a_k-a_{k-1})}{(1-a_{k-1})(1-a_k)} \end{align} Fix $\delta>0$. Knowing $a_{k-1}$, we now choose $a_k >a_{k-1}$ to ensure the error is at most $\delta$. We need: $$ \frac{c (a_k-a_{k-1})}{(1-a_{k-1})(1-a_k)} = \delta \quad (Eq. 1)$$ So $a_k = a_{k-1} + (\delta/c)(1-a_{k-1})(1-a_k)$. This yields: $$ a_k = \frac{a_{k-1} + (\delta/c)(1-a_{k-1})}{1+(\delta/c)(1-a_{k-1})}$$ and it holds that $a_{k-1} < a_k < 1$.

Number of samples needed

How many samples $r$ are needed to ensure $a_r\geq \rho$? Define $z_k = 1-a_k$ for $k \in \{0, 1, 2, ...\}$, so $1=z_0>z_1>z_2>...> 0$. Equation 1 implies:
\begin{align} &\frac{c(z_{k-1} - z_k)}{z_{k-1}z_k} = \delta \\ &\implies \boxed{z_k = \frac{z_{k-1}}{1+(\delta/c) z_{k-1}} \quad \forall k \in \{0, 1, 2, ...\}} \end{align}

Claim: The above iteration yields: $$ z_k = \frac{(c/\delta)}{k + (c/\delta)}\quad , \forall k \in \{0, 1, 2, ...\}$$

Proof: We use induction. It holds for $z_{k-1}$ with $k-1=0$ since $z_0=1$. Suppose it is true for some $k-1\geq 0$. We prove it holds for $k$. Then: \begin{align} z_k &= \frac{z_{k-1}}{1 + (\delta/c)z_{k-1}} \\ &= \frac{\frac{(c/\delta)}{k-1+ (c/\delta)}}{1 + \frac{1}{k-1 + (c/\delta)}}\\ &= \frac{c/\delta}{k + (c/\delta)} \end{align} $\Box$

So we just use $r$ samples, where $r$ is the smallest integer such that: \begin{align} \frac{(c/\delta)}{r + (c/\delta)} \leq 1-\rho \implies r = \left\lceil \frac{(c/\delta)\rho}{1-\rho}\right\rceil \end{align}

In view of the above equality for $z_k$, it holds that $$ \boxed{a_k = 1 - \frac{(c/\delta)}{k + (c/\delta)} \quad, \forall k \in \{0, 1, 2, ...\}} $$