How to find the smoothest average curve?

123 Views Asked by At

I have an array structure $S$ of length $n$ in each element $S\{i\}$ there are $k$ scalars saved, and $k \in [1,10]$. So $k$ can be different for each element $S\{i\}$. There is another array $v$ assumed the average of $S$ such that $$v\{i\}=mean(S\{i\})$$

What i want is to make array $v$ as smooth as possible regarding its fluctuations when you start from $i=1$ till $i=n$, but still $v$ should be a good estimation as the average of $S$ they way i defined above.

I think i need to define an optimization framework with 2 objectives, one aiming for best estimate for average of $S$ and the other to make $v$ a smooth curve.

For the smoothness i guess i need to work on the 2nd derivative, and doing something like cubic spline estimation, but i'm not sure how to formulate the optimization framework properly!!

  • Also, is there any other way to make a curve smooth rather than doing cubic spline optimization?

Thank you!

1

There are 1 best solutions below

0
On

So, you have $n$ values $v_1, \ldots, v_n$. The fact that they were calculated by averaging is not relevant, it seems to me. You want to construct some real-valued function $f:[1,n] \to \mathbb{R}$ such that $f(i) \approx v_i$ for $i=1,2,\ldots,n$, and $f$ is s"smooth" in some sense.

If you don't care about the form of $f$, there there are many choices.

  1. You could just use $f = \text{constant}$. A good value for the constant would be the mean of $v_1, \ldots, v_n$.
  2. You could use a straight line $f(x) = ax+b$. Then you'd be doing simple linear least-squares fitting. The values of $a$ and $b$ can be obtained by solving linear equations.
  3. You could use a low-degree polynomial. Choose the degree to match the shape of the data $v_1, \ldots, v_n$. Least-squares fitting with a polynomial is also very simple. If you choose the degree of the polynomial to be $11$, then it will exactly interpolate the values $v_1, \ldots, v_n$, but it might wiggle around a lot, too. A lower degree would work better.

You can easily experiment with all of these if you have access to Excel. Just enter your data, and use the "Insert Trend Line" function. It supports various kinds of fitting functions.

To get a balance between fit and smoothness, you minimise $$ \sum_{i-1}^n\big[ f(i)-v_i \big]^2 + \lambda E(f) $$ where $E(f)$ is some measure of the "energy" of $f$. Choose $\lambda$ small (or zero) to improve fitting, and choose $\lambda$ large to improve smoothness. There are many options for this energy term. One common one is $$ E(f) = \int f''(x)^2 \,dx $$ More details here. The article discusses the case where $f$ is a cubic spline, but similar techniques will work with other families of approximating functions.