Minimizing squared error between several datasets


I'm just starting to get back into math for some computer programs I am writing, and I've run into a complex regression-like problem. It's been a long time since grade school, and I don't even know which field of mathematics to look into for this. Here's the deal:

I have 5 datasets. Each is just a set of magnitude measurements taken at N regular intervals of time. I also have what I will call an "ideal" dataset of the same type. What I would like to do is modify the 5 "regular" datasets so that when I add their magnitudes together to produce a new dataset of N points, the resulting dataset most closely resembles the "ideal" dataset. However, the only way I can modify each of the 5 regular datasets is by adding a constant value to all of its magnitudes. In other words, I can only move each entire dataset up or down. This seemed similar to finding a regression line, where you find the line that minimizes the squared error over all the points, but I am not sure how to apply it to this case.


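Suppose your "ideal" values are $z_1, \ldots, z_n$. Also, for each $i$, let $s_i$ be the sum of the 5 data values at time point $i$ that you want to adjust ("$s$" for "sum"). If we add an adjustment quantity $k$ to each data value, then $s_i$ becomes $s_i + 5k$. (You are allowed to shift each dataset by a different amount, but the sum at every time point only sees the total of the five shifts, so any five shifts that add up to $5k$ give exactly the same result; using the same $k$ for all of them loses nothing.) We want the adjusted values $s_i + 5k$ to be close to the $z_i$ values in the least squares sense.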
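Because only the element-wise sum of the shifted datasets is compared with the ideal values, the error depends on the individual per-dataset offsets only through their total. Below is a minimal pure-Python sketch of that setup, assuming each dataset is a plain list of $n$ numbers; the function names `summed` and `squared_error` and the sample data are hypothetical, made up for illustration. It shows that shifting all five datasets by $k$ and shifting just one of them by $5k$ produce the same squared error.

```python
def summed(datasets):
    """Element-wise sum s_i of several equal-length datasets."""
    return [sum(values) for values in zip(*datasets)]


def squared_error(datasets, offsets, ideal):
    """Squared error between the ideal values z_i and the sum of the
    datasets after adding offsets[j] to every value of dataset j."""
    shifted = [[x + c for x in data] for data, c in zip(datasets, offsets)]
    return sum((z - s) ** 2 for z, s in zip(ideal, summed(shifted)))


# Hypothetical example data: 5 datasets of n = 4 points each.
datasets = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 1.0, 1.5],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 3.0, 2.0, 1.0],
]
ideal = [6.0, 9.0, 8.0, 10.0]

# Shifting every dataset by k moves each s_i by 5k ...
k = 0.25
same_shift = squared_error(datasets, [k] * 5, ideal)
# ... and any other offsets with the same total (5k = 1.25) give the same error.
uneven_shift = squared_error(datasets, [1.25, 0.0, 0.0, 0.0, 0.0], ideal)
print(same_shift, uneven_shift)  # prints: 0.25 0.25
```

This is why the one-parameter problem in $k$ is enough: once the best total shift is known, it can be split among the five datasets in any way you like.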

So, we want to choose a value of $k$ that minimises the function $$ f(k) = \sum_{i=1}^n \{z_i - (s_i + 5k)\}^2 $$ There are various ways to minimise $f(k)$. One way is to calculate the derivative of $f(k)$ with respect to $k$ and set it equal to zero. If you do this, you get $$ \frac{df}{dk} = -10\sum_{i=1}^n \{z_i - s_i - 5k\} = 0 $$ Solving this for $k$ gives $$ k = \frac1{5n}\sum_{i=1}^n \{z_i - s_i\} $$ Since $f(k)$ is a quadratic in $k$ with positive leading coefficient $25n$, this stationary point is indeed the minimum. This makes sense intuitively, I think: the $k$ value that you should add is one fifth of the average of the differences between the $z$ values and the $s$ values.
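The closed-form result can be checked numerically with a short pure-Python sketch. The function names `best_offset` and `f` and the sample data are hypothetical, chosen for illustration; `m` is the number of datasets (5 in the question), so the divisor $5n$ appears as `m * n`. The sketch computes $k = \frac{1}{5n}\sum_i (z_i - s_i)$ and confirms that moving $k$ slightly in either direction does not lower $f(k)$.

```python
def best_offset(datasets, ideal):
    """Closed-form k = (1 / (m * n)) * sum(z_i - s_i), where m is the number
    of datasets (5 in the question) and n the number of points."""
    m, n = len(datasets), len(ideal)
    s = [sum(values) for values in zip(*datasets)]
    return sum(z - si for z, si in zip(ideal, s)) / (m * n)


def f(k, datasets, ideal):
    """f(k) = sum_i (z_i - (s_i + m*k))^2, with the same k added to every dataset."""
    m = len(datasets)
    s = [sum(values) for values in zip(*datasets)]
    return sum((z - (si + m * k)) ** 2 for z, si in zip(ideal, s))


# Hypothetical example data: 5 datasets of n = 4 points each.
datasets = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 1.0, 1.5],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 3.0, 2.0, 1.0],
]
ideal = [6.0, 9.0, 8.0, 10.0]

k = best_offset(datasets, ideal)
print(k)  # prints: 0.275

# f is a parabola in k, so small steps either way should not reduce it.
for step in (-0.01, 0.01):
    assert f(k + step, datasets, ideal) >= f(k, datasets, ideal)
```

Because $f$ is quadratic, a single pass over the data gives the answer directly; no iterative optimiser is needed.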