Forcing $(x_1 - x_2) - (x_2 - x_3) \in [-59, 59]$

138 Views Asked by At

Given the following list of numbers:

456
386
363
402
357
323
329
354
279
279
305
340
351
409
478
543

For example, for this list, the first calculation would be

$$(456-386)-(386-363) = \color{blue}{47} \in [-59, 59]$$

This is in tolerance so no changes are needed but as we go down the list lets say to

$$(363-402)-(402-357) = \color{red}{-84} \notin [-59, 59]$$

I need help finding a way to know which of these three numbers needs to be changed so that this number is at least in tolerance. It should use minimal changes. Also needs to find a way to not affect previous calculations.

2

There are 2 best solutions below

3
On BEST ANSWER

You can solve the problem via quadratic programming as follows. For $i\in\{1,\dots,n\}$, let $x_i$ be the given values, and let $y_i$ be the adjusted values. The problem is to minimize $\sum_{i=1}^n (y_i-x_i)^2$ subject to linear constraints \begin{align} -59 \le y_i - 2 y_{i+1} + y_{i+2} &\le 59 && \text{for $i\in\{1,\dots,n-2\}$} \\ y_i &\le x_i \end{align}

The resulting optimal solution has $y_i=x_i$ for all $i$ except the following: \begin{matrix} i & x_i & y_i \\ \hline 4 & 402 & 389.5 \\ 8 & 354 & 333.5 \end{matrix}


By request, here is the SAS code I used:

data indata;
   input x @@;
   datalines;
456 386 363 402 357 323 329 354 279 279 305 340 351 409 478 543
;

proc optmodel;
   /* declare parameters and read data */
   set OBS;
   num x {OBS};
   read data indata into OBS=[_N_] x;

   /* declare variables */
   var y {i in OBS} <= x[i];

   /* declare objective */
   min SSE = sum {i in OBS} (y[i] - x[i])^2;

   /* declare constraints */
   con Linear {i in OBS: i+2 in OBS}:
      -59 <= y[i] - 2*y[i+1] + y[i+2] <= 59;

   /* call QP solver */
   solve;

   /* print input data and optimal solution */
   print x y;
quit;

By request, here is Python code:

import numpy as np
from qpsolvers import solve_ls

# minimize 0.5*(Ry-x)_2^2 subject to Gy <= h and y <= x
x = np.array([456, 386, 363, 402, 357, 323, 329, 354, 279, 279, 305, 340, 351, 409, 478, 543])
tol = 59
n = len(x)
R = np.identity(n)
G = np.array([[(j in (i,i+2))-2*(j==i+1) for j in range(1,n+1)] for i in range(1,n-1)])
G = np.vstack((G, -G))
h = np.empty(2*(n-2))
h.fill(tol)

y = solve_ls(R, x, G, h, ub=x, solver="osqp", initvals=x)
print(y)
1
On

You are dealing with the second difference. Define $y_n = x_{n+1}-x_n$. So $x_n = y_{n-1} + x_{n-1}$. Iterating,

$$x_n = y_{n-1} + y_{n-2} + x_{n-2} = ... = x_1 + \sum_{k=1}^{n-1} y_k$$ The second difference is defined by $z_n = y_{n+1}-y_n$, and as above $$y_n = y_1 + \sum_{k=1}^{n-1} z_k$$ And in turn $$\begin{align}x_n &= x_1 + \sum_{i=1}^{n-1}y_i \\&= x_1 + \sum_{i=1}^{n-1} \left(x_2 - x_1 + \sum_{k=1}^{i-1}z_k\right)\\ &= (n-1)x_2 - (n-2)x_1 + \sum_{i=1}^{n-1}\sum_{k=1}^{i-1}z_k\\&= nx_2 - (n-1)x_1 +\sum_{k=1}^{n-2}\sum_{i=k+1}^{n-1}z_k\\&= nx_2 - (n-1)x_1 +\sum_{k=1}^{n-2}(n-k-1)z_k\end{align}$$

So, if you calculate the sequence of second differences, you can completely reconstruct the original sequence if you know its first two values.

How does this apply to your problem?

  • go through the entire set of numbers, calculating all the second differences.
  • when you find any out of range, change that second difference to bring it to the nearest limit of the allowed range.
  • reconstruct the original sequence from the second differences and the first two values (alternatively, you could change it to be calculated from the first and last values).

Depending on how exactly you define "minimal changes", this will be the minimal solution.

Example: $$\begin{array}{rrrrrr} \text{Data}&\text{1st Diff}&\text{2nd Diff}&\text{Lim. 2nd Diff}&\text{Lim. 1st Diff}&\text{Lim. Data}\\\hline 456&-70&47&47&-70&456\\ 386&-23&62&59&-23&386\\ 363& 39&-84&-59&36&363\\ 402&-45&11&11&-23&399\\ 357&-34&40&40&-12&376\\ 323& 6&19&19&28&364\\ 329& 25&-100&-59&47&392\\ 354&-75&75&59&-12&439\\ 279& 0&26&26&47&427\\ 279& 26&9&9&73&474\\ 305& 35&-24&-24&82&547\\ 340& 11&47&47&58&629\\ 351& 58&11&11&105&687\\ 409& 69&-4&-4&116&792\\ 478& 65&&&112&908\\ 543&&&&&1020\end{array}$$

This is what I get using exactly the approach outlined above. If instead of preserving the first two values of the data, I choose to preserve the first value and the final value - at least, as close as I can while maintaining all integers - then I get this as the adjusted data: $$\begin{array}{rrrrrr} \text{Data}&\text{1st Diff}&\text{2nd Diff}&\text{Lim. 2nd Diff}&\text{Lim. 1st Diff}&\text{Lim. Data}\\\hline 456&-70&47&47&-102&456\\ 386&-23&62&59&-55&354\\ 363& 39&-84&-59&4&299\\ 402&-45&11&11&-55&303\\ 357&-34&40&40&-44&248\\ 323& 6&19&19&-4&204\\ 329& 25&-100&-59&15&200\\ 354&-75&75&59&-44&215\\ 279& 0&26&26&15&171\\ 279& 26&9&9&41&186\\ 305& 35&-24&-24&50&227\\ 340& 11&47&47&26&277\\ 351& 58&11&11&73&303\\ 409& 69&-4&-4&84&376\\ 478& 65&&&80&460\\ 543&&&&&540\end{array}$$

To accomplish this, instead of just bringing over the $-70$ of the 1st difference like before, I chose the value to make the sum of all the first differences to be $87$, as it is in the original. But the value was $-101.8$, so I rounded it to the nearest integer, resulting in a small change in the last data point.