Help defining objective function and constraints to minimize standard deviation of list of points


Situation: I am trying to minimize the standard deviation of a list of points of differing heights, with the constraint that each point in the list can be raised anywhere from 0 to 2 units.

Example: I have a list of points which are equidistant on the x axis but not the y axis.

h = [20, 24, 28, 24, 20, 18, 20, 32, 30, 28, 20, 24]

Where each number in the list represents that point's height.

I also have the constraint that each point in the list can be raised by a value c anywhere from 0 to 2 in order to help achieve a smaller standard deviation.

I am trying to create an algorithm that minimizes the standard deviation of the points of h, with the constraint that each point in h can be raised by some c with 0 <= c <= n, for an h of any length with any values and any n > 0.
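To make the problem concrete in code, here is a minimal sketch (plain Python, standard library only) of the objective and the box constraint; the names `objective` and `feasible` are just illustrative, not part of any established API:

```python
from statistics import pstdev

h = [20, 24, 28, 24, 20, 18, 20, 32, 30, 28, 20, 24]
n_max = 2  # each point may be raised by at most this much

def objective(c, heights=h):
    """Population standard deviation of the raised heights h[i] + c[i]."""
    return pstdev(x + ci for x, ci in zip(heights, c))

def feasible(c, bound=n_max):
    """Check the box constraint 0 <= c_i <= bound for every point."""
    return all(0 <= ci <= bound for ci in c)

print(objective([0] * len(h)))  # baseline: std dev with no raises (~4.397)
```

Any optimizer that handles simple bound constraints can then be pointed at `objective` with the bounds `[0, n_max]` on each coordinate.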

I am very new to optimization problems, and although I have seen problems that look similar to mine, none of them has given me enough information to push me further towards an answer.

If possible, I was hoping someone would help me define the objective function, constraints, and other necessary functions that would lead me to an answer.

This is not a homework problem, so I have no course material to guide me to an answer. The only guidance I have is from the comments and answers to this post. Please understand that I am in no way a mathematician, so I really need all the help I can get. Thanks!

You want to choose $c_1, \ldots, c_n$ to minimize $$\sum_{i=1}^n (x_i + c_i - (\bar{x} + \bar{c}))^2, \qquad \text{s.t. $0 \le c_i \le 2$}$$ where $\bar{x} = \frac{1}{n} \sum_{j=1}^n x_j$ and $\bar{c} = \frac{1}{n} \sum_{j=1}^n c_j$.
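Minimizing this sum of squared deviations is equivalent to minimizing the standard deviation, since the sum is just $n$ times the variance. A quick numeric check of that equivalence (the particular `c` below is an arbitrary feasible choice, used only for illustration):

```python
from statistics import fmean, pstdev

h = [20, 24, 28, 24, 20, 18, 20, 32, 30, 28, 20, 24]
c = [0.5] * len(h)  # any feasible raises, chosen only for illustration

x = [xi + ci for xi, ci in zip(h, c)]
m = fmean(x)  # this is x-bar + c-bar
sum_sq = sum((xi - m) ** 2 for xi in x)  # the objective above

# sum of squared deviations == n * variance == n * (std dev)^2,
# so the two objectives have the same minimizers
print(abs(sum_sq - len(x) * pstdev(x) ** 2) < 1e-9)  # True
```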

We can use the KKT conditions to solve for $c_i$.

The Lagrangian is $$\sum_{i=1}^n (x_i + c_i - (\bar{x} + \bar{c}))^2 + \sum_{i=1}^n \lambda_i c_i + \sum_{i=1}^n \mu_i (2 - c_i)$$ where $\lambda_i \le 0$ and $\mu_i \le 0$.

The partial derivative with respect to $c_j$ is $$2(x_j + c_j - (\bar{x} + \bar{c})) - \frac{2}{n} \underbrace{\sum_{i=1}^n(x_i + c_i - (\bar{x} + \bar{c}))}_{=0} + \lambda_j - \mu_j = 2(x_j + c_j - (\bar{x} + \bar{c})) + \lambda_j - \mu_j,$$ and the stationarity condition implies this should be zero. The complementary slackness conditions are $\lambda_j c_j = 0$ and $\mu_j (2 - c_j) = 0$ for all $j$.

So,

  • In cases where $0 < c_j < 2$, we must have $x_j + c_j - (\bar{x} + \bar{c}) = 0$.
  • In cases where $c_j = 0$, we have $x_j + c_j - (\bar{x} + \bar{c}) = -\lambda_j/2 \ge 0$.
  • In cases where $c_j = 2$, we have $x_j + c_j - (\bar{x} + \bar{c}) = \mu_j/2 \le 0$.

This is intuitive: we want $x_j + c_j$ to be as close as possible to the mean $\bar{x} + \bar{c}$. If the mean is within reach by adjusting $c_j$ within $[0, 2]$, the optimum does exactly that; if the mean is too far away, $c_j$ is pushed as far toward it as the bounds allow, to $0$ or to $2$.
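That intuition suggests a numerical scheme even without a closed form: at the optimum $c_j = \min(\max(m - x_j, 0), 2)$ where $m = \bar{x} + \bar{c}$, and $m$ itself depends on the $c_j$, so one can iterate until it settles. A sketch in plain Python (the fixed-point iteration is my own suggestion, not proven to converge in general, though it does here):

```python
from statistics import fmean, pstdev

def solve(heights, n_max=2.0, iters=200):
    """Fixed-point iteration: repeatedly set c_j = clip(m - x_j, 0, n_max)
    with m = mean(heights) + mean(c), as the KKT conditions suggest."""
    m = fmean(heights)  # initial guess for the target level m = x-bar + c-bar
    c = [0.0] * len(heights)
    for _ in range(iters):
        c = [min(max(m - x, 0.0), n_max) for x in heights]
        m = fmean(heights) + fmean(c)
    return c

h = [20, 24, 28, 24, 20, 18, 20, 32, 30, 28, 20, 24]
c = solve(h)
raised = [x + ci for x, ci in zip(h, c)]
print(round(pstdev(h), 3), "->", round(pstdev(raised), 3))  # 4.397 -> 3.569
```

On this example the points far below the new mean ($m \approx 25.11$) get the full raise of 2, the points above it get 0, and the three points at height 24 land in the interior with $c_j \approx 1.11$, matching the three KKT cases above.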

Will update this answer if I find a nice way to state an explicit solution $c_1, \ldots, c_n$...