I am trying to develop a methodology for solving a constrained optimisation problem I am facing, I would appreciate any help/pointers on how best to frame this problem so that it can be coded up efficiently.
Consider a data set with a unique identifier column along with other variables (you can think of this as a data table/matrix). There are numerous (not necessarily linear) constraints that need to be satisfied on the aggregate level on this data set, and I can remove ids from my data set until these constraints are satisfied on the aggregate level.
I would like to remove ids (or select ids) such that an objective function is maximised (or minimised) - the idea is to have the option to use different objective functions at this stage which is why I have not specified one.
Simplified Example
Below, I have included an example of the scenario I have described above - the context/constraints here are meaningless but should demystify what I am trying to achieve hopefully.
- Consider a data set with 5 columns: id, a, b, c and d. Each id has an associated a, b, c and d variable.
- Constraints: Overall mean of column a must be <=15, Sum(c)/total_number_ids >= 50%.
- Objective Function: Some aggregation of the a,b,c and d (and potentially others) variables - in practice this could be fairly complex.
- Task: Remove (or include) ids from the data set such that we maximise (or minimize) the objective function whilst ensuring that on the aggregate the constraints are met.
The end goal to essentially have a list of ids that are to be removed (or included).
Some help on how to frame this problem mathematically and which methods could be used from an optimisation perspective would be greatly appreciated. I have a lot of flexibility in the methods I can use but I would be keen to explore efficient ideas.
Introduce a binary variable $x_i$ to indicate whether row $i$ is included. Your first constraint becomes $$\frac{\sum_i a_i x_i}{\sum_i x_i} \le 15,$$ which (assuming $\sum_i x_i \ge 1$) you can linearize as $$\sum_i a_i x_i \le 15 \sum_i x_i,$$ equivalently, $$\sum_i (a_i-15) x_i \le 0$$
Not sure exactly what you meant by your second constraint, but maybe you want this: $$\frac{\sum_i c_i x_i}{\sum_i c_i} \ge 0.5$$