How to linearize a weighted average with a decision variable?

2.4k Views Asked by At

I'm trying to model a problem in GLPK but it turned out to be non linear.

A simplified version of the model is written below. Basically it is a weighted average of a set of features of all enabled points substracting a cost associated to enabling those points, provided there are exactly P enabled points.

$\max {\sum_{i=1}^N { f_i \times e_i \times w_i } \over \sum_{i=1}^N { e_i \times w_i } } - \sum_{i=1}^N {c_i \times e_i}$

Subject to constraint $\sum_{i=1}^N e_i = P$

Where:

  • e (enabled) is a vector of either 0 or 1 (decision variable)
  • f (feature) is a vector of decimal numbers in [0,1] (precomputed)
  • w (weight) is a vector of decimal numbers in [0,1] (precomputed)
  • c (cost) is a vector of decimal numbers in [0,1] (precomputed)

The model is very simple so I'm guessing there probably is a way to linearize it or some workaround I'm not aware of.

So the questions are these.

  1. Is there a way to linearize this model so I can solve it with GLPK or similar?
  2. Do you know any (free) software I can use to solve or approximate it as a non linear model?

Here is a simple example:

N = 5

P = 2

f = (0.1, 0.6, 0.5, 0.5, 0.8)

w = (0.2, 0.3, 0.2, 0.2, 0.1)

c = (0.5, 0.5, 0.5, 0.3, 0.5)

In this scenario the result should be this:

e = (0, 0, 0, 1, 1)

2

There are 2 best solutions below

3
On

This can be linearized but with some effort. The ratio $$y=\frac{\sum_i f_i w_i x_i}{\sum_i w_i x_i}$$ with $x_i \in \{0,1\}$ can be written as a (nonlinear) constraint: $$ y \sum_i w_i x_i = \sum_i f_i w_i x_i$$ where $y$ is an additional continuous variable. The non-linear expression $(y x_i)$ is a continuous variable times a binary variable. I assume $y\ge 0$. We can now linearize $z_i=y x_i$ as: $$\begin{align} &z_i \le M x_i\\ &z_i \le y \\& z_i \ge y - M(1-x_i)\\ &z_i \ge 0\end{align} $$ Here $M$ is an upper bound on $y$. We have $M=1$ because of the values $w_i$ and $f_i$ can assume.

So putting things together we have: $$\begin{align} \max\> & y - \sum_i c_i x_i\\ & \sum_i w_i z_i = \sum_i f_i w_i x_i\\ & 0 \le z_i \le x_i \\ & y-(1-x_i) \le z_i \le y \\ &y\ge 0\\ & x_i \in \{0,1\} \end{align}$$

I cannot replicate your stated optimal solution. Your solution has:

----     30 VARIABLE obj.L                 =       -0.240  

----     30 VARIABLE x.L  original variables

i2 1.000,    i4 1.000

When I solve it, I get a better solution:

----     48 VARIABLE obj.L                 =       -0.200  

----     48 VARIABLE x.L  original variables

i4 1.000,    i5 1.000


----     48 VARIABLE y.L                   =        0.600  ratio

----     48 VARIABLE z.L  products y*z(i)

i4 0.600,    i5 0.600

The objective for $x=[0,1,0,1,0]$ is $-0.24$ while my optimal $x=[0,0,0,1,1]$ gives an objective value of $-0.2$. (Assuming no typos in transcribing the problem and data).

A similar problem is formulated here.

0
On

Hy guys, I've been struggling to code this problem with pulp (python) for the past couple of days but i finally managed, since i'm new to pulp it's not the cleanest code, bit it works. Here the final code in python for those who are interested:

import numpy as np  
import pandas as pd  
import pulp  

N = 5  
P = 2  
f = [0.1, 0.6, 0.5, 0.5, 0.8]  
w = [0.2, 0.3, 0.2, 0.2, 0.1]  
c = [0.5, 0.5, 0.5, 0.3, 0.5]
  
N = [(i) for i in range(0,N)]  
prob = pulp.LpProblem("weighted average", pulp.LpMaximize)

x = pulp.LpVariable.dicts("x",(N),cat="Binary")
z = pulp.LpVariable.dicts("z",(N),lowBound=0, upBound=None, cat='Continuous')
y = pulp.LpVariable("y",)

cx = []
for i in N:
    cx.append(x[i]*c[i])

prob += y - cx

zw = []
fwx = []
for i in N:
    zw.append(z[i]*w[i])
    fwx.append(f[i]*w[i]*x[i])

zw = sum(zw)
fwx = sum(fwx)
prob += zw == fwx

for i in N:
    prob += z[i] >= 0
    prob += z[i] <= x[i]

for i in N:
    prob += y - (1- x[i]) <= z[i]
    prob += z[i] <= y

prob += y >= 0

X = []
for i in N:
    X.append(x[i])
X = sum(X)
prob += X == P

prob.solve()
print("Status:", pulp.LpStatus[prob.status])

result = []
for v in prob.variables():
    print(v.name, "=", v.varValue)
    result.append(v.varValue)