MILP for similarity

76 Views Asked by At

I have the following question and I'm not sure how to formulate it as a mixed integer linear programming problem (if possible):

I have a set of products i (1..n) where I'm searching a similar product j (1..m) for, where X(i,j) equals 1 when product i is matched to product j and 0 otherwise. Every product i should be matched to exactly 1 product j (but not necessarily the other way around).

Now, to define a product i similar to product j, we have a bunch of attributes. Let's define for simplicity just two attributes:

a(i,j) : abs difference in % wood used between product i and j

b(i,j) : abs difference in % metal used between product i and j

Note that these are all constants. Finally, we have our absolute difference of sales volumes:

e(i,j) : abs difference in sales volumes between product i and j

Now, I want to minimize the absolute differences of sales volume, meaning:

min AVG((X(i,j)*e(i,j))

BUT, I also want to decide (extra decision variables) the weights of the attributes (call them A and B), which refer to the above mentioned constants a(i,j) and b(i,j). The restriction is that, for every product i, I always choose the product j that has the minimum of:

a(i,j) * A + b(i,j) * B

meaning, the smallest combined distances. A and B should be independent of i and j.

An example dataset is:

i|j|a|b|e
A|X|10|200|4000
A|Y|8|240|3500
A|Z|4|230|800
B|X|20|100|1200
B|Y|70|120|2000
B|Z|80|180|2000
C|X|5|420|3000
C|Y|4|350|600
C|Z|12|300|700

So, if the weights of column a and b are both 0.5, I have differences for product A of:

(A,X) : 0.5*10+0.5*200 = 105
(A,Y) : 0.5*8+0.5*240 = 124
(A,Z) : 0.5*4+0.5*230 = 117

Hence, product A will be most similar to product X, where the sales volumes difference equals 4000.

Doing this for products B and C as well, equals:

Product B most similar to product X, with sales difference of 1200; Product C most similar to product Z, with sales difference of 700.

Hence, the average sales difference is now (4000+1200+700)/3 = 1967.

This is the number I want to minimize, by choosing the right weights for the (in this case) two attributes, which in my example are set to 0.5 and 0.5.

The question is now: Which values to select for these weights, in order to minimize the average sales volume?

Thanks!

1

There are 1 best solutions below

1
On BEST ANSWER

Let binary variable $x_{i,j}$ indicate whether product $i$ is matched to product $j$, and let $d_{i,j} = a_{i,j} A + b_{i,j} B$. If you require that $A$ and $B$ are nonnegative and sum to 1, then I find that $(A,B) = (0.86207, 0.13793)$ is optimal, yielding the following values:

i j  a   b    e x      d 
A X 10 200 4000 0 36.207 
A Y  8 240 3500 0 40.000 
A Z  4 230  800 1 35.172 
B X 20 100 1200 1 31.034 
B Y 70 120 2000 0 76.897 
B Z 80 180 2000 0 93.793 
C X  5 420 3000 0 62.241 
C Y  4 350  600 1 51.724 
C Z 12 300  700 0 51.724 

The resulting objective value is $2600/3 \approx 866.7$.

The formulation I used is as follows. For all $i,j$, let $\underline{d}_{i,j} = \min(a_{i,j},b_{i,j})$ and $\overline{d}_{i,j} = \max(a_{i,j},b_{i,j})$. The problem is to minimize $\frac{1}{n}\sum_{i,j} e_{i,j} x_{i,j}$ subject to: \begin{align} A + B &= 1 \\ \sum_j x_{i,j} &= 1 &&\text{for all $i$}\\ d_{i,j} &= a_{i,j} A + b_{i,j} B &&\text{for all $i,j$} \\ d_{i,j} - d_{i,k} &\le (\overline{d}_{i,j} - \underline{d}_{i,k})(1 - x_{i,j}) &&\text{for all $i,j,k \not= j$} \\ A,B &\ge 0 \\ x_{i,j} &\in \{0,1\} &&\text{for all $i,j$} \end{align} The "big-M" constraints enforce the logical implication $x_{i,j} = 1 \implies d_{i,j} \le d_{i,k}$ for all $k \not= j$.