Finding Objects Most Similar to Other Objects as a "Linear Combination"


I have a small collection of objects, $\{o_i\}_{i=1}^{20}$, each with several properties (all quantitative, or convertible to quantitative) whose values differ from object to object. So $o_1$ has, say, $10$ properties, $\{p_{1j}\}_{j=1}^{10}$, and similarly for the other $19$ objects. I have $5$ more objects of the same general kind as the original $20$, but with still different properties. Unfortunately, these properties do not even fall into the same categories across all $25$ objects: some properties are common to all the objects, and some are not.

There is one important property, which I'll call "usage", that is common to all $25$ objects. Let the usage of object $i$ be $u_i$, and note that $u_i\ge 0$ for all $i$.

What I would like to do is find a way to write each of the $5$ new usages as a linear combination of the $20$ original object usages, with objects among the $20$ that are more similar contributing more.

For example, take one of the new objects, $o_{21}$. I would like to write $$u_{21}=\sum_{i=1}^{20}a_i u_i,\qquad\text{where}\quad 0\le a_i\le 1\;\forall\,i\quad\text{and}\quad\sum_{i=1}^{20}a_i=1.$$ Now suppose that $o_7$ is the most similar to $o_{21}$; then $a_7$ should be larger than all the other $a_i$'s. It is not important that this "linear combination" let me predict the properties of $o_{21}$. It's only important that $\sum_{i=1}^{20}a_i=1$ and that objects more similar to $o_{21}$ have correspondingly larger $a_i$'s.

The target variables here are, for each of the $5$ new objects, the $a_i$'s satisfying the above criteria. The $u_i$ are known for the $20$ original objects and unknown for the $5$ new objects, so the $u_i$ of the new objects are also target variables. However, since knowing the $a_i$ determines the $u_i$ for the new objects, the immediate goal of this question is to find the $a_i$'s.
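Once the $a_i$ are found, the new usage is just the weighted sum. Below is a minimal Python sketch (the function name `predict_usage` is hypothetical) that computes it and checks the constraints stated above. Because the $a_i$ are nonnegative and sum to $1$, this is a convex combination, so $\min_i u_i \le u_{21} \le \max_i u_i$ and the requirement $u_{21}\ge 0$ holds automatically.

```python
import math

def predict_usage(a, u):
    """Return sum_i a_i * u_i after checking 0 <= a_i <= 1 and sum_i a_i == 1."""
    if len(a) != len(u):
        raise ValueError("a and u must have the same length")
    if any(w < 0 or w > 1 for w in a):
        raise ValueError("each a_i must lie in [0, 1]")
    if not math.isclose(sum(a), 1.0):
        raise ValueError("the a_i must sum to 1")
    # Convex combination of the known usages.
    return sum(w * x for w, x in zip(a, u))

# Made-up example with 3 original objects instead of 20.
print(predict_usage([0.5, 0.25, 0.25], [4, 8, 12]))
```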

Now, it's not too difficult to find which of the $20$ original objects are "closest" to, say, $o_{21}$: drop the properties not common to all $25$ objects, normalize the remaining quantitative ones, and compute the Euclidean distance. (I have no a priori notion of which properties might be more important than others, so I'd rather treat them all on an equal footing.) If I divided each of these distances by the sum of all the distances, I would get the opposite of what I want: the closer objects would have correspondingly smaller $a_i$'s.
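A minimal Python sketch of that distance computation, assuming each object is stored as a dict mapping property names to numbers, and taking "normalize" to mean min-max rescaling to $[0,1]$ over all objects (a z-score would be an equally reasonable choice). The names `common_properties`, `min_max_normalize`, and `distances` and the toy data are illustrative only. The last lines reproduce the problem described: dividing by the sum gives the farthest object the largest share.

```python
import math

def common_properties(objects):
    """Names of the properties present in every object (each object is a dict name -> value)."""
    keys = set(objects[0])
    for obj in objects[1:]:
        keys &= set(obj)
    return sorted(keys)

def min_max_normalize(objects, keys):
    """Rescale each listed property to [0, 1] across all objects; other properties are dropped."""
    scaled = [{} for _ in objects]
    for k in keys:
        values = [obj[k] for obj in objects]
        lo, hi = min(values), max(values)
        span = hi - lo
        for s, v in zip(scaled, values):
            # A property that is constant across all objects carries no information; map it to 0.
            s[k] = (v - lo) / span if span > 0 else 0.0
    return scaled

def distances(originals, new_objects):
    """For each new object, its Euclidean distance to every original object,
    using only the properties common to all objects, after normalization."""
    everything = list(originals) + list(new_objects)
    keys = common_properties(everything)
    scaled = min_max_normalize(everything, keys)
    orig_scaled, new_scaled = scaled[:len(originals)], scaled[len(originals):]
    return [
        [math.sqrt(sum((o[k] - n[k]) ** 2 for k in keys)) for o in orig_scaled]
        for n in new_scaled
    ]

# Toy data (made up): "color" and "shape" are not common to all objects, so they are ignored.
originals = [
    {"size": 0, "weight": 10, "color": 1},
    {"size": 5, "weight": 20},
    {"size": 10, "weight": 30},
]
new_objects = [{"size": 1, "weight": 11, "shape": 2}]

d = distances(originals, new_objects)[0]
# Dividing by the sum: the share grows with distance, the opposite of what is wanted.
shares = [x / sum(d) for x in d]
print(d, shares)
```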

So it comes down to this question: what function of the distances would reverse this, so that the closer objects get larger $a_i$'s? Should I subtract each distance from the maximum distance and then normalize?
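Here is a minimal sketch of the proposed "subtract from the maximum, then normalize" rule, next to two standard alternatives swapped in for comparison: inverse-distance (Shepard) weighting and a Gaussian kernel. All three map a list of distances to weights in $[0,1]$ that sum to $1$ and never increase with distance. The exponent `p` and bandwidth `sigma` are hypothetical tuning parameters, not something the question specifies.

```python
import math

def weights_max_minus(d):
    """Proposed rule: a_i proportional to max_j d_j - d_i.
    The farthest object always gets weight exactly 0."""
    m = max(d)
    raw = [m - x for x in d]
    total = sum(raw)
    if total == 0:  # all distances equal: no reason to prefer any object
        return [1.0 / len(d)] * len(d)
    return [r / total for r in raw]

def weights_inverse_distance(d, p=2.0):
    """Inverse-distance (Shepard) weighting: a_i proportional to 1 / d_i**p."""
    exact = [x == 0 for x in d]
    if any(exact):  # exact match(es): split all the weight among the zero-distance objects
        n = sum(exact)
        return [1.0 / n if e else 0.0 for e in exact]
    raw = [x ** -p for x in d]
    total = sum(raw)
    return [r / total for r in raw]

def weights_gaussian(d, sigma=1.0):
    """Gaussian-kernel weighting: a_i proportional to exp(-d_i**2 / (2 * sigma**2)).
    Shifting by the smallest squared distance avoids underflow and leaves the weights unchanged."""
    d2_min = min(x * x for x in d)
    raw = [math.exp(-(x * x - d2_min) / (2 * sigma * sigma)) for x in d]
    total = sum(raw)  # the nearest object contributes exp(0) = 1, so total >= 1
    return [r / total for r in raw]

d = [1.0, 2.0, 4.0]  # made-up distances from one new object to three originals
for rule in (weights_max_minus, weights_inverse_distance, weights_gaussian):
    print(rule.__name__, [round(a, 3) for a in rule(d)])
```

The proposed rule does work, but it always assigns the farthest of the $20$ objects zero weight, so the result depends on which object happens to be farthest. Both it and inverse-distance weighting are unchanged if every distance is multiplied by the same positive factor. The Gaussian weights depend on `sigma`: small values put nearly all the weight on the nearest object, and large values approach uniform weights.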