My problem can be stated as follows:
I have a vector space whose vectors represent real-life events; the components of each vector represent characteristics of the event. I want to find a distance function that measures the similarity between events. I could simply use Euclidean distance, but in my particular problem some dimensions 'matter more' than others when comparing events (i.e. they are more salient).
An event is denoted by a vector, for example $[2, 5, 11, 82, 23]$. An outcome is denoted by another vector whose first two components are class labels (one of 8 classes for the first component, one of 3 classes for the second) and whose third component is a continuous value, e.g. $[8, 2, 24.5]$.
I have many millions of such event vectors (in reality, the event vectors have many more components), each with a corresponding 'outcome vector'.
First question: how can I construct a distance function for comparing event vectors in which different dimensions have different weights?
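To make the first question concrete, here is a minimal sketch of the kind of function I have in mind: a weighted Euclidean distance. The weights below are made-up placeholders (I do not know the real ones; finding them is the question), and `event_b` and `event_c` are hypothetical events invented for illustration.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Hypothetical weights: the first three dimensions dominate,
# the fourth and fifth barely count.
weights = [1.0, 1.0, 1.0, 0.01, 0.01]

event_a = [2, 5, 11, 82, 23]
event_b = [2, 5, 11, 80, 25]  # differs from event_a only in low-weight dimensions
event_c = [4, 5, 11, 82, 23]  # differs from event_a only in a high-weight dimension

d_ab = weighted_distance(event_a, event_b, weights)
d_ac = weighted_distance(event_a, event_c, weights)
print(d_ab, d_ac)  # d_ab is much smaller than d_ac
```

With these placeholder weights, a difference of 2 in each of the two low-weight dimensions moves the events apart far less than a difference of 2 in the first dimension.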
Now, to make things more difficult: what if two dimensions that each have a very low weight, and so matter little individually, become extremely significant when both take unusual values together (for example, both low, both high, or one high and one low; the relevant pattern can differ between pairs of dimensions)?
In other words, using the same example: say the fourth and fifth components of my event vector 'do not really matter'. When event vectors are compared with the distance function I am trying to find (see the first question), the values of these components have only a very small impact on the calculated distance (the similarity of the two events). However, if both take a certain type of value, they matter much more than the three preceding dimensions. I was thinking about creating a matrix of correlation values for every pair of dimensions, and also calculating a weight for each pair.
If my question is still unclear, here it is in yet other words:
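Written out, one form such a distance might take is the following. This is only a sketch of the idea, not something I have derived: the per-dimension weights $w_i$, the pair weights $w_{jk}$ and the pair features $\phi_{jk}$ are all hypothetical placeholders.

$$d(x, y)^2 = \sum_{i} w_i \,(x_i - y_i)^2 + \sum_{j < k} w_{jk} \,\bigl(\phi_{jk}(x) - \phi_{jk}(y)\bigr)^2$$

Here $\phi_{jk}(x)$ would be some score of whether components $j$ and $k$ of $x$ jointly show the 'unusual' pattern for that pair (for instance 1 if they do and 0 otherwise), and $w_{jk}$ could be far larger than any individual $w_i$.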
I have a multi-dimensional vector space with many millions of 'event vectors' $[c_1, c_2, c_3, \dots, c_i]$. Each of them has a corresponding outcome vector with far fewer dimensions: $[o_1, o_2, o_3]$. In my particular problem, $o_1$ and $o_2$ are class indices (with 8 and 3 classes respectively) and $o_3$ is a continuous number. I want to construct a distance function that measures the similarity between events, with each dimension weighted differently. For example, it could be that if the $c_1$ components of two vectors are very close to each other, the events are likely to be very similar, while $c_2, \dots, c_i$ matter much less. What approach should I take to obtain a distance function that measures similarity in this way? It could also be that $o_1, o_2, o_3$ differ in how strongly they distinguish outcomes.
In my particular problem, it happens that two components, $c_j$ and $c_k$, have a very low weight in the distance function (their individual values are so insignificant that they can almost be neglected when comparing two vectors). However, if both take a certain type of value (e.g. both very low, both very high, or one low and one high, depending on which pair of dimensions we are considering), they are extremely significant. So although their individual values would normally hardly matter at all, when a certain combination of values occurs together their weight should be much higher than that of the highest-weighted individual dimension!
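As an illustration of the behaviour I am after (not a proposed solution), here is a minimal sketch in which each 'significant pair' adds a 0/1 interaction feature to the distance. Everything specific in it is an assumption made up for the example: the cutoffs that define 'low' and 'high', the pair weight of 100, the per-dimension weights, and the helper names `matches`, `interaction` and `distance`.

```python
import math

def matches(value, kind, cutoff):
    """True if value is 'low' (<= cutoff) or 'high' (>= cutoff)."""
    return value <= cutoff if kind == "low" else value >= cutoff

def interaction(x, rule):
    """1.0 if components j and k of x both show the rule's pattern, else 0.0."""
    hit = (matches(x[rule["j"]], rule["kind_j"], rule["cut_j"])
           and matches(x[rule["k"]], rule["kind_k"], rule["cut_k"]))
    return 1.0 if hit else 0.0

def distance(x, y, weights, rules):
    """Weighted Euclidean distance plus weighted pair-interaction terms."""
    total = sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))
    for rule in rules:
        total += rule["weight"] * (interaction(x, rule) - interaction(y, rule)) ** 2
    return math.sqrt(total)

# Hypothetical setup: dimensions 4 and 5 (indices 3 and 4) are almost ignored
# individually, but "index 3 high AND index 4 low" is treated as highly significant.
weights = [1.0, 1.0, 1.0, 0.0001, 0.0001]
rules = [{"j": 3, "k": 4, "kind_j": "high", "kind_k": "low",
          "cut_j": 80, "cut_k": 25, "weight": 100.0}]

typical = [2, 5, 11, 50, 50]
also_typical = [2, 5, 11, 55, 45]
flagged = [2, 5, 11, 82, 23]  # index 3 is high and index 4 is low

d_typical = distance(typical, also_typical, weights, rules)
d_flagged = distance(typical, flagged, weights, rules)
print(d_typical, d_flagged)  # d_flagged is far larger than d_typical
```

In this sketch two events that both show the pattern have an interaction difference of zero, so the pair term pushes 'flagged' events away from ordinary ones but not away from each other. What I do not know is how to choose the patterns, cutoffs and weights from the data.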