Metrics for the similarity of two sets of data

I am trying to model a certain (discrete) behavior measured from source A, and the literature in the field has a model for a source A'.

The behaviors of sources A and A' are quite similar in shape, but not in absolute value.

[Figure: Example]

In this figure, the blue plot (G1) is the behavior measured for source A, the green plot (G2) is the behavior simulated by our model for source A, and the red plot (G3) is the behavior simulated by the previous model for source A'.

My objective is to show that the shape produced by our model is a better approximation of this behavior than the one produced by the previous model. (This is quite clear to the eye, but I would like a metric to support our claim.) The problems are:

  • The measurements of the behavior for source A' are unavailable.
  • A histogram of the mean squared error, or even the mean absolute error, relative to the measurements for A is not a fair metric, since the sources are different.

At the moment, I am using a scatter plot of model versus measurement, disregarding the slope and considering only the Pearson correlation, since in both cases the model should increase and decrease at the same points as the measurement. I therefore argue that the higher the correlation, the greater the similarity between the model and the measurement.
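As a minimal sketch of why disregarding the slope helps here (made-up numbers, not the data from the figure; `pearson_r` and `mse` are hypothetical helper names), the following Python compares a model with exactly the measurement's shape but a different scale and offset against a model that is close in absolute value but has a different shape. The Pearson correlation favors the first, while the mean squared error favors the second:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical measured series (illustrative values only).
measured = [1.0, 3.0, 7.0, 4.0, 2.0, 5.0, 9.0, 6.0]
# Same shape as the measurement, but a different scale and offset.
same_shape = [2.0 * v + 10.0 for v in measured]
# Close to the measurement in absolute value, but a different shape.
other_shape = [4.5, 4.0, 5.0, 4.5, 5.5, 4.0, 5.0, 5.0]

print("same shape:  r =", round(pearson_r(measured, same_shape), 3),
      " MSE =", mse(measured, same_shape))    # r = 1.0, MSE = 220.125
print("other shape: r =", round(pearson_r(measured, other_shape), 3),
      " MSE =", mse(measured, other_shape))   # r = 0.208, MSE = 5.96875
```

Pearson's $r$ is unchanged by any positive linear rescaling $a\,y + b$ (with $a > 0$) of either series, which is the invariance the scatter-plot argument relies on.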

However, this is a rather indirect measure. Is there a more direct way to make this comparison?

Any help or comments are highly appreciated.

Answer

The metric you choose should depend on the goals of your task, but an excellent general-purpose one for this kind of comparison is the mutual information:

$$I(X,Y) = \sum\limits_{y\in Y} \sum\limits_{x \in X} p(x,y) \log \left( {p(x,y) \over p(x) p(y)} \right) ,$$

which is measured in bits when the logarithm is base 2, or in nats when it is the natural logarithm.
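For data that are already discrete, the formula can be estimated by plugging in empirical frequencies for $p(x,y)$, $p(x)$ and $p(y)$. The sketch below is one simple way to do this, not the only estimator; the helper names `mutual_information` and `discretize` are hypothetical, and the equal-width binning used for continuous-valued series is an assumption (the number of bins affects the estimate, and the plug-in estimate is biased upward for small samples):

```python
import math
from collections import Counter


def mutual_information(xs, ys, base=2.0):
    """Plug-in estimate of I(X,Y) from paired discrete samples.

    Returns bits for base=2 and nats for base=math.e.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    cx = Counter(xs)              # marginal counts of x
    cy = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c_xy in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c_xy * n / (c_x * c_y);
        # only pairs that actually occur contribute (0 log 0 = 0).
        mi += (c_xy / n) * math.log(c_xy * n / (cx[x] * cy[y]), base)
    return mi


def discretize(values, n_bins):
    """Hypothetical helper: map real values to equal-width bins 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Identical variables: I(X,Y) equals the entropy of X (2 bits here).
print(mutual_information([0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 1, 1, 2, 2, 3, 3]))
# Independent variables: I(X,Y) = 0.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

For a measured series `meas` and a model series `model` (hypothetical names), `mutual_information(discretize(meas, 4), discretize(model, 4))` would score the model. Because each series is binned over its own range, a model that differs from the measurement only by a positive scale and offset lands in the same bins (up to rounding at bin edges) and gets the same score.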

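This measures how much information about $Y$ is provided by $X$. If $Y = X$, then once you know $X$ you gain no new information about $Y$ by measuring $Y$, because you already have it in $X$; the mutual information is then as large as it can be. If, however, $X$ and $Y$ are only weakly related, knowing $X$ tells you little about $Y$, so measuring $Y$ still gives you a lot of new information, and the mutual information is small (zero when $X$ and $Y$ are independent).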
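The intuition above follows from a standard identity. Writing $p(y \mid x) = p(x,y)/p(x)$, the entropy $H(Y) = -\sum_{y} p(y)\log p(y)$ and the conditional entropy $H(Y \mid X) = -\sum_{y}\sum_{x} p(x,y)\log p(y \mid x)$, the definition rearranges as

$$\begin{aligned}
I(X,Y) &= \sum\limits_{y\in Y} \sum\limits_{x \in X} p(x,y) \log \left( {p(y \mid x) \over p(y)} \right) \\
&= -\sum\limits_{y\in Y} p(y)\log p(y) \;-\; \left( -\sum\limits_{y\in Y} \sum\limits_{x \in X} p(x,y)\log p(y \mid x) \right) \\
&= H(Y) - H(Y \mid X).
\end{aligned}$$

With $Y = X$, every $p(y \mid x)$ is 0 or 1, so $H(Y \mid X) = 0$ and $I(X,Y) = H(Y)$; with independent $X$ and $Y$, $p(y \mid x) = p(y)$, so $H(Y \mid X) = H(Y)$ and $I(X,Y) = 0$.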