How well do two datasets agree?

Short story:

I have two vectors (random variables) $A$ and $B$ whose entries are each either 1 or 0, and I want to measure how well they agree. Specifically, I would like to compare how often they actually agree with how often we would expect them to agree by chance. Both numbers are easy to compute:

$x = P[A=B]$ and $y = P[A=1]\cdot P[B=1] + P[A=0]\cdot P[B=0]$, where $P[\cdot]$ denotes probability. Here $y$ is the agreement we would expect if $A$ and $B$ were independent.
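For concreteness, here is a minimal sketch of how $x$ and $y$ could be estimated from two 0/1 vectors. It assumes the probabilities are estimated by simple frequencies over the vectors, and the function name `agreement_stats` and the example data are hypothetical, chosen only for illustration.

```python
def agreement_stats(a, b):
    """Return (x, y): observed agreement and agreement expected by chance.

    a, b: equal-length sequences of 0/1 values (hypothetical example data).
    y assumes A and B are independent, with P[A=1] and P[B=1] estimated
    as the fraction of ones in each vector.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    x = sum(ai == bi for ai, bi in zip(a, b)) / n  # P[A = B]
    p_a = sum(a) / n                               # P[A = 1]
    p_b = sum(b) / n                               # P[B = 1]
    y = p_a * p_b + (1 - p_a) * (1 - p_b)          # P[A=1]P[B=1] + P[A=0]P[B=0]
    return x, y


a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
x, y = agreement_stats(a, b)
print(f"x = {x:.2f}, y = {y:.2f}")  # prints: x = 0.80, y = 0.52
```

In this made-up example the two vectors agree in 8 of 10 positions, while independent vectors with the same proportions of ones would be expected to agree only about 52% of the time.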

However, given $x$ and $y$, how much better than chance is the agreement between the two variables? Is an increase from $y=0.5$ to $x=0.6$ as good an improvement as one from $y=0.89$ to $x=0.99$?
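One standard way to put both cases on a common scale (not something the question itself proposes) is Cohen's kappa, $\kappa = (x - y)/(1 - y)$, which measures what fraction of the possible above-chance agreement is actually achieved: $\kappa = 0$ means no better than chance and $\kappa = 1$ means perfect agreement. The sketch below simply evaluates that formula for the two scenarios; whether kappa is the right notion of "better than chance" for this application is a modelling choice, and the helper name is hypothetical.

```python
def chance_corrected_agreement(x, y):
    """Cohen's kappa: fraction of the possible above-chance agreement achieved.

    x: observed agreement P[A=B]; y: agreement expected by chance (must be < 1).
    Returns 0 when x == y (no better than chance) and 1 when x == 1.
    """
    if y >= 1:
        raise ValueError("kappa is undefined when chance agreement y equals 1")
    return (x - y) / (1 - y)


# The two scenarios from the question: (chance agreement y, observed agreement x)
for y, x in [(0.5, 0.6), (0.89, 0.99)]:
    print(f"y = {y}, x = {x}: kappa = {chance_corrected_agreement(x, y):.3f}")
```

Under this measure the two increases are far from equivalent: going from 0.5 to 0.6 gives $\kappa = 0.1/0.5 = 0.2$, while going from 0.89 to 0.99 gives $\kappa = 0.10/0.11 \approx 0.909$, because the second case closes most of the small gap that remained above chance.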

Long story: The two variables $A$ and $B$ are predictions about some future event, e.g. will the stock market rise tomorrow? I want to understand whether the two predictions $A$ and $B$ predict the same thing. If they agree 50% of the time, that is no great achievement, because we would expect that much agreement by chance anyway. So I have the impression that correlation, in the sense of how often the two agree, is not a good measure of agreement; or rather, it overstates it, since 50% agreement should effectively be read as 0%, being no better than pure chance.

A common alternative is $R^2$, which compares the explained variation to the internal variation of one of the variables. But I don't want to designate either of the two as the more important one...