How many variables can be pairwise anticorrelated

1k Views Asked by At

I am working on a computational project involving analysis of data. Each item of data that I have has a few hundred attributes; I have several million items of data. The attributes are essentially random variables and I am computing the correlation of each pair of variables. The formula to do that is:

$$ \rho_{xy} = \frac{\mathbb{E}( (X-\mu_{x})(Y-\mu_{y}) )}{\sigma_{x}\sigma_{y}}$$

where $X$ and $Y$ are the attributes. This value always lies between $+1$ and $-1$. Intuitively, if $\rho_{xy}\approx +1$ then $X$ and $Y$ typically differ from their respective means in the same way, i.e. if $X$ is above average then so is $Y$ and if $X$ is below average, then so is $Y$. This is called positive correlation. Similarly, if $\rho_{xy}\approx -1$ then $X$ and $Y$ typically differ from their means in opposite directions, i.e. if $X$ is above average, then $Y$ is below and vice versa. This is called negative correlation.

From all of my data, I build a correlation matrix $C=[\rho_{ij}]$ which contains the correlation coefficient of the $i^{\text{th}}$ and $j^{\text{th}}$ variable in the $ij$-entry. I want to view this matrix as an edge-weighted graph and perform clustering. A "positive" clique would correspond to a number of pairwise positively correlated variables. It is clear that many variables can be pairwise positively correlated; so I could potentially see large "positive" cliques in the graph.

My question is: How many variables could I see that are pairwise negatively correlated? Intuition tells me probably only 2. But I cannot prove this. Basically, I want to define "negative" cliques--a set of nodes all of whose edges are weighted with (nearly) $-1$, and I want to know how large my "negative" cliques could be.

Edit: Perhaps a better way to ask the question is: to what degree can "anti-correlation" be transitive? I.e. if $x$ is (strongly) anti-correlated to $y$, and $y$ is (strongly) anti-correlated to $z$, how strongly can $x$ and $z$ be anti-correlated?

Also, if there is a way to say how large this "negative" clique could be as a function of how near the edge weights are to $-1$ that would be helpful. The idea that there largest negative clique has 2 vertices is working on the assumption that I only have an edge when the weight is exactly $-1$. If I relax that to an edge when the weight is smaller than $-1+\epsilon$ (small $\epsilon$) then I can probably have slightly more than 2 vertices--how many more is the question (and how does that related to $\epsilon$).