How to calculate Pearson's correlation coefficient for this small graph?

1.5k Views Asked by At

Let $A$ denote a weighted, directed network, where the weights in the rows are the out-degrees, and weights in the columns are the in-degrees:

$A$ = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 2 \\ 1 & 0 & 0 \end{bmatrix}

enter image description here

Here is the formula that I would like to use (the notation isn't clear to me):

$$r= \frac{\sum_{jk}(jk(e(j,k)-q(j)q(k)))}{\sigma_{q}^2}$$

According to my R function (igraph), $r = - 0.5$ .

My problem is that I don't understand what the variables denote in this formula. I read the book Networks by Newman, and it provided the same 'minimalist' information about the equation.

I tried to calculate the correlation by links or by the sum of the in- and out-degrees of the nodes, however, my results were different. Let $x$ be the vector of out-edges, and $y$ the vector of in-edges: $x=(1,1,0), y=(0,0,2)$, then $corr(x,y)=-1$.

Could someone show me a step-by-step solution, please?

1

There are 1 best solutions below

2
On BEST ANSWER

The Newman formula reported in the OP was originally conceived for undirected graphs (as correctly stated in the comments). A key concept in this formula is that of the "remaining degree" of a node, that is to say the number of edges connecting the node, after excluding the one we arrived along. The formula can be generalized to the case of a directed graph by substituting the denominator $\sigma_q^2$ (i.e., the variance of the remaining degree) with $\sigma_{in}^2 \sigma_{out}^2\,$, i.e. the product of the SDs of the remaining in- and out-degrees of the nodes at the head and tail of the edge, respectively. This generalization leads to the following formula to calculate assortativity in directed graphs:

$$r=\displaystyle \frac{\sum_{jk} \,jk \left(e_{jk}-q_jq_k \right)}{\sigma_i^2 \sigma_o^2 }$$

where $e_{jk}$ is the joint probability distribution of the remaining degrees of the two nodes at either end of each edge (asymmetric in the case of directed graph), and where $q_j$ and $q_k$ refer to the probability distribution of the remaining degree at each of the two nodes, considerated separately. Since $jk \, e_{jk}$ and $jk \, q_jq_k$ correspond to $E(jk)$ and $E(j)E(k)\,\,$, respectively, the formula strictly resembles that of the standard Pearson correlation coefficient for a sample including two variables $x,y\,$:

$${\displaystyle r_{xy}={\frac {E( x_{i}y_{i})-{\bar {x}}{\bar {y}}}{s_{x}s_{y}}}}$$

where $s_x$ and $s_y$ are the corresponding sample SDs. This explains why assortativity can be considered equal to the Pearson correlation coefficient of the degrees at the ends of an edge.

Now it should be pointed out that the standard R application for assortativity (igraph) was not initially developed for unweighted graphs, so by default it calculates assortativity ignoring weights. Therefore, considering the graph shown in the OP and defining $RD(a,b)\,$ the remaining degrees of the two nodes of an edge (where $a$ and $b$ quantify the remaining out-degrees of the node at the tail of the edge and the remaining in-degrees of the node at the head of the edge, respectively), the R application considers edge $1$ as $RD(1,1)\,\,$ (it connects a node with one remaining out-degree to another node with one remaining in-degree). By similar considerations, the R application considers edge $2$ as $RD(1,0)\,\,$ and edge $3$ as $RD(0,1)\,\,$. This finally leads to the vectors $x(1,1,0)\,\,$ and $y=(1,0,1)\,\,$, whose correlation is $-\frac{1}{2}\,$.

Lastly, note that there exist R applications that are able to calculate assortativity for weighted networks. These applications replace degree count with degree strength. A nice discussion about this topic can be found here.