What is the normalized graph matrix if the row-sum of proximity matrix is zero?


Let $S \in \mathbb{R}_{\ge 0}^{n \times n}$ be the proximity (or similarity) matrix of a graph, e.g. $$ S = \left[ \begin{matrix} 0 & 0.9 & 0.3 \\ 0.9 & 0 & 0.4 \\ 0.3 & 0.4 & 0 \end{matrix}\right] $$ Here, the elements on the diagonal should be $0$s.

Now, we define $D$ to be the diagonal matrix whose diagonal entries are the row sums of $S$. In my running example, $$ D = \left[ \begin{matrix} 1.2 & 0 & 0 \\ 0 & 1.3 & 0 \\ 0 & 0 & 0.7 \end{matrix}\right] $$

Then, we normalize $S$ with $D$, i.e. $$ \begin{align} \bar{S} &= D^{-\frac{1}{2}} S D^{-\frac{1}{2}} \\ &= \left[ \begin{matrix} \frac{1}{\sqrt{1.2}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{1.3}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{0.7}} \end{matrix}\right] \left[ \begin{matrix} 0 & 0.9 & 0.3 \\ 0.9 & 0 & 0.4 \\ 0.3 & 0.4 & 0 \end{matrix}\right] \left[ \begin{matrix} \frac{1}{\sqrt{1.2}} & 0 & 0 \\ 0 & \frac{1}{\sqrt{1.3}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{0.7}} \end{matrix}\right] \\ \end{align} $$ (omit subsequent steps...)
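For concreteness, this normalization can be sketched in a few lines of NumPy (the values are those of the running example above; exploiting that $D$ is diagonal, we scale rows and columns instead of forming $D^{-\frac{1}{2}}$ explicitly):

```python
import numpy as np

# Proximity matrix from the running example (symmetric, zero diagonal).
S = np.array([[0.0, 0.9, 0.3],
              [0.9, 0.0, 0.4],
              [0.3, 0.4, 0.0]])

# Row sums of S form the diagonal of D.
d = S.sum(axis=1)                              # [1.2, 1.3, 0.7]

# Entry (i, j) of D^{-1/2} S D^{-1/2} is S[i, j] / sqrt(d[i] * d[j]).
d_inv_sqrt = 1.0 / np.sqrt(d)
S_bar = S * np.outer(d_inv_sqrt, d_inv_sqrt)

print(S_bar)
```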

My question is: how should I handle the case where a row sum of $S$ is zero?

Specifically, if $$ S = \left[ \begin{matrix} 0 & 0 & 0.3 \\ 0 & 0 & 0 \\ 0.3 & 0 & 0 \end{matrix}\right] $$ i.e. no edge is connected to vertex 2, then $$ D = \left[ \begin{matrix} 0.3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0.3 \end{matrix}\right] $$ which is singular, so I don't know how to compute $D^{-\frac{1}{2}}$. What is the correct way to compute $D^{-\frac{1}{2}}$, and then $\bar{S}$, in this case?


(This is just an expansion on the points in the comments above, so that the question has a proper answer.)

Given $n$ points $x_1, \ldots, x_n$ in a metric space $(X, d)$ (e.g. $X = \Bbb{R}^m$ with the Euclidean distance function), the proximity matrix $A$ is defined by $A_{ij} = d(x_i, x_j)$. Naturally, the symmetry of the metric implies that $A$ must be symmetric, and the positive-definiteness of the metric implies that the entries of $A$ are non-negative, with $0$s down the diagonal.

However, not every symmetric matrix with non-negative entries and $0$s down the diagonal will be a proximity matrix. Indeed, the example provided: $$S = \begin{pmatrix} 0 & 0 & 0.3 \\ 0 & 0 & 0 \\ 0.3 & 0 & 0 \end{pmatrix}$$ cannot be a proximity matrix, as we have $$d(x_1, x_3) = 0.3 > 0 + 0 = d(x_1, x_2) + d(x_2, x_3),$$ violating the triangle inequality.

Indeed, if the $j$th column of the matrix sums to $0$, then we have $d(x_i, x_j) = 0$ for all $i$, as each entry is non-negative. By the definiteness of the metric, this implies $x_i = x_j$ for all $i$. That is, all the points are the same! In such cases, every entry in the proximity matrix must be $0$.

Can this ever happen? Sure, if all the points are the same. If not, then all the column sums are strictly positive, and the computation of $D^{-\frac{1}{2}}$ is straightforward.

Is this allowed to happen? Maybe; it depends on the circumstances. Certainly such a case is pretty degenerate, and I could foresee many applications of proximity matrices rejecting such a case, or treating it specially.

Ultimately, it's up to the individual to decide what their proximity matrix is modelling, and what every point being identical says about the situation they're modelling.

In this case, you indicated that this was modelling protein interactions, with $0$ columns appearing when data was missing. It's difficult to tell without a much wider perspective, but filling in missing data with zeros breaks the proximity matrix, for the reasons argued above: the result is simply not a valid proximity matrix. If you don't have data on a given protein, then try performing your analysis with one fewer protein.
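Following that suggestion, dropping the isolated vertex before normalizing might look like the following sketch (assuming NumPy; the variable names are illustrative):

```python
import numpy as np

# Proximity matrix with an isolated vertex (vertex 2, index 1).
S = np.array([[0.0, 0.0, 0.3],
              [0.0, 0.0, 0.0],
              [0.3, 0.0, 0.0]])

d = S.sum(axis=1)
keep = d > 0                           # vertices with at least one edge
S_reduced = S[np.ix_(keep, keep)]      # drop zero-sum rows and columns together

# Now every row sum is strictly positive, so D^{-1/2} is well defined.
d_reduced = S_reduced.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(d_reduced)
S_bar = S_reduced * np.outer(d_inv_sqrt, d_inv_sqrt)
```

Here the reduced matrix is $2 \times 2$, and the analysis proceeds on the remaining vertices only.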