Expected values of adjacency matrix elements of double edge swap randomized graphs

275 Views Asked by At

Problem

Suppose I have a simple graph $G$ with adjacency matrix $A_{ij}$ and there are no self-loops. Consider the distribution of graphs obtained by randomizing $G$ through double edge swap randomization (where the constraints of no self-loops and no multi-edges are enforced throughout the randomization). Is it possible to obtain analitically:

  • $\langle A_{ij} \rangle$ ?
  • $\langle A_{ij}A_{ik} \rangle$ ?

I couldn't find relevant results in literature on this topic. The closest I could find is the famous (Park & Newman, 2013), but it doesn't really answer my question.

My semi-intuitive approach

By equating the rewiring to a generative process, the probability that a given node $i$ with degree $D_i$ will NOT be connected to a node $j$ with degree $D_j$ will be given by $$ P(\sim A_{ij}) = (1 - \frac{D_j}{M})(1 - \frac{D_j}{M-D_{i_1}})(1 - \frac{D_j}{M-D_{i_2}})\cdots (1 - \frac{D_j}{M-D_{D_{i-1}}})$$ and thus the probability of them being connected is $$ P(A_{ij}) = 1 - P(\sim A_{ij})$$ My intuitive reasoning is that if we imagine that we are randomly connecting the stubs of node $i$ when no other node is connected yet, we have a probability $D_j/M$ to connect it to node $j$. The probability of this NOT happening is $1-D_j/M$, and this means that another node with degree $D_{i_1}$ has been connected. Since we don't allow multi-edges, node $i_1$ and all its stubs are removed from the pool of possible connections for node $i$. Thus, at the second step the probability of not connecting to node $j$ becomes $$1 - \frac{D_j}{M - D_{i_1}}.$$ By repeating the same process for all the stubs of node $i$, we obtain the form above. In this way I thought I was enumerating all the possible situations where node $i$ is not connected to node $j$, and by calculating $ P(A_{ij}) = 1 - P(\sim A_{ij})$ I thought I would get the result.

The problem here is that the formula above depends on the particular sequence of degrees $\{D_{i_u}\}$ extracted in the generative model, so I should average over all the possible (allowed) ordered sequences. I tested several approaches to approximate this quantity, but I found numerically that the best fitting is by just replacing each $D_{i_u}$ with the graph's average node degree $\langle D \rangle$.

With this approach I obtain a predicted value that is close to the numerically calculated value, but I am not sure if this is the right approach and there are several shaky definitions I made. Moreover, I cannot obtain analitically several observations I made numerically, so I suspect there are some errors.

Is this approach right and/or is there a better approach?

1

There are 1 best solutions below

3
On

This is easy if we

  1. Assume that the starting degree sequence is relatively sparse: all degrees are much smaller than the number of vertices.
  2. Don't require the graph to remain simple when we randomize, but instead just hope that it will remain simple.

Hoping that the graph will remain simple is not too awful. With the sparsity assumption, we expect that there will not be too many parallel edge or loops. (In many cases, such as if the starting degree sequence is $d$-regular for constant $d$, there is also a constant probability that the graph will be simple, as the number of vertices goes to infinity.) Also, we do not expect that the multigraph's behavior will be very different from simple graphs.

In this case, the limiting distribution will exist and be uniform over all graphs with the given degree sequence, and we can sample from it as follows:

  • For each node $i$ with degree $d_i$, take $d_i$ stubs (half-edges) out of that node.
  • Randomly pair up all the $2m$ stubs in the graph, where $m$ is the number of edges.

Now, let's compute $\langle A_{ij} \rangle$ and $\langle A_{ij} A_{ik} \rangle$ (for the multigraph, interpreting $A_{ij}$ as the number of edges between $i$ and $j$).

  1. There are $d_i d_j$ pairs of stubs on nodes $i$ and $j$. For each, the probability is $\frac1{2m-1}$ that they are paired together, so by linearity of expectation, $\langle A_{ij}\rangle = \frac{d_i d_j}{2m-1}$.
  2. There are $d_i (d_i-1) d_j d_k$ ways to choose two (ordered) stubs on node $i$, one stub on node $j$, and one stub on node $k$. For each choice, the probability is $\frac1{(2m-1)(2m-3)}$ that the first stub on $i$ is paired with the stub on $j$, and the second stub on $i$ is paired with the stub on $k$. By linearity of expectation, $\langle A_{ij} A_{ik}\rangle = \frac{d_i(d_i-1)d_jd_k}{(2m-1)(2m-3)}$.

$A_{ij}$ and $A_{ij}A_{ik}$ are not quite independent from the event that the graph is simple: every time we connect a stub on a node $i$ to a stub on a different node $j$, that's an edge which is not a loop. But (again, assuming that all degrees are small), it should be close to independent, and so the expected values should not change much if we condition on the graph being simple.

(To see that the assumption that all degrees are small is necessary, consider a degree sequence $n-1, 1, 1, \dots, 1$; then there is actually only one realization which is a simple graph, and $\langle A_{1j}\rangle = 1$ for all $j \ne 1$ even though $\langle A_{1j} \rangle = \frac{n-1}{2n-1} \approx \frac12$ by the formula above.)