Ensure a graph approximates an Erdős–Rényi random graph even as nodes are added


Suppose we have a graph $G$ where the number of nodes increases over time, e.g. a node is added whenever the mean number of edges per node exceeds some value (which may be a function of the number of nodes). What is an appropriate policy for selecting edges to add such that the graph stays as close as possible to an Erdős–Rényi graph?

This isn't exactly in my wheelhouse, so I have only a vague sense of what "as close as possible" means formally. I suppose the best alternative phrasing would be:

Suppose at time $t$ the graph $G$ has $n$ nodes and $e$ edges. The policy for selecting edges to add over time should maximize the difficulty of distinguishing $G$ from another graph with the same number of nodes and edges, but where the number of nodes was fixed from time $0$ and the edges were added uniformly at random.

Even if you cannot provide an explicit solution, if there are search terms I could use to find additional information, that would be much appreciated!

Edit: The policy should be a function over pairs of nodes (plus some additional arguments whose precise nature I'm not sure of) that returns the probability that an edge should be added between them. Let the policy be a function $f$, i.e.:

$$f(n_i, n_j, e_i, e_j) = P(\text{add edge between } n_i \text{ and } n_j)$$

where $n_i, n_j$ are two nodes and $e_i, e_j$ are the numbers of edges of $n_i$ and $n_j$ respectively. You may assume no edge exists between $n_i$ and $n_j$. The function is by no means restricted to these arguments; however, the size of the graph makes it difficult (I think, at least) to compute global statistics about the graph each time the policy is used, so it's best to restrict it to arguments that involve properties of the two nodes in question.
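For concreteness, here is a minimal Python sketch of this interface (all names are illustrative, not part of the question). The simplest ER-consistent policy ignores the degree arguments entirely, since in an Erdős–Rényi graph every non-adjacent pair should be equally likely:

```python
import random

# Sketch of the policy interface: a weight per non-adjacent pair.
# The uniform policy ignores the degrees, making every candidate
# pair equally likely, as an ER target would require.
def uniform_policy(n_i, n_j, e_i, e_j):
    return 1.0  # constant weight for every pair

# Select one new edge by sampling non-adjacent pairs in proportion
# to the policy's weights. Nodes are assumed sorted; edges are
# stored as (smaller, larger) tuples.
def select_edge(nodes, edges, degree, policy):
    candidates = [(u, v) for i, u in enumerate(nodes)
                  for v in nodes[i + 1:] if (u, v) not in edges]
    weights = [policy(u, v, degree[u], degree[v]) for u, v in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```

A degree-dependent policy would simply return a weight that depends on `e_i` and `e_j`; the question is which such weight (if any) keeps the graph ER-like.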



BEST ANSWER

This post is mainly meant to help you formalize the problem you want to tackle, so it contains references for the issues you raise. Until the last paragraph, 'close' is meant in an intuitive sense; ways to formalize it are discussed there.

Let's write $\mathcal{G}(n,p(n))$ for the random ER graph, where you obtain a graph $G$ by including each edge independently with probability $p(n)$. First, let's assume that you are not interested in the case where $p(n)=c$ for some constant $c$: there you can simply flip a coin for the edge between each newly added vertex and every existing vertex, and you get exactly $\mathcal{G}(n,p(n))$. So we can assume $p(n)=o(1)$.
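The constant-$p$ case can be sketched in a few lines of Python (illustrative, not from the answer itself): each arriving vertex flips an independent coin against every existing vertex, which by construction yields exactly $\mathcal{G}(n,p)$.

```python
import random

def grow_constant_p(t_max, p, rng=random.Random(0)):
    """Grow a graph one vertex at a time: when vertex t arrives,
    flip a coin with probability p for each existing vertex.
    For constant p the result is exactly G(t_max, p)."""
    edges = set()
    for t in range(t_max):
        for s in range(t):
            if rng.random() < p:
                edges.add((s, t))
    return edges
```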

With that assumption, the key property of $\mathcal{G}(n,p(n))$ is that all pairs of vertices have the same probability of being linked, and the same holds for subsets of vertices. So vertices added between times $1$ and $t_0$ must have the same probability of being adjacent as vertices added between times $t_0$ and $t_1$. Hence, if you want to stay close to $\mathcal{G}(n,p(n))$, since $p(n)=o(1)$ you need to look for early structures whose edge density is too high and remove edges. If you have access to labels, then when you add vertex $t$ you can thin the edges among $[1, \dots, t-1]$ so that each survives with probability $p(t)/p(t-1)$ (assuming $p$ is non-increasing), then add edges between $t$ and the other vertices with probability $p(t)$. Otherwise you are down to computing edge densities for all subsets and removing some of the less probable excess. In any case, if you want to be close to $\mathcal{G}(n,p(n))$, it will be computationally expensive, and you won't be able to do it with local algorithms.
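The labelled procedure can be sketched as follows (a Python illustration under the assumption that $p$ is non-increasing; names are mine). Thinning the older edges so that their density drops from $p(t-1)$ to $p(t)$ keeps every edge present independently with the current probability $p(t)$, by induction on $t$:

```python
import random

def grow_with_thinning(t_max, p, rng=random.Random(0)):
    """When vertex t arrives: first thin existing edges so each
    survives with probability p(t)/p(t-1) (so their density falls
    from p(t-1) to p(t)), then link t to each older vertex with
    probability p(t). Requires p non-increasing."""
    edges = set()
    for t in range(1, t_max):
        keep = p(t) / p(t - 1)
        edges = {e for e in edges if rng.random() < keep}
        for s in range(t):
            if rng.random() < p(t):
                edges.add((s, t))
    return edges
```

An edge added at time $t'$ with probability $p(t')$ survives the subsequent thinnings with probability $p(t)/p(t')$, so at time $t$ it is present with probability exactly $p(t)$, matching $\mathcal{G}(t, p(t))$ in distribution.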

If you are interested in what happens when you simply add edges over time, this is called the uniformly grown random graph (studied here: http://arxiv.org/pdf/cond-mat/0104546). This type of random graph is fundamentally different from $\mathcal{G}(n,p(n))$. You can use other (more complex) methods to add vertices and edges over time, as in the framework of inhomogeneous random graphs (see http://arxiv.org/abs/math/0504589); in each case, these graphs are very different from $\mathcal{G}(n,p(n))$.

What does 'close to ER' mean? There are various metrics you can use, such as the cut metric (see http://arxiv.org/abs/1009.2376); however, if you are interested in asymptotic behaviour you need a different metric, because under the cut metric every graph sequence with $o(n^2)$ edges converges to the empty graph. There is another metric that works well when you have $\Theta(n)$ edges, which basically consists in counting the occurrences of each local vertex neighbourhood. In other regimes, things are a bit more difficult. You can look at the work of Bollobás et al. (e.g. http://arxiv.org/abs/0708.1919) for references on graph metrics.
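As a purely illustrative proxy (much cruder than the cut metric or neighbourhood counts cited above, and not from the answer), one can at least compare the degree distributions of two graphs by total variation distance in a few lines of Python:

```python
from collections import Counter

def degree_tv_distance(edges_a, edges_b, n):
    """Total variation distance between the empirical degree
    distributions of two graphs on n vertices. A crude proxy for
    'closeness': distance 0 means identical degree histograms,
    1 means disjoint supports."""
    def degree_dist(edges):
        deg = Counter()
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        hist = Counter(deg[i] for i in range(n))
        return {d: c / n for d, c in hist.items()}
    pa, pb = degree_dist(edges_a), degree_dist(edges_b)
    support = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(d, 0) - pb.get(d, 0)) for d in support)
```

Matching degree distributions is necessary but far from sufficient for ER-likeness, which is exactly why the metrics referenced above look at richer local and global statistics.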


Not an answer: Just trying to clarify the question & reformulate somewhat.

It seems we first need to define a semantics for how a graph $G$ changes over time. Let $n$ and $e$ be the number of nodes and edges in $G$ at a given time, and assume there exists some function $f : \mathbb{N} \rightarrow \mathbb{N}$. Let's say, then, that at every discrete time step one of two things happens:

  1. If $e < f(n)$, we select an edge to add based on the edge-selection policy for $G$, the policy being a probability distribution over the non-adjacent pairs of nodes in $G$.

  2. If $e = f(n)$, we add a node.
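These dynamics can be sketched directly in Python (an illustration of the two-case step above; the function names, the starting single-node graph, and the simple `policy(u, v)` signature are my assumptions, not part of the answer):

```python
import random

def simulate(steps, f, policy, rng=random.Random(0)):
    """Run the dynamics: at each discrete step, add an edge via the
    policy while e < f(n); otherwise add a node. f is the
    total-edge cutoff as a function of the node count n."""
    nodes, edges = [0], set()
    for _ in range(steps):
        n, e = len(nodes), len(edges)
        if e < f(n):
            candidates = [(u, v) for i, u in enumerate(nodes)
                          for v in nodes[i + 1:] if (u, v) not in edges]
            if candidates:
                weights = [policy(u, v) for u, v in candidates]
                edges.add(rng.choices(candidates, weights=weights, k=1)[0])
            else:  # graph already complete: the only move is a new node
                nodes.append(n)
        else:
            nodes.append(n)
    return nodes, edges
```

For example, with `f = lambda n: n` and a uniform policy, the process alternates between filling in edges and growing the vertex set, keeping $e \le f(n)$ throughout.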

Note that $f(n)$ here is slightly different from the quantity in the question: the question talks about the average number of edges per node, while here we use the total number of edges, which is easier to formulate. There is no real difference, as you can divide $e$ by $n$, or equivalently multiply the per-node threshold by $n$, in the calculations above.

In any case, we have a sequence of natural numbers $f(1), f(2), f(3), \dots$ determining the cutoff number of edges for adding a new node to $G$. It would make sense for this sequence to be monotonically increasing, or at least non-decreasing. Beyond this I'm afraid I don't have a solution, other than the intuition I provided in my comment.


Note we are looking for a policy, which is a function from a graph $G$ to a probability distribution over its candidate (not-yet-present) edges.