Sampling from a graph

59 Views Asked by At

Suppose you have a graph $G=(V,E)$ that is unobservable globally and you wish to take a sample from the vertices of that graph to infer something about its global properties from local properties. Maybe each vertex also has some attribute, say, $a:V\to\mathbb{R}$, and it is possible that if $v_1,v_2$ are neighbours then $a(v_1)$ and $a(v_2)$ are correlated. Suppose when we sample a vertex $v$, we can observe $a(v)$ and the neighbourhood of $v$ (but not any of the attributes of the neighbourhood of $v$; only the vertex id's). To give this more concreteness, maybe $G$ is the internet (or some subset thereof), with pages as vertices which are linked when they are, well, linked. Maybe $a(v)$ is the number of times that $v$ has been viewed.

I am interested in knowing the best way to sample $G$ to make inferences about its properties—say, average degree, diameter, whatever—as well as to make inferences about the distribution of $a$. One might decide to take a random sample of the vertices, but there are problems with this. Imagine $G$ is a star on 1000 vertices. Sampling 20 of them, it is likely that we miss the centre. Our picture of $G$ would then be as a set of 20 isolated vertices, which would be all wrong. We might decide instead to "spider" along the graph, but now we are no longer taking a random sample and perhaps introducing bias into the sample. I imagine that one can split the difference with some kind of hybrid random sample, where we follow along links maybe $k$ at a time and then jump to a random vertex.

My question is whether there is literature on this or ideas related to this. It is related to some research that I am doing, and I can't really find anything in the literature.

1

There are 1 best solutions below

0
On

I've been quite looking around for literature on the topic and here's what I've found:

  1. A Survey and Taxonomy of Graph Sampling: This one focuses on techniques used in unweighted graphs, mainly on what measures and methods have been used to determine samples with a given property.
  2. Counting and sampling paths in graphs: This one focuses on Monte Carlo Methods, trees and complete graphs.

I hope these are useful!