Estimator with algorithm about graphs

111 Views Asked by At

Consider an undirected graph $G=(V,E)$ representing the social network of friendship/trust between students. We would like to form teams of three students that know each other. The question is to decide whether the network allows for enough such teams, without checking all the triples of graph $G$. For this reason, we use random sampling to design an efficient estimator of the number of connected triples. We partition the set of node triples into four sets $T_0,T_1,T_2$, and $T_3$. A node triple $v1,v2,v3$ belongs to:

  • $T_0$ iff no edge exists between the nodes $v1,v2$, and $v3$,
  • $T_1$ iff exactly one of the edges $(v1,v2)$, $(v2,v3)$, and $(v3,v1)$ exists,
  • $T_2$ iff exactly two of the edges $(v1,v2)$, $(v2,v3)$, and $(v3,v1)$ exist,
  • $T_3$ iff all of the edges $(v1,v2)$, $(v2,v3)$, and $(v3,v1)$ exist.

$|T_3|$ denotes the number of connected triples in the graph that is the quantity we need to estimate. Consider the following algorithm:

• Sample an edge $e=(a,b)$ uniformly chosen from $E$

• Choose a node $v$ uniformly from $V \setminus(a,b)$

• if $(a,v)∈E$ and $(b,v)∈E$ then $x=1$, else $x=0$

Let $x_1, x_2, . . . x_s$ be the outcomes of $s$ independent executions of the algorithm. Show that

$\frac{1}{s}$ $\displaystyle\sum_{i=1}^s x_i \frac{|E|(|V|-2)}{3}$ is an estimator of $|T_3|$.

I have to do this exercise in a university course of algorithms. The problem is I have no skills about statistics and the course doesn't give anything about this arguments, and I don't know neither how to start to prove this. Could someone give me help?

1

There are 1 best solutions below

6
On BEST ANSWER

The exercise is wrong in asking you to show that something is an estimator. There's nothing to show there, since you can use whatever quantity you want as an estimator for $|T_3|$. It's not part of the definition of an estimator that it bears any particular relationship to the estimated quantity. There are specific properties of an estimator, such as being unbiased, efficient or sufficient, which can be proven, but you can’t “show” that something is an estimator of $|T_3|$.

A reasonable guess at what they mean is that this is an unbiased estimator of $|T_3|$, i.e. that the expected value of this estimator is the true value of $|T_3|$.

To show that this is the case, first consider the simpler procedure of uniformly randomly selecting any of the $\binom{|V|}3$ potential triangles between any $3$ vertices. In this case, we'd expect to select a triangle of three existing edges with probability

$$ \frac{|T_3|}{\binom{|V|}3}\;, $$

so

$$ \frac1s\sum_{i=1}^sx_i\binom{|V|}3 $$

would be an unbiased estimator of $|T_3|$. Note that this procedure is equivalent to first uniformly randomly selecting a pair of vertices (i.e. a potential edge) and then uniformly randomly selecting a third, different vertex. That suggests a more efficient approach: Since the contribution to the estimator from the sum will be $0$ if we select a pair of vertices that aren’t connected by an edge, we can instead uniformly randomly select an existing edge. The result is the procedure specified in the exercise. We're now ignoring the zero contributions from vertex pairs not connected by an edge; we're still uniformly sampling the remaining contributions from vertex pairs connected by an edge, but by a factor

$$ \frac{\binom{|V|}2}{|E|} $$

more often than before. Thus the expected value of $x_i$, i.e. the probability of sampling a triangle, is now

$$ \frac{|T_3|}{\binom{|V|}3}\cdot\frac{\binom{|V|}2}{|E|}=\frac3{|E|(|V-2|)}\cdot|T_3|\;, $$

and so

$$ \frac1s\sum_{i=1}^sx_i\frac{|E|(|V-2|)}3 $$

has expected value $|T_3|$ and is thus an unbiased estimator of $|T_3|$.