Biased estimator of number of edges in a graph

138 Views Asked by At

I meet this problem when doing my scientific research.

Let $G$ be a simple graph with $n$ vertices, i.e. $G$ is undirected and does not have any loops or multiple edges. We randomly sample some vertices using the following rule. Each vertex is sampled with the same probability $\alpha$ independently. Let $G_1$ be the subgraph induced by the sampled vertices, i.e. an edge of $G$ is also in $G_1$ if and only if both endpoints of this edge are sampled. Suppose we observes that $G_1$ has $n_1$ vertices and $m_1$ edges. Our goal to estimate the number of edges $\hat{m}$ in $G$ using $n,n_1,m_1$ but without $\alpha$.

Since

$$ \hat{m}=\left(\frac{1}{\alpha}\right)^2 m_1 $$

is an unbiased estimator and

$$ \alpha=\mathbb{E}\left[\frac{n_1}{n}\right], $$

we replace $\alpha$ in this estimator with $\frac{n_1}{n}$ and get the following estimator

$$\hat{m}=\left(\frac{n}{n_1}\right)^2 m_1.$$

However, this is a biased estimator, while the correct unbiased estimator is

$$\hat{m}=\frac{n(n-1)}{n_1(n_1-1)} m_1.$$

This seems strange. Can you give me an explanation? What is the difference between these two estimators?

1

There are 1 best solutions below

0
On BEST ANSWER

There are $n(n-1)/2$ edges in the fully connected graph, some there in your graph (let's color them red), some not there (color blue). Randomly drawing edges is not different from randomly drawing unordered pairs of vertices. If you could do independent draws from all unordered pairs of vertices, that would be ideal to estimate $m$.

When you draw $n_1$ vertices independently, you also draw pairs of vertices, but it is not the same as when drawing from the set of pairs. Some draws of pairs are precluded. For example, you could look at a set of pairs like $\{12, 23\}$, but if you draw vertices $\{1,2,3\}$, the pair $13$ is also there. So you can only draw certain sets of unordered pairs. Another way of looking at things is to say that the unordered-pair-draw events are not independent.

However, not all is lost. By symmetry, any unordered pair is equally likely to be part of the pairs obtained from $n_1$ vertices, for any $n_1 > 1$. The probability that a particular unordered pair is drawn by picking $n_1$ vertices must be $$ \frac{n_1(n_1 - 1)/2}{n(n-1)/2}. $$

Let's say that there are $m$ red edges in the graph with $n(n-1)/2$ total edges. Let's further assume that these $m$ edges are chosen randomly when the structure of the graph is determined. Then the expected number of red edges appearing in the sub-graph with $n_1$ nodes is $$ \mathbb{E}[m_1|n_1] = m\frac{n_1(n_1 - 1)/2}{n(n-1)/2}, $$ where $m_1$ is a r.v. that gives the observed number of red edges drawn. So we can propose an estimator for $m$ $$ \hat{m} = m_1\frac{n(n-1)}{n_1(n_1-1)}. $$ Now, if we check for bias, we find that $$ \mathbb{E}[\hat{m}] = \sum_{n_1} \mathbb{P}(n_1)\left( \sum_{m_1} \mathbb{P}(m_1|n_1) m_1\frac{n(n-1)/2}{n_1(n_1-1)/2}\right) = \sum_{n_1} \mathbb{P}(n_1) \frac{n(n-1)}{n_1(n_1-1)} \mathbb{E}[m_1|n_1]= m. $$