Expected number of distinct nodes visited in a directed bipartite graph

183 Views Asked by At

Let $G = (V,E)$ be a directed bipartite graph with $V = \{I \cup O\}$ where $\left\vert{I}\right\vert = n$ and $\left\vert{O}\right\vert = m$. All the edges start from a vertex in $I$ and end on a vertex in $O$. Let $d_{I_i}$ represent the outdegree of the $i^{th}$ vertex in $I$ and $d_{O_j}$ represent the indegree of the $j^{th}$ vertex in $O$.

Vertices in $I$ are allowed to be adjacent to the same vertices in $O$. We sample for $k$ vertices from the set $I$ uniformly at random with replacement. If a vertex from $I$ is selected in the sampled set then we mark all the nodes in $O$ adjacent to it. Initially all the nodes in $O$ are unmarked.

Thus, in $k$ queries what is the expected number of distinct nodes marked in $O$ by such a sampling mechanism.

I have an expression for the case where $d_{I_i}$'s are same i.e. $d_{I_i} = d_I$ and all $d_{O_j}$'s are same (regular graph). Expected number of distinct vertices marked in $O$ for the regular case is $m(1 - (1 - d_I/m)^k)$.

In the general case where degrees are not same, can I replace the $d_I$ by the average degree of the graph in the previous equation? Does it vary with the variance of the degree distribution? Any leads will be appreciated. Thank you.

1

There are 1 best solutions below

12
On BEST ANSWER

Getting the expected number of distinct vertices seen in $O$ is not too much harder than the case that you have computed. Now that $$ \# \text{ of vertices marked in }O = \sum_{v_j \in O} 1_{v_j \text{ is marked}}. $$ By linearity of expectation, you have that $$ E[\# \text{ of vertices marked in }O] = \sum_{v_j \in O} P(v_j \text{ is marked}). $$ On the event that $v_j$ is marked, on at least one of the $k$ queries, one of its $d_{O_j}$ neighbors must have been chosen in $I$. So $$ P(v_j \text{ is marked}) = \left( 1 - \left( 1 - \frac{d_{O_j}}{n} \right)^k \right), $$ and you get that $$ E[\# \text{ of vertices marked in }O] = \sum_{v_j \in O} \left( 1 - \left( 1 - \frac{d_{O_j}}{n} \right)^k \right). $$ This simplifies to your formula in the case where $n=m$ and $d_{O_j}=d_{I_j}=d$ for all $j$.

EDIT: I missed the last part of the question. Now suppose the bipartite directed graph is chosen randomly (according to some probability distribution). Now, just as before $$ P(v_j \text{ is marked}) = 1 - P(v_j \text{ is unmarked after }k) = 1 - P(v_j \text{ is unmarked after }1)^k. $$ Note that $v_j$ is unmarked if none of its neighbors is chosen. We can break up this probability depending upon the outcome of $v_j$'s in-degree. $$ P(v_j \text{ is marked after }1) = \sum_{d=0}^{n} P(\text{in-deg}(v_j)=d) \left( \frac{n-d}{n}\right). $$ This uses an assumption that we chose the vertex from $I$ uniformly at random independently of the potential edges of the directed graph. We can rewrite the above as \begin{align*} P(v_j \text{ is unmarked after }1) &= \sum_{d=0}^{n} P(\text{in-deg}(v_j)=d) -\sum_{d=0}^{n} P(\text{in-deg}(v_j)=d) \left( \frac{d}{n}\right). \\ &= 1 - \frac{E[\text{in-deg}(v_j)]}{n}. \end{align*} Therefore $$ E[\# \text{ of vertices marked in }O] = \sum_{v_j \in O} \left( 1 - \left( 1 - \frac{E[\text{in-deg}(v_j)]}{n} \right)^k \right). $$