Cardinality of sets in a random graph generated from block model

83 Views Asked by At

Let $n\in \mathbb{N}$ be given. Let us assume that the set of vertices is $V=[n]=\mathcal{C}^+ \cup \mathcal{C}^- \cup \mathcal{D}$, where the sets $\mathcal{C}^+$ and $\mathcal{C}^-$ stand for the nodes in the graph which can have either only out-going and only in-going links respectively or no links and the set $\mathcal{D}$ describes the nodes in the network which can have both, in- and out-going links. Next, let

enter image description here

be the matrix containing all the possible probabilities of having an edge between two vertices. From this matrix $B$ we will generate a graph. The produced actual graph then allows us to redefine the sets of node-types to the following sets: \begin{align*} &\mathrm{C}^+ =\{i \in V | k_i>0, j_i =0 \},\\ &\mathrm{C}^- =\{i \in V | k_i=0, j_i >0 \},\\ &\mathrm{D} =\{i \in V | k_i>0, j_i >0 \},\\ &\mathrm{X} =\{i \in V | k_i=0, j_i =0 \}. \end{align*} ($k_i$ is the out-degree, $j_i$ stands for the in-degree)

Note that the sets from before do not have to coincide with the redefined sets, as we also have for example the possibility, that an initial node in $\mathcal{D}$ ends up having only out-going links, so it would be counted as an element of $\mathrm{C}^+$. Or since it also possible that a node does not have any link to or from another node, we get the set $\mathrm{X}$ of unconnected nodes.

I am interested in the expected number of cardinalities of these latter sets. Are the following calculations correct or is there an easier way to get the expected values?

$$\mathbb{E}( | \mathrm{C}^+ | ) = p^{cd} \cdot | \mathcal{C}^+ | \cdot | \mathcal{D} | + p^{cc} \cdot | \mathcal{C}^- | + p^{dd} \cdot | \mathcal{D} | \cdot (| \mathcal{D} |-1) \cdot (1-p^{cd}) \cdot | \mathcal{C}^+ | \cdot (1-p^{dd}) \cdot (| \mathcal{D}|-1) + | \mathcal{D}| \cdot | \mathcal{C}^-| \cdot p^{cd} \cdot (1-p^{cd}) \cdot | \mathcal{C}^+|$$ (the first term comes from the nodes that are in $\mathcal{C}^+$ and indeed get out-going links. The second term contributes from the nodes in the set $\mathcal{D}$ which will only end up having out-going links to either other nodes in $\mathcal{D}$ or to nodes in $\mathcal{C}^-$.)

Analogoulsy the next calculations:

$$\mathbb{E}( | \mathrm{C}^- | ) = p^{cc} \cdot | \mathcal{C}^- | \cdot | \mathcal{C}^+ | \cdot p^{dd} \cdot (| \mathcal{D} |-1) \cdot | \mathcal{D} | \cdot (1-p^{dd}) \cdot ( | \mathcal{D}|-1) \cdot (1-p^{cd}) \cdot | \mathcal{C}^-| $$ $$\mathbb{E}( | \mathrm{D} | ) = p^{dd} \cdot | \mathcal{C}^+ | \cdot p^{cd} \cdot (| \mathcal{D} |-1) + p^{dd} \cdot | \mathcal{C}^- | \cdot p^{cd} \cdot (| \mathcal{D} |-1) + ( p^{dd})^2 \cdot (| \mathcal{D} |-1)^2 $$ \begin{align*} \mathbb{E}( | \mathrm{X} | ) &= N - \mathbb{E}( | \mathrm{C}^+ | + | \mathrm{C}^- | + | \mathrm{D} |) \\ &= 2 \cdot \big(| \mathcal{C}^+ | \cdot (1-p^{cd}) \cdot | \mathcal{D}| \cdot (1-p^{cc}) \cdot | \mathcal{C}^- | \big) + | \mathcal{D}| \cdot (| \mathcal{D} |-1)^2 \cdot (1-p^{dd})^2 \cdot (1-p^{cd})^2 \cdot | \mathcal{C}^+ | \cdot | \mathcal{C}^- | \end{align*}

1

There are 1 best solutions below

4
On BEST ANSWER

You shouldn't be multiplying $|\mathcal C^+|$ and $|\mathcal D|$ anywhere in these formulas. (This is incorrect for all $p^{cc}, p^{cd}, p^{dd}$ but it's easy to notice it failing when $p^{cc} = p^{cd} = p^{dd} = 1$, for example.) What you're computing ends up being more like the number of edges of a given type.

Instead: each vertex of $\mathcal C^+$ ends up in $X$ with probability $(1-p^{cc})^{|\mathcal C^-|}(1-p^{cd})^{|\mathcal D|}$, and in $C^+$ otherwise. This gives us $$ (1 - (1-p^{cc})^{|\mathcal C^-|}(1-p^{cd})^{|\mathcal D|})|\mathcal C^+| $$ as one of the terms contributing to $\mathbb E[|C^+|]$.

The other terms are computed analogously (vertices of $\mathcal D$ are the other ones that can contribute to $\mathbb E[|C^+|]$), though vertices of $\mathcal D$ have more complicated expressions because the possibilities are more varied.

In general, if you are confused about first moment calculations, you should fall back on the strategy of splitting up the random variable as a sum of indicator random variables. In this case, it means that for every $\mathcal A \in \{\mathcal C^+, \mathcal C^-, \mathcal D\}$ and for every $A \in \{C^+, C^-, D, X\}$, you need to find the probability that a vertex $v \in \mathcal A$ will end up in $A$, and then you just add up these probabilities across all vertices.