Measure of subgraph isomorphisms

86 Views Asked by At

I have a "big" (biological) finite graph $\Gamma = (V, E)$ and a set of subsets of vertices $S = \{\gamma(t)\}_{t \in \mathcal T}$ (with a given function $\gamma : \mathcal T \to \mathcal P(V)$ where $|V| \simeq 18,000$ and $|\mathcal T| \simeq 12,500$). I am interested in the different subgraphs induced by the $\gamma(t)$ modulo isomorphisms. I claim that in my case, the number of different non-isomorphic subgraphs is clearly higher than expected if they were uniformly distributed among $V$. To do this, I sample uniformly selected subsets of $V$ (of the same size as the $\gamma(t)$ subsets) and look at the number of non-isomorphic subgraphs I end up with, and then repeat this step a lot of times to have an idea of the distribution of the number of non-isomorphic subgraphs under the uniform sampling.

However I need to have a statistical measure on this. So what I did was approximating the probability distribution mentioned right above and use Chebyshev's inequality to infer an upper bound that is used as a $p$-value. This gives me what I expect (i.e. a very low $p$-value).

My first question is: is this method valid in any way? and my second question is: how can I measure the "density" of isomorphisms?

What I mean by "density of isomorphisms" is a scalar value (hopefully normalized so between 0 and 1) that will represent somehow how much of isomorphisms can be found within my subgraphs. The only thing I have been able to come up with so far is based on Shannon entropy (yeah computer science background here):

$$H(S) = \frac 1{\log |S|}\sum_{s \in S/\simeq}\frac {|s|}{|S|}\log\frac {|s|}{|S|},$$

where $S/\simeq$ is the set of isomorphism classes and $\frac 1{\log |S|}$ is there to normalize the value. But again I have no idea if this can get me somewhere in the statistical analysis of my results.