How to calculate the expected maximum tree size in a pseudoforest

993 Views Asked by At

I would like to calculate the expected maximum tree size in a randomly generated pseudoforest of $N$ labelled nodes where self-loops are not permitted. Empty and single-node trees are also not permitted.

For example, if we have $4$ labelled nodes, we can generate $3$ pseudoforests with a largest tree size of $2$, and $78$ pseudoforests with a maximum tree size of $4$.

There are a total of $(n-1)^n$ possible pseudoforests, thus for $N = 4$ there are $81$.

The expected maximum tree size for $N = 4$ would therefore be: $$ E(x) = \sum_{i=1}^{n}i\cdot p(i) = 2 \cdot \frac{3}{81} + 4\cdot\frac{78}{81} = 3.925... $$

Some observations:

  1. There will never be a pseudoforest where the maximum tree size is $n-1$.
  2. The number of pseudoforests of $N$ nodes containing only one connected tree (therefore maximum tree size of $N$) can be calculated using sequences $A000435$ or $A001864 / n$.

    For $N = 4$, this gives us $78$, ie. when $i = n$ in the summation.

  3. The minimum tree size is $2$ if $N$ is even, and $3$ if $N$ is odd.
  4. The sum of the numerators of $p(i)$ is equal to $(n-1)^n$

When $N = 5$, the summation is: $$ 3 \cdot \frac{80}{1024} + 5\cdot\frac{944}{1024} = 4.84375 $$

When $N = 6$, the summation is: $$ 2 \cdot \frac{15}{15625} + 3\cdot\frac{640}{15625} + 4\cdot\frac{1170}{15625} + 6\cdot\frac{13800}{15625} = 5.72352 $$
How can I calculate the numerators of $p(i)$ when $i < n$?

3

There are 3 best solutions below

4
On BEST ANSWER

We observe that all fractions in $E(n)$ have the same denominator, so let's directly calculate the total numerator $S(n)$; for instance $S(5) = 3 \cdot 80 + 5 \cdot 944$.

Let $T(n)$ be the number of pseudoforests with exactly one connected component, or the number of "good graphs" in the notation of this question.

Let $\pi(n)$ be the set of the integer partitions of $n$.

The numerator can then be found with the following algorithm:

$\texttt{result} \leftarrow 0$
$\texttt{foreach}\;\tau\in\pi(n):$
$\quad\quad\texttt{if}\;1\in\tau:$
$\quad\quad\quad\quad \texttt{continue}$
$\quad\quad\texttt{else}:$
$\quad\quad\quad\quad\texttt{result +=} \max(\tau) \cdot C(\tau) \cdot \prod_{t\in \tau}{T(t)}$,

where $C(\tau)$ is the number of ways $n$ labelled items can be split according to $\tau$.

Let $t_1,\dots,t_r$ be the elements of $\tau$, let $s_1,\dots,s_s$ be the elements of $\tau$ without repeats and let $m_1,\dots,m_s$ be their respective multiplicities. Then we have:

$$C(\tau) = \frac{{n \choose t_1} \cdot {n - t_1 \choose t2} \cdot \dots \cdot {t_r \choose t_r}}{\prod_{i = 1}^{s}{m_i!}}\text{.}$$

Essentially, $C(\tau)$ counts the number of ways of splitting $n$ labelled nodes into connected components of the sizes given by $\tau$, while $\prod{T(n)}$ counts the number of ways of connecting the labelled nodes inside each connected component.

3
On

I am sending some preliminary, as yet not fine-tuned and optimized obervations to generate activity on this question. I am sure these admit improvements (the time and space complexity of the algorithm is poor).

If I have decoded the problem statement correctly we are enumerating labeled endofunctions with no fixed points as appeared at this MSE link so that these pseudo-forests are directed.

The species $\mathcal{Q}$ under consideration is $$\mathcal{Q} = \mathfrak{P}(\mathfrak{C}_{=2}(\mathcal{T}) + \mathfrak{C}_{=3}(\mathcal{T}) + \mathfrak{C}_{=4}(\mathcal{T}) + \cdots).$$ where $\mathcal{T}$ represents labeled rooted trees with EGF $T(z)$ and functional equation $T(z) = z\exp T(z),$ the labeled tree function.

It follows that the distribution of the maxima of the tree sizes (where unicyclic components are counted by the number of nodes) on $n$ nodes can be found by considering the species $$\mathcal{Q}_{\le n} = \mathfrak{P}(\mathfrak{C}_{=2}(\mathcal{T}_{\le n}) + \mathfrak{C}_{=3}(\mathcal{T}_{\le n}) + \mathfrak{C}_{=4}(\mathcal{T}_{\le n}) + \cdots + \mathfrak{C}_{=n}(\mathcal{T}_{\le n}))$$ and marking the cycles with a variable $\mathcal{V}_q$ indicating the number of nodes in the component using the generating function $$Q(z) = \exp \left(\frac{T_{\le n}(z)^2}{2} + \frac{T_{\le n}(z)^3}{3} + \frac{T_{\le n}(z)^4}{4} + \cdots + \frac{T_{\le n}(z)^n}{n}\right).$$

This will produce the following distributions: $${u}^{2},\\ 8\,{u}^{3},\\ 78\,{u}^{4}+3\,{u}^{2},\\ 944\,{u}^{5}+80\,{u}^{3},\\ 13800\,{u}^{6}+1170\,{u}^{4}+640\,{u}^{3}+15\,{u}^{2},\\ 237432\,{u}^{7}+19824\,{u}^{5}+21840\,{u}^{4}+840\,{u}^{3},\\ 4708144\,{u}^{8}+386400\,{u}^{6}+422912\,{u}^{5}+229320\,{u}^{4} \\+17920\,{u}^{3}+105\,{u}^{2},\\ 105822432\,{u}^{9}+8547552\,{u}^{7}+9273600\,{u}^{6}+9634464\,{u}^{5}\\ +786240\,{u}^{4}+153440\,{u}^{3},\ldots$$

This gives the following for the expected maximum tree size: $$2., 3., 3.925925926, 4.843750000, 5.723520000, 6.612311385, \\ 7.471092584, 8.342072010, 9.189007167, 10.04727275, \\ 10.88589292, 11.73525388, 12.56739638, 13.40959924,\ldots$$ which would seem to indicate that most of these consist of one connected component.

The following Maple code was used to compute these values. It took $58$ seconds to compute the distribution for $n=35,$ which represents $$399725722782532944388077717044552088857010024925364224$$ pseudoforests ($34^{35}$).

gf_le :=
proc(n)
    option remember;
    local Tle, term, res, p;

    Tle := add(q^(q-1)*z^q/q!, q=1..n);

    res := 0;

    for term in expand(add(Tle^q/q, q=2..n)) do
        p := degree(term, z);

        res := res + v[p]*coeff(term, z, p)*z^p;
    od;

    exp(res);
end;

gf :=
proc(n)
    option remember;
    local res, cft, vs, term;

    res := 0; cft := n!*coeftayl(gf_le(n), z=0, n);

    if not type(cft, `+`) then
        cft := [cft];
    fi;

    for term in cft do
        vs := indets(term);
        res := res +
        lcoeff(term)*u^max(map(t->op(1, t), vs));
    od;

    res;
end;
0
On

I implemented @Jacopo Notarstefano's method, which is a good answer and has better space complexity than my initial draft. Time complexity is middling. It can compute the distribution for $n=60$ in $291$ seconds or about five minutes according to my tests.

Here are some of the coefficients for $n=60$: $$2: 29215606371473169285018060091249259296875\\ 3: 6771490865749881440878442829560908615652159...\\ 180618395077599800000000\\ 4: 1005563324091428408820112074277646586848640...\\ 7341577965559630663675312982983593750\\ 10: 312487203633572495389224364764544078483826...\\ 833508949993777620300122108361941948220628320222797082880\\ 12: 198506674917987164954385699985381605466521...\\ 98813653741864939041891599818516344945718291826495396755200$$

This is the Maple code.

with(combinat);

conn :=
proc(n)
    option remember;

    n*(n-1)^n + n!*
    add((-1)^(k+1)/k*
        add(binomial(k,q)*(-1)^(k-q)*
            add((n-q)^(n-p)/(n-p)!*
                binomial(p+q-2, q-2), p=0..n),
            q=2..k), k=2..n);

end;

pf_dist :=
proc(n)
    option remember;
    local res, lambda, cf, cfc, scm, q;

    res := 0;

    lambda := firstpart(n);
    while type(lambda, `list`) do
        cf := n!/mul(q!, q in lambda);
        cfc := mul(conn(q), q in lambda);
        scm := mul(q[2]!, q in convert(lambda, `multiset`));

        res := res + cf*cfc*u^max(lambda)/scm;

        lambda := nextpart(lambda);
    od;

    res;
end;