I would like to calculate the expected maximum tree size in a randomly generated pseudoforest of $N$ labelled nodes where self-loops are not permitted. Empty and single-node trees are also not permitted.
For example, if we have $4$ labelled nodes, we can generate $3$ pseudoforests with a largest tree size of $2$, and $78$ pseudoforests with a maximum tree size of $4$.
There are a total of $(n-1)^n$ possible pseudoforests, thus for $N = 4$ there are $81$.
The expected maximum tree size for $N = 4$ would therefore be: $$ E(x) = \sum_{i=1}^{n}i\cdot p(i) = 2 \cdot \frac{3}{81} + 4\cdot\frac{78}{81} = 3.925... $$
Some observations:
- There will never be a pseudoforest where the maximum tree size is $n-1$.
The number of pseudoforests of $N$ nodes containing only one connected tree (therefore maximum tree size of $N$) can be calculated using sequences $A000435$ or $A001864 / n$.
For $N = 4$, this gives us $78$, ie. when $i = n$ in the summation.
- The minimum tree size is $2$ if $N$ is even, and $3$ if $N$ is odd.
- The sum of the numerators of $p(i)$ is equal to $(n-1)^n$
When $N = 5$, the summation is: $$ 3 \cdot \frac{80}{1024} + 5\cdot\frac{944}{1024} = 4.84375 $$
When $N = 6$, the summation is:
$$
2 \cdot \frac{15}{15625} + 3\cdot\frac{640}{15625} + 4\cdot\frac{1170}{15625} + 6\cdot\frac{13800}{15625} = 5.72352
$$
How can I calculate the numerators of $p(i)$ when $i < n$?
We observe that all fractions in $E(n)$ have the same denominator, so let's directly calculate the total numerator $S(n)$; for instance $S(5) = 3 \cdot 80 + 5 \cdot 944$.
Let $T(n)$ be the number of pseudoforests with exactly one connected component, or the number of "good graphs" in the notation of this question.
Let $\pi(n)$ be the set of the integer partitions of $n$.
The numerator can then be found with the following algorithm:
$\texttt{result} \leftarrow 0$
$\texttt{foreach}\;\tau\in\pi(n):$
$\quad\quad\texttt{if}\;1\in\tau:$
$\quad\quad\quad\quad \texttt{continue}$
$\quad\quad\texttt{else}:$
$\quad\quad\quad\quad\texttt{result +=} \max(\tau) \cdot C(\tau) \cdot \prod_{t\in \tau}{T(t)}$,
where $C(\tau)$ is the number of ways $n$ labelled items can be split according to $\tau$.
Let $t_1,\dots,t_r$ be the elements of $\tau$, let $s_1,\dots,s_s$ be the elements of $\tau$ without repeats and let $m_1,\dots,m_s$ be their respective multiplicities. Then we have:
$$C(\tau) = \frac{{n \choose t_1} \cdot {n - t_1 \choose t2} \cdot \dots \cdot {t_r \choose t_r}}{\prod_{i = 1}^{s}{m_i!}}\text{.}$$
Essentially, $C(\tau)$ counts the number of ways of splitting $n$ labelled nodes into connected components of the sizes given by $\tau$, while $\prod{T(n)}$ counts the number of ways of connecting the labelled nodes inside each connected component.