Exponential Generating Function of rooted minimal directed acyclic graphs

Question

Asked by Bumbble Comm at 30 Mar 2026 - 8:55

I am trying to find the exponential generating function of minimal directed acyclic graphs (henceforth called dags) in which every non-leaf node has two outgoing edges.

Context: A simple tree compression algorithm consists of saving repeated subtrees only once. Further occurrences of repeated trees are simply linked to the first occurrence. This way one gets a unique minimal directed acyclic graph, and since we started with a tree it's also rooted. For simplicity I would like to treat binary trees, hence two outgoing edges per non-leaf node.
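The compression step described above can be sketched as hash-consing: every subtree is mapped to a class id, equal subtrees receive equal ids, and the number of ids is the size of the dag. This C sketch is only an illustration, not code from this thread. The tree type and the names compress and dag_size are hypothetical, and a linearly searched table stands in for a real hash table. It counts the leaf as a dag node, which is the convention of the example below ($a(a(a,a),a)$ has a dag of size 3).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree type for this sketch: a leaf has both children NULL,
   an internal node has both children non-NULL. */
typedef struct tree {
    const struct tree *left, *right;
} tree;

#define MAX_CLASSES 256

/* Hash-consing table, searched linearly (enough for a sketch): class i > 0
   is an internal node with children of class cls_left[i] and cls_right[i];
   class 0 is reserved for the leaf. */
static int cls_left[MAX_CLASSES], cls_right[MAX_CLASSES];
static int n_classes;

/* Returns the class of t.  Equal subtrees get the same class: the first
   occurrence creates it, later occurrences are linked to it. */
static int compress(const tree *t)
{
    if (t->left == NULL)
        return 0;                               /* every leaf is class 0 */
    int l = compress(t->left), r = compress(t->right);
    for (int i = 1; i < n_classes; i++)
        if (cls_left[i] == l && cls_right[i] == r)
            return i;                           /* repeated subtree */
    assert(n_classes < MAX_CLASSES);
    cls_left[n_classes] = l;
    cls_right[n_classes] = r;
    return n_classes++;
}

/* Size of the minimal dag = number of distinct subtrees, leaf included. */
int dag_size(const tree *root)
{
    n_classes = 1;                              /* class 0: the leaf */
    compress(root);
    return n_classes;
}
```

The input does not need to share nodes: the sketch compares classes, not pointers, so a tree built without any sharing gives the same dag size.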
A natural question is how large the dag of a binary tree of size $n$ is, and this question has been answered here (the paper is Analytic Variations on the Common Subexpression Problem, by Flajolet et al.).

I would like to ask a different question, namely how many different dags of size $n$ there are, or equivalently, how many rooted plane binary trees have a dag of size $n$?

As an example, for $n=3$, we have three such trees, namely $a(a(a,a),a(a,a))$, $a(a(a,a),a)$ and $a(a,a(a,a))$.
For $n=4$ there are $15$ trees, for $n=5$ there are $111$.
A promising sequence from OEIS is A001063, but I can neither make sense of the differential equation mentioned there, nor do I have a combinatorial explanation for the formula given there, which calculates $a_{n+1}$ from $a_1,\dots,a_n$: $$ a_{n+1} = \sum_{k=0}^{n} \frac{n!}{k!} \cdot \binom{n-1}{k-1} \cdot a_k $$

If requested, I could add where I got stuck (I mostly tried to make sense of the formula), but I think my post is already too long.

Addendum. This question has generated its own OEIS sequence (A254789)! Thanks to everyone involved!

Original Q&A

Bumbble Comm On 05 Feb 2015 - 2:58

I would like to present a script that computes this statistic, in order to generate activity on this question and to motivate research into fast algorithms, the goal being to compute enough terms for an OEIS entry. The linked paper suggests that this will probably not be easy.

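The Perl script below can be used to compute this statistic up to $a_7$. When $a_n$ is being computed, it outputs, for every $k\le 2^n$, the distribution of the number of distinct subtrees over the binary trees with $k$ internal nodes. This range suffices: all internal subtrees along a root-to-leaf path are distinct, so such a path contains at most $n$ internal nodes and the tree has at most $2^n-1$ internal nodes, the maximum being attained only by the perfect binary tree (the single $u^n$ in row $2^n-1$).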

Here is the output for $a_6$, which confirms the values presented in the comments.

1: u (1)
2: 2 u^2 (2)
3: u^2 + 4 u^3 (5)
4: 6 u^3 + 8 u^4 (14)
5: 4 u^3 + 22 u^4 + 16 u^5 (42)
6: 32 u^4 + 68 u^5 + 32 u^6 (132)
7: u^3 + 20 u^4 + 152 u^5 + 192 u^6 (365)
8: 10 u^4 + 196 u^5 + 584 u^6 (790)
9: 12 u^4 + 158 u^5 + 1140 u^6 (1310)
10: 160 u^5 + 1436 u^6 (1596)
11: 6 u^4 + 96 u^5 + 1692 u^6 (1794)
12: 68 u^5 + 1568 u^6 (1636)
13: 88 u^5 + 1284 u^6 (1372)
14: 24 u^5 + 1256 u^6 (1280)
15: u^4 + 36 u^5 + 1112 u^6 (1149)
16: 6 u^5 + 760 u^6 (766)
17: 24 u^5 + 854 u^6 (878)
18: 408 u^6 (408)
19: 18 u^5 + 504 u^6 (522)
20: 308 u^6 (308)
21: 416 u^6 (416)
22: 48 u^6 (48)
23: 8 u^5 + 246 u^6 (254)
24: 92 u^6 (92)
25: 160 u^6 (160)
26: 32 u^6 (32)
27: 144 u^6 (144)
28: 0 (0)
29: 72 u^6 (72)
30: 0 (0)
31: u^5 + 52 u^6 (53)
32: 6 u^6 (6)
33: 12 u^6 (12)
34: 0 (0)
35: 42 u^6 (42)
36: 0 (0)
37: 0 (0)
38: 0 (0)
39: 24 u^6 (24)
40: 0 (0)
41: 0 (0)
42: 0 (0)
43: 0 (0)
44: 0 (0)
45: 0 (0)
46: 0 (0)
47: 10 u^6 (10)
48: 0 (0)
49: 0 (0)
50: 0 (0)
51: 0 (0)
52: 0 (0)
53: 0 (0)
54: 0 (0)
55: 0 (0)
56: 0 (0)
57: 0 (0)
58: 0 (0)
59: 0 (0)
60: 0 (0)
61: 0 (0)
62: 0 (0)
63: u^6 (1)
64: 0 (0)
-
u + 3 u^2 + 15 u^3 + 111 u^4 + 1119 u^5 + 14487 u^6

The script is highly compact and might benefit from being rewritten in C. This is the code:

#! /usr/bin/perl -w
# Argument: mx, the maximal number of distinct internal subtrees
use strict;

sub gf2str {
    my ($gf) = @_;

    return "0" if scalar(keys(%$gf)) == 0;

    my @terms;
    foreach my $exp (sort { $a <=> $b } keys %$gf){
        my $contr = $gf->{$exp};

        if($contr == 1 && $exp == 1){
            push @terms, "u";
        }
        elsif($contr == 1){
            push @terms, "u^$exp";
        }
        elsif($exp == 1){
            push @terms, 
            sprintf "%d u", $contr;
        }
        else{
            push @terms,
            sprintf "%d u^%d", $contr, $exp;
        }
    }

    join(' + ', @terms);
}


MAIN: {
    my $mx = shift || 1;

    my %grand;


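    # $memo->[$n]->[$k] lists every tree with $n internal nodes and $k
    # distinct internal subtrees.  A tree is a hash whose keys are its
    # distinct subtrees (stringified references); every stored tree is
    # built exactly once, so reference identity is tree identity.
    my $memo = [];
    push @{ $memo->[0]->[0] }, {};    # the leaf: 0 nodes, 0 subtrees

    for(my $n=1; $n <= 2**$mx; $n++){
        # the root's children have $m and $n-1-$m internal nodes
        for(my $m=0; $m<=$n-1; $m++){
            for(my $dst1 = 0; $dst1 < $mx; $dst1++){
                for(my $dst2 = 0; $dst2 < $mx; $dst2++){
                    if(exists($memo->[$m]->[$dst1]) &&
                       exists($memo->[$n-1-$m]->[$dst2])){
                        my $t1 = $memo->[$m]->[$dst1];
                        my $t2 = $memo->[$n-1-$m]->[$dst2];

                        for my $ta (@$t1){
                            for my $tb (@$t2){
                                # union of the children's subtree sets ...
                                my $tree = {};

                                @$tree{ keys %$ta } = 
                                    (1) x scalar(keys %$ta);
                                @$tree{ keys %$tb } = 
                                    (1) x scalar(keys %$tb);

                                # ... plus the new tree itself
                                $tree->{$tree} = 1;

                                # keep only trees with at most $mx distinct subtrees
                                my $count = scalar(keys %$tree);
                                if($count <= $mx){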
                                    push @{ $memo->[$n]->[$count] },
                                    $tree;
                                }
                            }
                        }
                    }
                }
            }
        }

        my %gf = (); my $total = 0;
        for(my $dst = 0; $dst <= $mx; $dst++){
            if(exists($memo->[$n]->[$dst])){
                my $val = scalar(@{ $memo->[$n]->[$dst] });
                $gf{$dst} = $val;

                $total += $val;
                $grand{$dst} += $val;
            }
        }


        print "$n: ";
        print gf2str(\%gf);
        print " ($total)\n";
    }

    print "-\n";

    print gf2str(\%grand);
    print "\n";
}

Bumbble Comm On 07 Feb 2015 - 4:46

I would like to present a C program that can calculate the value for $n=7$ very quickly and is currently computing the value for $n=8$. Once that computation has completed (if indeed it completes in a reasonable time), we could start thinking about submitting to the OEIS. The reader is invited to compute the distributions for $n=8$ with this program if a fast machine is available (the memory footprint is very reasonable), or to propose a better approach, perhaps using a counting technique instead of direct enumeration.

These are the distributions of the number of distinct subtrees when $n=7$, with the grand totals shown at the end:

1: u (1)
2: 2 u^2 (2)
3: u^2 + 4 u^3 (5)
4: 6 u^3 + 8 u^4 (14)
5: 4 u^3 + 22 u^4 + 16 u^5 (42)
6: 32 u^4 + 68 u^5 + 32 u^6 (132)
7: u^3 + 20 u^4 + 152 u^5 + 192 u^6 + 64 u^7 (429)
8: 10 u^4 + 196 u^5 + 584 u^6 + 512 u^7 (1302)
9: 12 u^4 + 158 u^5 + 1140 u^6 + 1984 u^7 (3294)
10: 160 u^5 + 1436 u^6 + 5216 u^7 (6812)
11: 6 u^4 + 96 u^5 + 1692 u^6 + 9120 u^7 (10914)
12: 68 u^5 + 1568 u^6 + 13656 u^7 (15292)
13: 88 u^5 + 1284 u^6 + 16608 u^7 (17980)
14: 24 u^5 + 1256 u^6 + 17240 u^7 (18520)
15: u^4 + 36 u^5 + 1112 u^6 + 17552 u^7 (18701)
16: 6 u^5 + 760 u^6 + 17672 u^7 (18438)
17: 24 u^5 + 854 u^6 + 15664 u^7 (16542)
18: 408 u^6 + 15380 u^7 (15788)
19: 18 u^5 + 504 u^6 + 13964 u^7 (14486)
20: 308 u^6 + 11032 u^7 (11340)
21: 416 u^6 + 11564 u^7 (11980)
22: 48 u^6 + 8992 u^7 (9040)
23: 8 u^5 + 246 u^6 + 8984 u^7 (9238)
24: 92 u^6 + 4824 u^7 (4916)
25: 160 u^6 + 7176 u^7 (7336)
26: 32 u^6 + 4328 u^7 (4360)
27: 144 u^6 + 4972 u^7 (5116)
28: 2632 u^7 (2632)
29: 72 u^6 + 4368 u^7 (4440)
30: 1440 u^7 (1440)
31: u^5 + 52 u^6 + 3128 u^7 (3181)
32: 6 u^6 + 1408 u^7 (1414)
33: 12 u^6 + 2118 u^7 (2130)
34: 688 u^7 (688)
35: 42 u^6 + 1972 u^7 (2014)
36: 420 u^7 (420)
37: 1096 u^7 (1096)
38: 272 u^7 (272)
39: 24 u^6 + 1102 u^7 (1126)
40: 192 u^7 (192)
41: 912 u^7 (912)
42: 48 u^7 (48)
43: 696 u^7 (696)
44: 144 u^7 (144)
45: 96 u^7 (96)
46: 0 (0)
47: 10 u^6 + 460 u^7 (470)
48: 68 u^7 (68)
49: 244 u^7 (244)
50: 24 u^7 (24)
51: 252 u^7 (252)
52: 0 (0)
53: 104 u^7 (104)
54: 0 (0)
55: 200 u^7 (200)
56: 0 (0)
57: 0 (0)
58: 0 (0)
59: 144 u^7 (144)
60: 0 (0)
61: 0 (0)
62: 0 (0)
63: u^6 + 68 u^7 (69)
64: 6 u^7 (6)
65: 12 u^7 (12)
66: 0 (0)
67: 18 u^7 (18)
68: 0 (0)
69: 0 (0)
70: 0 (0)
71: 64 u^7 (64)
72: 0 (0)
73: 0 (0)
74: 0 (0)
75: 0 (0)
76: 0 (0)
77: 0 (0)
78: 0 (0)
79: 30 u^7 (30)
80: 0 (0)
81: 0 (0)
82: 0 (0)
83: 0 (0)
84: 0 (0)
85: 0 (0)
86: 0 (0)
87: 0 (0)
88: 0 (0)
89: 0 (0)
90: 0 (0)
91: 0 (0)
92: 0 (0)
93: 0 (0)
94: 0 (0)
95: 12 u^7 (12)
96: 0 (0)
97: 0 (0)
98: 0 (0)
99: 0 (0)
100: 0 (0)
101: 0 (0)
102: 0 (0)
103: 0 (0)
104: 0 (0)
105: 0 (0)
106: 0 (0)
107: 0 (0)
108: 0 (0)
109: 0 (0)
110: 0 (0)
111: 0 (0)
112: 0 (0)
113: 0 (0)
114: 0 (0)
115: 0 (0)
116: 0 (0)
117: 0 (0)
118: 0 (0)
119: 0 (0)
120: 0 (0)
121: 0 (0)
122: 0 (0)
123: 0 (0)
124: 0 (0)
125: 0 (0)
126: 0 (0)
127: u^7 (1)
128: 0 (0)
-
u + 3 u^2 + 15 u^3 + 111 u^4 + 1119 u^5 + 14487 u^6 + 230943 u^7

This is the C code I used. It is quite simple (a port of the Perl script with some of its defects remedied); a considerable portion of it is print formatting for readable output.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <time.h>
#include <unistd.h>

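#define MXDST 12   /* largest accepted value of the maxdist argument */

/* Every distinct tree is allocated exactly once, so two trees are equal
   iff their pointers are equal.  distsub lists the distinct non-leaf
   subtrees of the tree (the tree itself comes last); the leaf has size 0. */
typedef struct th {
  struct th *distsub[MXDST];
  int size;
} tree_inst, *tree_ptr;

/* Build the tree with children left and right by merging their
   distinct-subtree lists; returns NULL if the result would have more
   than mx distinct subtrees. */
tree_ptr tree_new(tree_ptr left, tree_ptr right, int mx)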
{
  tree_ptr mergebuf[2*mx];
  int pos, idx;

  memcpy(mergebuf, left->distsub,
         left->size*sizeof(tree_ptr));

  pos = left->size;
  for(idx = 0; idx < right->size; idx++){
    tree_ptr item = right->distsub[idx];

    int cmp;
    for(cmp=0; cmp < left->size; cmp++){
      if(mergebuf[cmp] == item) break;
    }

    if(cmp == left->size){
      mergebuf[pos] = item;
      pos++;
    }
  }

  if(pos >= mx) return NULL;

  tree_ptr item = malloc(sizeof(tree_inst));
  assert(item != NULL);

  memcpy(item->distsub, mergebuf,
         pos*sizeof(tree_ptr));

  item->distsub[pos] = item;
  item->size = pos+1;

  return item;
}

typedef struct {
  tree_ptr *data;
  int alloc;
  int count;
} coll_inst, *coll_ptr;

#define COLL_CHUNK 512

coll_ptr coll_new(void)
{
  coll_ptr item = malloc(sizeof(coll_inst));
  assert(item != NULL);

  item->data = malloc(COLL_CHUNK*sizeof(tree_ptr));
  assert(item->data != NULL);

  item->alloc = COLL_CHUNK;
  item->count = 0;

  return item;
}

coll_ptr coll_record(coll_ptr coll, tree_ptr entry)
{
  if(coll->count == coll->alloc){
    coll->alloc += COLL_CHUNK;

    coll->data = realloc(coll->data, 
                        coll->alloc*sizeof(tree_ptr));
    assert(coll->data != NULL);
  }

  coll->data[coll->count++] = entry;
  return coll;
}

int main(int argc, char **argv)
{
  int mx = 1;

  if(argc>1){
    int mxcmd = atoi(argv[1]);

    if(1 <= mxcmd && mxcmd <= MXDST){
      mx = mxcmd;
    }
    else{
      fprintf(stderr, "invalid maxdist value, "
              "got %d\n", mxcmd);
      exit(-1);
    }
  }

  int nmx = 1 << mx;
  coll_ptr table[nmx+1][mx+1];

  int n, dst;
  for(n=0; n <= nmx; n++){
    for(dst = 0; dst <= mx; dst++){
      table[n][dst] = coll_new();
    }
  }

  int grand[mx+1];
  for(dst = 0; dst <= mx; dst++){
    grand[dst] = 0;
  }


  tree_inst base;
  base.distsub[0] = NULL;
  base.size = 0;
  coll_record(table[0][0], &base);

  for(n=1; n <= nmx; n++){
    int m; time_t time_begin, time_end;

    time_begin = time(NULL);
    for(m=0; m <= n-1; m++){
      int dst1, dst2;

      for(dst1 = 0; dst1 < mx; dst1++){
        for(dst2 = 0; dst2 < mx; dst2++){
          coll_ptr 
            ca = table[m][dst1],
            cb = table[n-1-m][dst2];

          int c1, c2;
          for(c1 = 0; c1 < ca->count; c1++){
            for(c2 = 0; c2 < cb->count; c2++){
              tree_ptr 
                t1 = ca->data[c1],
                t2 = cb->data[c2];

              tree_ptr tree = tree_new(t1, t2, mx);
              if(tree != NULL)
                coll_record(table[n][tree->size],
                            tree);
            }
          }
        }
      }
    }
    time_end = time(NULL);

    printf("%d: ", n);

    int total = 0, ents = 0;
    for(dst = 1; dst <= mx; dst++){
      int cval = table[n][dst]->count;

      if(cval > 0){
        if(ents > 0) printf(" + ");

        if(cval > 1) printf("%d ", cval);
        printf("u");
        if(dst > 1) printf("^%d", dst);

        total += cval; ents++;
        grand[dst] += cval;
      }
    }

    if(ents == 0) printf("0");
    printf(" (%d)\n", total);

    int secs = (int)difftime(time_end, time_begin);
    if(!isatty(fileno(stdout)))
      fprintf(stderr, "%d [%ds]\n", n, secs);
  }

  printf("-\n");
  for(dst = 1; dst <= mx; dst++){
    if(dst > 1) printf(" + ");

    if(grand[dst] > 1) printf("%d ", grand[dst]);
    printf("u");
    if(dst > 1) printf("^%d", dst);
  }

  printf("\n");

  exit(0);
}

Addendum Sat Feb 7 22:47:53 CET 2015. This is the distribution for $n=8$ with the grand totals at the end.

1: u (1)
2: 2 u^2 (2)
3: u^2 + 4 u^3 (5)
4: 6 u^3 + 8 u^4 (14)
5: 4 u^3 + 22 u^4 + 16 u^5 (42)
6: 32 u^4 + 68 u^5 + 32 u^6 (132)
7: u^3 + 20 u^4 + 152 u^5 + 192 u^6 + 64 u^7 (429)
8: 10 u^4 + 196 u^5 + 584 u^6 + 512 u^7 + 128 u^8 (1430)
9: 12 u^4 + 158 u^5 + 1140 u^6 + 1984 u^7 + 1312 u^8 (4606)
10: 160 u^5 + 1436 u^6 + 5216 u^7 + 6208 u^8 (13020)
11: 6 u^4 + 96 u^5 + 1692 u^6 + 9120 u^7 + 20608 u^8 (31522)
12: 68 u^5 + 1568 u^6 + 13656 u^7 + 46544 u^8 (61836)
13: 88 u^5 + 1284 u^6 + 16608 u^7 + 87224 u^8 (105204)
14: 24 u^5 + 1256 u^6 + 17240 u^7 + 134416 u^8 (152936)
15: u^4 + 36 u^5 + 1112 u^6 + 17552 u^7 + 173568 u^8 (192269)
16: 6 u^5 + 760 u^6 + 17672 u^7 + 205152 u^8 (223590)
17: 24 u^5 + 854 u^6 + 15664 u^7 + 228920 u^8 (245462)
18: 408 u^6 + 15380 u^7 + 237984 u^8 (253772)
19: 18 u^5 + 504 u^6 + 13964 u^7 + 243800 u^8 (258286)
20: 308 u^6 + 11032 u^7 + 246216 u^8 (257556)
21: 416 u^6 + 11564 u^7 + 231864 u^8 (243844)
22: 48 u^6 + 8992 u^7 + 225688 u^8 (234728)
23: 8 u^5 + 246 u^6 + 8984 u^7 + 220836 u^8 (230074)
24: 92 u^6 + 4824 u^7 + 205232 u^8 (210148)
25: 160 u^6 + 7176 u^7 + 186484 u^8 (193820)
26: 32 u^6 + 4328 u^7 + 159216 u^8 (163576)
27: 144 u^6 + 4972 u^7 + 169008 u^8 (174124)
28: 2632 u^7 + 131296 u^8 (133928)
29: 72 u^6 + 4368 u^7 + 139496 u^8 (143936)
30: 1440 u^7 + 104000 u^8 (105440)
31: u^5 + 52 u^6 + 3128 u^7 + 113864 u^8 (117045)
32: 6 u^6 + 1408 u^7 + 81688 u^8 (83102)
33: 12 u^6 + 2118 u^7 + 98484 u^8 (100614)
34: 688 u^7 + 56724 u^8 (57412)
35: 42 u^6 + 1972 u^7 + 82520 u^8 (84534)
36: 420 u^7 + 47312 u^8 (47732)
37: 1096 u^7 + 65964 u^8 (67060)
38: 272 u^7 + 37552 u^8 (37824)
39: 24 u^6 + 1102 u^7 + 50436 u^8 (51562)
40: 192 u^7 + 25164 u^8 (25356)
41: 912 u^7 + 44856 u^8 (45768)
42: 48 u^7 + 19440 u^8 (19488)
43: 696 u^7 + 36956 u^8 (37652)
44: 144 u^7 + 19064 u^8 (19208)
45: 96 u^7 + 29552 u^8 (29648)
46: 7472 u^8 (7472)
47: 10 u^6 + 460 u^7 + 25252 u^8 (25722)
48: 68 u^7 + 8368 u^8 (8436)
49: 244 u^7 + 17100 u^8 (17344)
50: 24 u^7 + 7904 u^8 (7928)
51: 252 u^7 + 16792 u^8 (17044)
52: 4328 u^8 (4328)
53: 104 u^7 + 13808 u^8 (13912)
54: 3936 u^8 (3936)
55: 200 u^7 + 11452 u^8 (11652)
56: 3496 u^8 (3496)
57: 6896 u^8 (6896)
58: 800 u^8 (800)
59: 144 u^7 + 10312 u^8 (10456)
60: 1440 u^8 (1440)
61: 4192 u^8 (4192)
62: 1808 u^8 (1808)
63: u^6 + 68 u^7 + 5800 u^8 (5869)
64: 6 u^7 + 1360 u^8 (1366)
65: 12 u^7 + 4206 u^8 (4218)
66: 304 u^8 (304)
67: 18 u^7 + 3608 u^8 (3626)
68: 420 u^8 (420)
69: 2168 u^8 (2168)
70: 0 (0)
71: 64 u^7 + 3838 u^8 (3902)
72: 472 u^8 (472)
73: 1120 u^8 (1120)
74: 440 u^8 (440)
75: 1880 u^8 (1880)
76: 48 u^8 (48)
77: 912 u^8 (912)
78: 0 (0)
79: 30 u^7 + 1904 u^8 (1934)
80: 328 u^8 (328)
81: 456 u^8 (456)
82: 48 u^8 (48)
83: 1764 u^8 (1764)
84: 72 u^8 (72)
85: 96 u^8 (96)
86: 0 (0)
87: 976 u^8 (976)
88: 0 (0)
89: 504 u^8 (504)
90: 0 (0)
91: 144 u^8 (144)
92: 0 (0)
93: 0 (0)
94: 0 (0)
95: 12 u^7 + 738 u^8 (750)
96: 80 u^8 (80)
97: 160 u^8 (160)
98: 24 u^8 (24)
99: 432 u^8 (432)
100: 0 (0)
101: 72 u^8 (72)
102: 0 (0)
103: 344 u^8 (344)
104: 0 (0)
105: 0 (0)
106: 0 (0)
107: 224 u^8 (224)
108: 0 (0)
109: 0 (0)
110: 0 (0)
111: 256 u^8 (256)
112: 0 (0)
113: 0 (0)
114: 0 (0)
115: 0 (0)
116: 0 (0)
117: 0 (0)
118: 0 (0)
119: 240 u^8 (240)
120: 0 (0)
121: 0 (0)
122: 0 (0)
123: 0 (0)
124: 0 (0)
125: 0 (0)
126: 0 (0)
127: u^7 + 84 u^8 (85)
128: 6 u^8 (6)
129: 12 u^8 (12)
130: 0 (0)
131: 18 u^8 (18)
132: 0 (0)
133: 0 (0)
134: 0 (0)
135: 24 u^8 (24)
136: 0 (0)
137: 0 (0)
138: 0 (0)
139: 0 (0)
140: 0 (0)
141: 0 (0)
142: 0 (0)
143: 90 u^8 (90)
144: 0 (0)
145: 0 (0)
146: 0 (0)
147: 0 (0)
148: 0 (0)
149: 0 (0)
150: 0 (0)
151: 0 (0)
152: 0 (0)
153: 0 (0)
154: 0 (0)
155: 0 (0)
156: 0 (0)
157: 0 (0)
158: 0 (0)
159: 36 u^8 (36)
160: 0 (0)
161: 0 (0)
162: 0 (0)
163: 0 (0)
164: 0 (0)
165: 0 (0)
166: 0 (0)
167: 0 (0)
168: 0 (0)
169: 0 (0)
170: 0 (0)
171: 0 (0)
172: 0 (0)
173: 0 (0)
174: 0 (0)
175: 0 (0)
176: 0 (0)
177: 0 (0)
178: 0 (0)
179: 0 (0)
180: 0 (0)
181: 0 (0)
182: 0 (0)
183: 0 (0)
184: 0 (0)
185: 0 (0)
186: 0 (0)
187: 0 (0)
188: 0 (0)
189: 0 (0)
190: 0 (0)
191: 14 u^8 (14)
192: 0 (0)
193: 0 (0)
194: 0 (0)
195: 0 (0)
196: 0 (0)
197: 0 (0)
198: 0 (0)
199: 0 (0)
200: 0 (0)
201: 0 (0)
202: 0 (0)
203: 0 (0)
204: 0 (0)
205: 0 (0)
206: 0 (0)
207: 0 (0)
208: 0 (0)
209: 0 (0)
210: 0 (0)
211: 0 (0)
212: 0 (0)
213: 0 (0)
214: 0 (0)
215: 0 (0)
216: 0 (0)
217: 0 (0)
218: 0 (0)
219: 0 (0)
220: 0 (0)
221: 0 (0)
222: 0 (0)
223: 0 (0)
224: 0 (0)
225: 0 (0)
226: 0 (0)
227: 0 (0)
228: 0 (0)
229: 0 (0)
230: 0 (0)
231: 0 (0)
232: 0 (0)
233: 0 (0)
234: 0 (0)
235: 0 (0)
236: 0 (0)
237: 0 (0)
238: 0 (0)
239: 0 (0)
240: 0 (0)
241: 0 (0)
242: 0 (0)
243: 0 (0)
244: 0 (0)
245: 0 (0)
246: 0 (0)
247: 0 (0)
248: 0 (0)
249: 0 (0)
250: 0 (0)
251: 0 (0)
252: 0 (0)
253: 0 (0)
254: 0 (0)
255: u^8 (1)
256: 0 (0)
-
u + 3 u^2 + 15 u^3 + 111 u^4 + 1119 u^5 + 14487 u^6 + 230943 u^7 + 4395855 u^8

Bumbble Comm On 09 Feb 2015 - 12:49

The following C program uses POSIX threads to parallelize the single-threaded C program I posted earlier. It can be used to compute the distributions for $n=8$: that run had a peak memory allocation of $2.1$ GB and took $23$ minutes on a machine with $24$ processors at $2$ GHz. It would appear that the distributions for $n=9$ could be computed on a machine with at least $32$ processors and at least $16$ GB of memory, which the reader is invited to try.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>

#define MXDST 12

typedef struct th {
  struct th *distsub[MXDST];
  int size;
} tree_inst, *tree_ptr;

tree_ptr tree_new(tree_ptr left, tree_ptr right, int mx)
{
  tree_ptr mergebuf[2*mx];
  int pos, idx;

  memcpy(mergebuf, left->distsub,
         left->size*sizeof(tree_ptr));

  pos = left->size;
  for(idx = 0; idx < right->size; idx++){
    tree_ptr item = right->distsub[idx];

    int cmp;
    for(cmp=0; cmp < left->size; cmp++){
      if(mergebuf[cmp] == item) break;
    }

    if(cmp == left->size){
      mergebuf[pos] = item;
      pos++;
    }
  }

  if(pos >= mx) return NULL;

  tree_ptr item = malloc(sizeof(tree_inst));
  assert(item != NULL);

  memcpy(item->distsub, mergebuf,
         pos*sizeof(tree_ptr));

  item->distsub[pos] = item;
  item->size = pos+1;

  return item;
}

typedef struct {
  tree_ptr *data;
  int alloc;
  int count;
  pthread_mutex_t mutex;
} coll_inst, *coll_ptr;

#define COLL_CHUNK 512

coll_ptr coll_new(void)
{
  coll_ptr item = malloc(sizeof(coll_inst));
  assert(item != NULL);

  item->data = malloc(COLL_CHUNK*sizeof(tree_ptr));
  assert(item->data != NULL);

  item->alloc = COLL_CHUNK;
  item->count = 0;

  /* PTHREAD_MUTEX_INITIALIZER is meant for static initialization only */
  pthread_mutex_init(&item->mutex, NULL);

  return item;
}

coll_ptr coll_record(coll_ptr coll, tree_ptr entry)
{
  if(coll->count == coll->alloc){
    coll->alloc += COLL_CHUNK;

    coll->data = realloc(coll->data, 
                        coll->alloc*sizeof(tree_ptr));
    assert(coll->data != NULL);
  }

  coll->data[coll->count++] = entry;
  return coll;
}

typedef struct {
  coll_ptr *table_ptr;
  int mx;
  int n, m;
} state_info;

void *compute(void *ptr)
{
  state_info *st = (state_info *)ptr;

  int mx = st->mx;
  int n = st->n, m = st->m;
  int dst1, dst2;

  for(dst1 = 0; dst1 < st->mx; dst1++){
    for(dst2 = 0; dst2 < st->mx; dst2++){
      coll_ptr 
        ca = st->table_ptr[m*(mx+1) + dst1],
        cb = st->table_ptr[(n-1-m)*(mx+1) + dst2];

      int c1, c2;
      for(c1 = 0; c1 < ca->count; c1++){
        for(c2 = 0; c2 < cb->count; c2++){
          tree_ptr 
            t1 = ca->data[c1],
            t2 = cb->data[c2];

          tree_ptr tree = tree_new(t1, t2, mx);
          if(tree != NULL){
            coll_ptr targ =
              st->table_ptr[n*(mx+1)+tree->size];

            pthread_mutex_lock(&(targ->mutex));
            coll_record(targ, tree);
            pthread_mutex_unlock(&(targ->mutex));
          }
        }
      }
    }
  }

  return NULL;
}

int main(int argc, char **argv)
{
  int mx = 1;

  if(argc>1){
    int mxcmd = atoi(argv[1]);

    if(1 <= mxcmd && mxcmd <= MXDST){
      mx = mxcmd;
    }
    else{
      fprintf(stderr, "invalid maxdist value, "
              "got %d\n", mxcmd);
      exit(-1);
    }
  }

  int nmx = 1 << mx;
  coll_ptr table[nmx+1][mx+1];

  int n, dst;
  for(n=0; n <= nmx; n++){
    for(dst = 0; dst <= mx; dst++){
      table[n][dst] = coll_new();
    }
  }

  int grand[mx+1];
  for(dst = 0; dst <= mx; dst++){
    grand[dst] = 0;
  }


  tree_inst base;
  base.distsub[0] = NULL;
  base.size = 0;
  coll_record(table[0][0], &base);

  for(n=1; n <= nmx; n++){
    int m; time_t time_begin, time_end;

    time_begin = time(NULL);

    pthread_t threads[n];

    state_info state[n];

    for(m=0; m <= n-1; m++){
      state[m].table_ptr = (coll_ptr *)table;
      state[m].mx = mx;
      state[m].n = n;
      state[m].m = m;

      if(pthread_create(threads+m, NULL, compute, state+m) != 0){
        fprintf(stderr, "pthread_create failed (m=%d)\n", m);
        exit(-1);
      }
    }

    for(m=0; m <= n-1; m++){
      pthread_join(threads[m], NULL);
    }

    time_end = time(NULL);

    printf("%d: ", n);

    int total = 0, ents = 0;
    for(dst = 1; dst <= mx; dst++){
      int cval = table[n][dst]->count;

      if(cval > 0){
        if(ents > 0) printf(" + ");

        if(cval > 1) printf("%d ", cval);
        printf("u");
        if(dst > 1) printf("^%d", dst);

        total += cval; ents++;
        grand[dst] += cval;
      }
    }

    if(ents == 0) printf("0");
    printf(" (%d)\n", total);

    int secs = (int)difftime(time_end, time_begin);
    if(!isatty(fileno(stdout)))
      fprintf(stderr, "%d [%ds]\n", n, secs);
  }

  printf("-\n");
  for(dst = 1; dst <= mx; dst++){
    if(dst > 1) printf(" + ");

    if(grand[dst] > 1) printf("%d ", grand[dst]);
    printf("u");
    if(dst > 1) printf("^%d", dst);
  }

  printf("\n");

  exit(0);
}

Bumbble Comm On 09 Feb 2015 - 7:59

EDIT: for a much better approach, see the other answer.

This confirms your values (up to 8) using a different approach, which should also allow for a cleverer method of counting.

The following program needs less than 30 minutes (on one core) to print

(1,1)
(2,3)
(3,15)
(4,111)
(5,1119)
(6,14487)
(7,230943)
(8,4395855)

We enumerate canonical representatives for these DAGs. A representative is a list of pairs of numbers, e.g., [(4,3),(3,2),(1,0),(1,1),(0,0)]. This means that the top node (5) has left child 4 and right child 3, node 4 has children (3,2), and so on, down to node 1 with children (0,0); node 0 is the leaf.

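The representative is canonical if

all pairs are different;
each node (except the first, i.e. the top node) is linked to from some node above it;
the levels (distance to the leaf) are non-decreasing in the node number, and for each level, the pairs on this level are strictly increasing in list order.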

For the example, the level mapping (node, level) is [(0,0),(1,1),(2,2),(3,2),(4,3),(5,4)], so [(1,0),(1,1)] (the pairs of nodes 3 and 2) are on the same level, and this list is monotone.
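The canonicity conditions, including the requirement used by the Haskell program below that levels do not decrease with the node number, can be sketched in C as follows. This is a hypothetical port for illustration, not the program that produced the numbers. kid[p] stores the pair of node p indexed by node number, so the example list becomes kid[5]=(4,3), kid[4]=(3,2), kid[3]=(1,0), kid[2]=(1,1), kid[1]=(0,0); node 0 is the leaf and, as in the Haskell candidates, every child of node p is numbered below p. count_dags(n) is plain generate-and-test over all $(n!)^2$ candidates, so it is only usable for very small n.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8  /* sketch limit: generate-and-test visits (n!)^2 candidates */

/* Hypothetical representation: kid[p][0] and kid[p][1] are the left and
   right child of node p (1 <= p <= n), node 0 is the leaf, and every child
   of p is numbered below p (as in the Haskell candidates). */

/* level[0] = 0 for the leaf, level[p] = 1 + max(levels of p's children) */
void compute_levels(int n, int kid[][2], int level[])
{
    level[0] = 0;
    for (int p = 1; p <= n; p++) {
        int a = level[kid[p][0]], b = level[kid[p][1]];
        level[p] = 1 + (a > b ? a : b);
    }
}

/* lexicographic order on child pairs, as for Haskell tuples */
static bool pair_less(const int *x, const int *y)
{
    return x[0] < y[0] || (x[0] == y[0] && x[1] < y[1]);
}

bool is_canonical(int n, int kid[][2])
{
    int level[MAXN + 1];
    bool referenced[MAXN + 1] = { false };
    assert(1 <= n && n <= MAXN);
    compute_levels(n, kid, level);
    for (int p = 1; p <= n; p++) {
        referenced[kid[p][0]] = referenced[kid[p][1]] = true;
        for (int q = p + 1; q <= n; q++)          /* all pairs different */
            if (kid[p][0] == kid[q][0] && kid[p][1] == kid[q][1])
                return false;
    }
    for (int p = 0; p < n; p++) {
        if (!referenced[p])                       /* linked from above */
            return false;
        if (level[p] > level[p + 1])              /* sorted by level */
            return false;
        /* same level: in list order (top node first) the pairs increase,
           so the higher-numbered node must carry the smaller pair */
        if (p > 0 && level[p] == level[p + 1] && !pair_less(kid[p + 1], kid[p]))
            return false;
    }
    return true;
}

static int kid_buf[MAXN + 1][2];

/* generate-and-test: every pair in {0..p-1}^2 for the nodes p, ..., n */
static long enumerate(int p, int n)
{
    if (p > n)
        return is_canonical(n, kid_buf) ? 1 : 0;
    long total = 0;
    for (int x = 0; x < p; x++)
        for (int y = 0; y < p; y++) {
            kid_buf[p][0] = x;
            kid_buf[p][1] = y;
            total += enumerate(p + 1, n);
        }
    return total;
}

/* number of canonical representatives with n non-leaf nodes */
long count_dags(int n)
{
    return enumerate(1, n);
}
```

For n = 1, ..., 5, count_dags should agree with the first values printed by the Haskell program: 1, 3, 15, 111, 1119.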

Now, instead of generate-and-test (which is what the program does), one could encode these conditions in propositional logic and use BDDs (binary decision diagrams) for counting.

(EDIT here, original program below)

With a somewhat improved internal representation, my program (now too long to post here; perhaps I will play code golf later) gives $a_9 = 97608831$.

I wonder if we can use the following: $a[x_k,\dots,x_0]$ = the number of dags with $x_h$ nodes at level $h$ (e.g., $a[1,2,1,1]=6$). Here is the list for $\sum x_i=9$ (you call this "8 nodes" since you don't count the leaf):

([1,1,1,1,1,1,1,1,1],2027025)
([1,1,1,1,1,2,1,1],424710)
([1,1,1,1,2,1,1,1],489060)
([1,1,1,1,3,1,1],9108)
([1,1,1,2,1,1,1,1],417690)
([1,1,1,2,2,1,1],208428)
([1,1,1,3,1,1,1],14400)
([1,1,2,1,1,1,1,1],279720)
([1,1,2,1,2,1,1],56844)
([1,1,2,2,1,1,1],188280)
([1,1,2,3,1,1],4764)
([1,1,3,1,1,1,1],6300)
([1,1,3,2,1,1],7200)
([1,2,1,1,1,1,1,1],103950)
([1,2,1,1,2,1,1],20520)
([1,2,1,2,1,1,1],22560)
([1,2,1,3,1,1],348)
([1,2,2,1,1,1,1],74340)
([1,2,2,2,1,1],37368)
([1,2,3,1,1,1],3240)

Is there some relation that would allow us to compute these numbers without looking at any trees, graphs or dags? Some observations:

$a[1,\dots,1]$ is $(2k-1)!!$, where $k$ is the number of non-leaf nodes,
and all the others are even (so we should be able to speed up the enumeration by a factor of 2?).
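A short derivation of the first observation, assuming the canonical form above with exactly one node on each of the levels $1,\dots,k$: the node on level $h$ has both children among the $h$ nodes on levels $0,\dots,h-1$, and at least one child must be the node on level $h-1$, because a node's level is one more than the largest level of its children. Nodes on different levels automatically have different pairs, and each node is linked from the node one level up, so

$$ a[\underbrace{1,\dots,1}_{k+1}] = \prod_{h=1}^{k} \left( h^2 - (h-1)^2 \right) = \prod_{h=1}^{k} (2h-1) = (2k-1)!! , $$

and $k=8$ gives $15!! = 2027025$, the first entry of the list. The same kind of counting explains $a[1,2,1,1]=6$: with the leaf as node 0 and the level-1 node as node 1, the two level-2 nodes are two distinct pairs from $\{(0,1),(1,0),(1,1)\}$ (their order is fixed by canonicity), and the root must point to both of them, in one of two orders:

$$ a[1,2,1,1] = \binom{3}{2} \cdot 2 = 6 . $$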

original source code below:

import qualified Data.Set as S
import qualified Data.Map.Strict as M
import Data.List ( sort )
import Control.Monad ( guard, when, forM_ )
import Control.Applicative
import System.IO

main = forM_ [1 .. ] $ \ n -> do
  print (n, length $ filter dag_ok $ candidates n )
  hFlush stdout

type DAG = [(Int,Int)]

dag_ok :: DAG -> Bool
dag_ok dag =
  nodes_different dag && nodes_linked dag && levels_ok dag

nodes_different dag =
  length dag == S.size (S.fromList dag)

nodes_linked dag =
  S.fromList [0 .. length dag-1]
  == S.fromList (do (x,y) <- dag ; [x, y] )

levels_ok dag =
  let n = length dag ; m = levels dag
      s = M.fromListWith S.union $ do (p,h) <- M.toList m ; return (h, S.singleton p)
      in  weakly_monotone ( map snd $ M.toAscList m )
      && and ( do 
         ( h, ps ) <- M.toList s
         return $ monotone $  do p <- S.toDescList ps ; return $ dag !! (n-p)
         )

monotone xs = and $ zipWith (<) xs $ tail xs
weakly_monotone xs = and $ zipWith (<=) xs $ tail xs

levels [] = M.fromList [(0,0)]
levels ((x,y):d) =
  let m = levels d
  in  M.insert (length d+1) (succ $ max ( m M.! x) (m M.! y)) m

candidates 0 = [ []]
candidates n = do
  d <- candidates (n-1)
  x <- [ 0 .. n-1] ; y <- [ 0 .. n-1]
  return ((x,y):d)

Bumbble Comm On 10 Feb 2015 - 2:35

OK, now it looks better: I'm quite confident the sequence starts

(1,1)
(2,3)
(3,15)
(4,111)
(5,1119)
(6,14487)
(7,230943)
(8,4395855)
(9,97608831)
(10,2482988079)
(11,71321533887)
(12,2286179073663)
(13,80984105660415)
(14,3144251526824991)
(15,132867034410319359)

and that's computed within a few seconds, using the following approach:

It is based on a function count, taking an argument of type [[Bool]], where count xss is the number of dags with map length xss nodes on the respective levels; each level is coded by an element xs :: [Bool] of xss, whose entries mark whether the corresponding node should have a predecessor.

In more detail, here is the specification of count:

We define a function shape :: DAG -> [[Bool]] (only for the specification; it is not in the source below) that takes a DAG (any DAG, possibly with several roots) and computes the list of level sets; then, for each set, a canonical ordering (a list) of its nodes (lexicographic by left child, then right child, using the ordering on the lower levels); and finally, for each node, whether it has a predecessor (a node higher up that points to it). Now count s gives the number of DAGs d with shape d == s.

The point is that we can define count recursively (by induction on the number of levels), and we never really construct DAGs; we just count.

While counting, we avoid recomputation by using memoFix (really a fixpoint combinator with a cache). You may simply read the definition as count arg = case arg of ..., with every recursive call self replaced by count.

To run this with ghc, you need the packages lens and memoize. You can load the source code in ghci and evaluate expressions like count [[False],[True],[True]]. (It seems the code indentation here is broken; make sure that the statements inside each do block are aligned.)

import Control.Monad ( guard, forM_ )
import Control.Applicative
import Control.Lens
import Data.List (tails, sort)
import Data.Function.Memoize
import System.IO

type Shape = [[Bool]]

main = forM_ [ 1 .. ] $ \ s -> do
  print ( s, sum $ map count $ shapes s )
  hFlush stdout

shapes s = do  sh <- deep_shapes (s-1) ;  return $ [False] : sh

deep_shapes :: Int -> [Shape]
deep_shapes 0 = return []
deep_shapes s = do
  x <- [ 1 .. s ] ; xs <- deep_shapes (s-x)
  return $ (replicate x True) : xs

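-- Integer rather than Int, so that larger terms cannot overflow
count :: Shape -> Integer
count = memoFix $ \ self arg -> case arg of
  [] -> 1
  (sh : ape) -> sum $ do
    -- the nodes on the top level must not need a predecessor
    guard $ and $ map not sh
    top <- pairs (length sh) ape
    return $ self $ apply top ape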

type Node = (Int,Int)
type Pair = (Node,Node)

apply :: [Pair] -> Shape -> Shape
apply top shape = 
    foldr ( \ (h,k) sh -> sh & ix (length shape - h) . ix k .~ False ) 
      shape $ do (p,q) <- top ; [p,q]

pairs s shape = pick s $ sort $ do
    let cs = candidates shape
        lower = concat $ drop 1 cs
        top = concat $ take 1 cs
    (left,right) <- [(lower,top),(top,top),(top,lower)]
    (,) <$> left <*> right

candidates :: [[Bool]] -> [[(Int,Int)]]
candidates shape = ( do
   (h,ops) <- zip [length shape, length shape-1 ..] shape
   return $ do (n, _) <- zip [0..] ops ; return (h,n) ) ++ [[(0,0)]]

pick :: Int -> [a] -> [[a]]
pick 0 _ = return []
pick s xs = do
  z : ys <- tails xs ; guard $ length ys >= s-1
  zs <- pick (s-1) ys ; return $ z : zs
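For readers without the lens and memoize packages, the count recursion above can be sketched in C, without the memoization. This is a hypothetical port, not the author's code: shape_t, choose, count_from, compositions and count_by_shapes are made-up names, levels are listed top first, a child is encoded by (level index, position) instead of (height, position), and the leaf is level index nlev. The candidate pairs are not sorted, since pick only needs some fixed order to enumerate every subset once. Without memoization the running time is exponential, so this is only meant for small s.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXL 8    /* sketch limit: at most MAXL non-leaf nodes */
#define MAXP 128  /* enough candidate pairs when s <= MAXL */

/* Hypothetical C counterpart of the Haskell Shape: levels are listed top
   first, need[i][k] says whether node k on level i still needs a
   predecessor.  A child is (level index, position); the leaf is level nlev. */
typedef struct {
    int nlev;
    int width[MAXL];
    bool need[MAXL][MAXL];
} shape_t;

typedef struct { int lev, pos; } child_t;
typedef struct { child_t l, r; } pair_t;

static long count_from(const shape_t *sh, int t);

/* Pick `left` more distinct pairs with index >= from (like pick), mark
   their children as having a predecessor (like apply), then count below. */
static long choose(const shape_t *sh, int t, const pair_t *cand, int ncand,
                   int from, int left)
{
    if (left == 0) return count_from(sh, t + 1);
    long total = 0;
    for (int c = from; c <= ncand - left; c++) {
        shape_t next = *sh;
        const child_t kids[2] = { cand[c].l, cand[c].r };
        for (int j = 0; j < 2; j++)
            if (kids[j].lev < sh->nlev)           /* the leaf has no flag */
                next.need[kids[j].lev][kids[j].pos] = false;
        total += choose(&next, t, cand, ncand, c + 1, left - 1);
    }
    return total;
}

/* The Haskell count, applied to levels t, t+1, ... of sh. */
static long count_from(const shape_t *sh, int t)
{
    if (t == sh->nlev) return 1;
    for (int k = 0; k < sh->width[t]; k++)
        if (sh->need[t][k]) return 0;   /* no node above is left to point here */
    child_t top[MAXL], lower[MAXL + 1];
    int ntop = 0, nlower = 0;
    if (t + 1 == sh->nlev) {            /* only the leaf is below */
        top[ntop++] = (child_t){ sh->nlev, 0 };
    } else {
        for (int k = 0; k < sh->width[t + 1]; k++)
            top[ntop++] = (child_t){ t + 1, k };
        for (int i = t + 2; i < sh->nlev; i++)
            for (int k = 0; k < sh->width[i]; k++)
                lower[nlower++] = (child_t){ i, k };
        lower[nlower++] = (child_t){ sh->nlev, 0 };
    }
    /* all pairs with at least one child on the level directly below */
    pair_t cand[MAXP];
    int ncand = 0;
    for (int a = 0; a < ntop; a++) {
        for (int b = 0; b < ntop; b++) cand[ncand++] = (pair_t){ top[a], top[b] };
        for (int b = 0; b < nlower; b++) {
            cand[ncand++] = (pair_t){ top[a], lower[b] };
            cand[ncand++] = (pair_t){ lower[b], top[a] };
        }
    }
    return choose(sh, t, cand, ncand, 0, sh->width[t]);
}

/* Like the Haskell shapes: a root level [False] followed by every
   composition of the remaining nodes into all-True levels. */
static long compositions(shape_t *sh, int remaining)
{
    if (remaining == 0) return count_from(sh, 0);
    long total = 0;
    for (int x = 1; x <= remaining; x++) {
        int i = sh->nlev++;
        sh->width[i] = x;
        for (int k = 0; k < x; k++) sh->need[i][k] = true;
        total += compositions(sh, remaining - x);
        sh->nlev--;
    }
    return total;
}

/* Number of dags with s non-leaf nodes (no memoization, small s only). */
long count_by_shapes(int s)
{
    assert(1 <= s && s <= MAXL);
    shape_t sh = { 0 };                 /* root level [False]: need is false */
    sh.nlev = 1;
    sh.width[0] = 1;
    return compositions(&sh, s - 1);
}
```

For s = 1, ..., 5 this should reproduce 1, 3, 15, 111, 1119, the start of the sequence listed in the next answer.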

Bumbble Comm On 27 Feb 2015 - 12:10

The following listing for $n=9$ was computed with the parallel C implementation by user @john_leo. We post an excerpt here in the hope that, although the counting problem has been solved quite effectively, the generating functions of the enumeration problem might be computed algebraically; the data below would then serve to verify those results.

64: 6 u^7 + 1360 u^8 + 155920 u^9 (157286)
65: 12 u^7 + 4206 u^8 + 319564 u^9 (323782)
66: 304 u^8 + 116308 u^9 (116612)
67: 18 u^7 + 3608 u^8 + 281548 u^9 (285174)
68: 420 u^8 + 117816 u^9 (118236)
69: 2168 u^8 + 201820 u^9 (203988)
70: 65120 u^9 (65120)
71: 64 u^7 + 3838 u^8 + 235572 u^9 (239474)
72: 472 u^8 + 68484 u^9 (68956)
73: 1120 u^8 + 157400 u^9 (158520)
74: 440 u^8 + 71104 u^9 (71544)
75: 1880 u^8 + 159308 u^9 (161188)
76: 48 u^8 + 45328 u^9 (45376)
77: 912 u^8 + 127264 u^9 (128176)
78: 26496 u^9 (26496)
79: 30 u^7 + 1904 u^8 + 125308 u^9 (127242)
80: 328 u^8 + 43100 u^9 (43428)
81: 456 u^8 + 76884 u^9 (77340)
82: 48 u^8 + 19152 u^9 (19200)
83: 1764 u^8 + 111784 u^9 (113548)
84: 72 u^8 + 31016 u^9 (31088)
85: 96 u^8 + 60496 u^9 (60592)
86: 19232 u^9 (19232)
87: 976 u^8 + 77316 u^9 (78292)
88: 13400 u^9 (13400)
89: 504 u^8 + 66424 u^9 (66928)
90: 11600 u^9 (11600)
91: 144 u^8 + 60400 u^9 (60544)
92: 10656 u^9 (10656)
93: 20944 u^9 (20944)
94: 6784 u^9 (6784)
95: 12 u^7 + 738 u^8 + 54040 u^9 (54790)
96: 80 u^8 + 7276 u^9 (7356)
97: 160 u^8 + 25040 u^9 (25200)
98: 24 u^8 + 10424 u^9 (10448)
99: 432 u^8 + 36232 u^9 (36664)
100: 5504 u^9 (5504)
101: 72 u^8 + 25168 u^9 (25240)
102: 2208 u^9 (2208)
103: 344 u^8 + 30576 u^9 (30920)
104: 8384 u^9 (8384)
105: 11152 u^9 (11152)
106: 624 u^9 (624)
107: 224 u^8 + 28936 u^9 (29160)
108: 2352 u^9 (2352)
109: 13152 u^9 (13152)
110: 1904 u^9 (1904)
111: 256 u^8 + 20924 u^9 (21180)
112: 2304 u^9 (2304)
113: 10560 u^9 (10560)
114: 1056 u^9 (1056)
115: 11904 u^9 (11904)
116: 1920 u^9 (1920)
117: 1952 u^9 (1952)
118: 384 u^9 (384)
119: 240 u^8 + 20064 u^9 (20304)
120: 1920 u^9 (1920)
121: 4176 u^9 (4176)
122: 1184 u^9 (1184)
123: 7344 u^9 (7344)
124: 888 u^9 (888)
125: 6448 u^9 (6448)
126: 0 (0)
127: u^7 + 84 u^8 + 8776 u^9 (8861)
128: 6 u^8 + 784 u^9 (790)

**Bumbble Comm** · Accepted Answer

Update: An $\mathcal{O}(n^6)$ dynamic programming solution was found on the Project Euler forums ($T(100)$). A faster recurrence was discovered in [Genetrini 2017] ($T(350)$).

I'm hoping that this post will serve as a full exposition of our ideas thus far, so feel free to edit or add to it. We are trying to count trees by their number of distinct subtrees. The compiled list, also on OEIS as A254789 (offset 1):

$$ \begin{array}{r|r} k & \tilde{\mathcal{T}_k} \\ \hline 1 & 1 \\ 2 & 1 \\ 3 & 3 \\ 4 & 15 \\ 5 & 111 \\ 6 & 1119 \\ 7 & 14487 \\ 8 & 230943 \\ 9 & 4395855 \\ 10 & 97608831 \\ 11 & 2482988079 \\ 12 & 71321533887 \\ 13 & 2286179073663 \\ 14 & 80984105660415 \\ 15 & 3144251526824991 \\ 16 & 132867034410319359 \\ 17 & 6073991827274809407 \\ 18 & 298815244349875677183 \\ 19 & 15746949613850439270975 \\ 20 & 885279424331353488224511 \\ 21 & 52902213099156326247243519 \end{array} $$


Motivation

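The question of counting such trees is a natural extension of the results in $[1]$ (Flajolet et al., *Analytic Variations on the Common Subexpression Problem*), which are neatly summarized in its abstract: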

[Figure: abstract of $[1]$]


Problem Statement

In this thread we will be concerned with one type of tree, namely unlabeled plane rooted full binary trees. For convenience and clarity we drop the descriptive titles and simply call them "trees". In such trees each internal node has exactly two children (full binary), and the order of the children matters (plane). Instead of approximating the number of distinct subtrees of a given tree, we want to count the trees with $k$ distinct subtrees, or equivalently the trees whose compacted DAG has $k$ nodes. More precisely, let $\mathcal{T}$ be the set of all trees, and for a given tree $\tau \in \mathcal{T}$, let $S(\tau)$ be the set of subtrees of $\tau$. Then we want to count $\tilde{\mathcal{T}_k} = |\mathcal{T}_k|$, where

$$ \mathcal{T}_k = \left\{ \tau \in \mathcal{T} \;:\; \left| \, S(\tau) \, \right| = k \right\} $$

Note that $\left| \, S(\tau) \, \right| = \left| \, \text{dag}(\tau) \, \right|$ is the number of nodes in the compacted DAG of $\tau$.
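To make $|S(\tau)| = |\text{dag}(\tau)|$ concrete, here is a minimal hash-consing sketch. It assumes trees are encoded as nested Python tuples (`()` for the singleton tree, `(L, R)` for $x(L,R)$); the encoding and the helper name `compact` are our own illustrative choices, not code from this thread. Each distinct subtree receives one node id, so the number of returned nodes is $|S(\tau)|$.

```python
LEAF = ()  # the singleton tree


def compact(tree):
    """Hash-cons `tree` into its compacted DAG.

    Returns (root_id, nodes), where nodes[i] is None for the sink and
    otherwise the pair (left_id, right_id) of node i's children.
    """
    ids = {}    # children key -> node id; equal subtrees produce equal keys
    nodes = []  # node id -> children key

    def visit(t):
        # Post-order: both children get their ids before the parent is looked up.
        key = None if t == LEAF else (visit(t[0]), visit(t[1]))
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append(key)
        return ids[key]

    return visit(tree), nodes


# The three trees with a DAG of size 3 from the question.
a = (LEAF, LEAF)
for tree in [(a, a), (a, LEAF), (LEAF, a)]:
    print(len(compact(tree)[1]))  # prints 3 for each tree
```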


Examples

To understand what we are counting and how the compaction works, here is an enumeration of $\mathcal{T}_k$ for $k\leq4$. The trees are colored black, their compacted DAGs blue, and their subtrees red. Note how each node in the DAGs corresponds to a particular subtree.

[Figure: the trees in $\mathcal{T}_k$ for $k \leq 4$ (black), their compacted DAGs (blue), and their subtrees (red)]

You may notice that the OEIS entry is offset by 1 and that the example trees are not full (some internal nodes have fewer than two children). Removing the leaves from each of our trees gives a bijection between full binary trees with $k$ subtrees and binary trees with $k-1$ subtrees, hence the offset. Each counting method developed here has an analog, slightly altered to fit this interpretation (e.g., in the canonical form of the DAGs each node may have fewer than two children). Since this doesn't provide any speedup in computation, we will ignore this interpretation henceforth.


Preliminary Observations

- If a tree has $n$ internal nodes, then it has $n\!+\!1$ leaves. The number of trees with $n$ internal nodes is the Catalan number $C_n = \frac{1}{n+1}{2n \choose n} \approx \frac{4^n}{\sqrt{\pi n^3}}$.
- If a tree $\tau$ has height $h$ (the singleton tree has height $1$), then $\left| \, \tau \, \right| \in [2h\!-\!1,2^h\! -\! 1]$ and $\left| \, S (\tau) \, \right| \geq h$. The number of trees with height $h$ satisfies the recurrence $T_{(h)} = T_{(h-1)}^2 + 2 T_{(h-1)} \sum_{k=1}^{h-2} T_{(k)}$: either both children of the root have height $h-1$, or exactly one does and the other is shorter. As seen on OEIS A001699, $T_{(h)} \approx 1.5^{2^h}$.
- The result of $[1]$ gives us a rough estimate for $\tilde{\mathcal{T}_k}$. It tells us that the expected number of subtrees of a tree with $n$ internal nodes is $$\tilde{K}_n = 2 \sqrt{\frac{\log 4}{\pi}} \frac{n}{\sqrt{\log n}} \left( 1 + \mathcal{O}\left(\tfrac{1}{\log n} \right) \right)$$ Setting $\tilde{K}_n=k$ and solving for $n$ gives a heuristic estimate of the number of internal nodes of a tree with $k$ subtrees. Given this estimate of $n$, we can guess that there are roughly $C_n$ different trees with $k$ subtrees. This guess underestimates the true values, which can perhaps be attributed to the $\mathcal{O}(\frac{1}{\log n})$ term, which I took to be zero. In any case we see exponential growth, which means that we will need to develop a method to count rather than enumerate trees.
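The preliminary observations can be checked numerically with a short Python sketch. The helper names are hypothetical, and two choices are ours rather than from the thread: the equation $\tilde{K}_n = k$ is solved by bisection with the $\mathcal{O}(1/\log n)$ term dropped, and $C_n$ is interpolated to non-integer $n$ through the Gamma function.

```python
import math


def catalan(n):
    """C_n = binom(2n, n) / (n + 1): the number of trees with n internal nodes."""
    return math.comb(2 * n, n) // (n + 1)


def trees_by_height(hmax):
    """[T_(1), ..., T_(hmax)] from the height recurrence, with T_(1) = 1."""
    T = [0, 1]  # T[0] is a placeholder so that T[h] = T_(h)
    for h in range(2, hmax + 1):
        T.append(T[h - 1] ** 2 + 2 * T[h - 1] * sum(T[1:h - 1]))
    return T[1:]


KAPPA = 2 * math.sqrt(math.log(4) / math.pi)  # leading constant of K_n, about 1.33


def expected_subtrees(n):
    """Leading term of K_n; the O(1/log n) correction is dropped (assumption)."""
    return KAPPA * n / math.sqrt(math.log(n))


def estimate_internal_nodes(k):
    """Solve expected_subtrees(n) = k for real n by bisection.

    expected_subtrees is increasing for n > sqrt(e), where its minimum is
    about 3.1, so this sketch assumes k >= 4.
    """
    lo = hi = math.sqrt(math.e)
    while expected_subtrees(hi) < k:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_subtrees(mid) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def heuristic_count(k):
    """Roughly C_n trees with k subtrees, with C_n = Gamma(2n+1) / (Gamma(n+1) Gamma(n+2))."""
    n = estimate_internal_nodes(k)
    return math.exp(math.lgamma(2 * n + 1) - math.lgamma(n + 1) - math.lgamma(n + 2))


print(trees_by_height(5))  # [1, 1, 3, 21, 651], matching OEIS A001699
```

Comparing `heuristic_count(k)` with the table for a few values of $k$ shows the underestimate mentioned above.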


A Method of Enumeration

Each tree $\tau = x\,(L,R)$ is uniquely determined by its set of subtrees (it is the largest tree in that set), and this set can be written in terms of those of the left and right subtrees $L$ and $R$: $$ S(\tau) = \{\tau\} \cup S(L) \cup S(R) $$ This leads to a natural algorithm to enumerate $\mathcal{T}_k$. We build $\mathcal{T}$ by gluing trees together via an added root node. The resulting set of subtrees is the union of the subtree sets of the two glued trees plus one new element, the entire tree. Letting $\imath$ denote the singleton tree, the explicit trees we find after each gluing iteration are

$$ \begin{array}{lll} T^{(1)} &= \{\imath\} &= \{1\} \\ T^{(2)} &= \{\imath, \imath(\imath,\imath)\} &= \{1,2\} \\ T^{(3)} &= \{\imath, \imath(\imath,\imath), \imath(\imath(\imath,\imath),\imath), \imath(\imath,\imath(\imath,\imath)), \imath(\imath(\imath,\imath),\imath(\imath,\imath))\} &= \{1, 2, 3, 4, 5\} \end{array} $$

Labeling these trees $1, 2, 3, 4, 5$ in order, their sets of subtrees are

$$ \begin{align} T^{(1)} &= \{ \{1\} \} \\ T^{(2)} &= \{ \{1\},\{1,2\} \} \\ T^{(3)} &= \{ \{1\},\{1,2\}, \{1,2,3\}, \{1,2,4\}, \{1,2,5\} \} \\ \end{align} $$

Thus we find $\tilde{\mathcal{T}_1}=1, \; \tilde{\mathcal{T}_2}=1, \; \tilde{\mathcal{T}_3}=3$. Once $T^{(k)}$ has been built, you will have enumerated all trees in $\mathcal{T}_k$: $T^{(k)}$ contains every tree of height at most $k$, and a tree with $k$ subtrees has height at most $k$ because $|S(\tau)| \geq h$. Of course, you will also enumerate trees with more than $k$ subtrees, so it is prudent to prune any such trees along the way; since $S(L), S(R) \subseteq S(\tau)$, a pruned tree can never be a subtree of a tree in $\mathcal{T}_k$. The downside to this algorithm is that it enumerates every tree. Since the number of trees grows exponentially, computing $\tilde{\mathcal{T}_k}$ this way becomes intractable for $k>9$. For larger $k$, we will need to develop a method of counting. An implementation was written in Python: code. User Marko Riedel also posted various implementations in (B1) (B2) (B3).
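The gluing-and-pruning enumeration can be sketched in a few lines of Python. This is not the implementation linked above. It is a minimal version that assumes trees are nested tuples (`()` for $\imath$, `(L, R)` for $x(L,R)$); `count_by_enumeration` is a hypothetical name. It stores $S(\tau)$ for every kept tree using $S(\tau) = \{\tau\} \cup S(L) \cup S(R)$.

```python
from itertools import product

LEAF = ()  # the singleton tree


def count_by_enumeration(k):
    """Count plane trees with exactly k distinct subtrees by repeated gluing."""
    subs = {LEAF: frozenset([LEAF])}  # tree -> S(tree)
    level = [LEAF]                    # T^(1)
    for _ in range(k - 1):            # afterwards, level holds the pruned T^(k)
        nxt = {LEAF}
        for left, right in product(level, repeat=2):  # both gluing orders
            tree = (left, right)
            s = frozenset([tree]) | subs[left] | subs[right]
            if len(s) <= k:  # prune: a tree with > k subtrees never sits inside one with k
                subs[tree] = s
                nxt.add(tree)
        level = list(nxt)
    return sum(1 for t in level if len(subs[t]) == k)


print([count_by_enumeration(k) for k in range(1, 6)])  # [1, 1, 3, 15, 111]
```

As the text says, this enumerates every tree, so it is only a reference for small $k$.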


A Method of Counting

This method was developed by @d8d0d65b3f7cf42 in his posts (A1) (A2) and then slightly optimized here. Instead of enumerating trees, we count DAGs. We start by characterizing the DAGs in a unique way. Every node $v$ in the DAG represents a unique subtree. We let the height of $v$ be the height of the tree it represents. Note that this equals the number of nodes on the longest chain from $v$ down to the node $\imath$ that represents the singleton tree. Grouping nodes of the same height into layers gives the graph its shape.

[Figure: DAG shape]
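As an illustration of heights and layers, here is a minimal Python sketch that computes the shape of the compacted DAG of a given tree, listing layer widths from the top layer down. It assumes the nested-tuple encoding `()` for $\imath$ and `(L, R)` for $x(L,R)$; `dag_shape` is a hypothetical helper, not part of the posted code.

```python
from collections import Counter

LEAF = ()  # the singleton tree


def dag_shape(tree):
    """Layer widths of the compacted DAG of `tree`, top layer first.

    Height is 1 for the singleton tree and 1 + max(child heights) otherwise.
    """
    height = {}  # distinct subtree -> height; repeated subtrees share one entry

    def visit(t):
        if t not in height:
            height[t] = 1 if t == LEAF else 1 + max(visit(t[0]), visit(t[1]))
        return height[t]

    visit(tree)
    widths = Counter(height.values())  # every height from 1 to the root's occurs
    return tuple(widths[h] for h in range(max(widths), 0, -1))


a = (LEAF, LEAF)
print(dag_shape(((a, LEAF), (LEAF, a))))  # (1, 2, 1, 1)
```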

Noting the natural ordering of the nodes, we can represent each DAG in canonical form:

1. There is a unique root with no parents and a unique sink with no children. The root and the sink correspond to the entire tree and the singleton tree, respectively.

2. Every node except the sink has exactly two children below it, at least one of which is on the adjacent layer. The children are ordered, meaning we make the distinction $(a,b) \neq (b,a)$.

3. Every node except the root has at least one parent.

4. If $u,v$ are nodes on the same layer with $u < v$, then the ordered children of $u$ must be lexicographically smaller than those of $v$. That is, if $(u_1, u_2)$ and $(v_1,v_2)$ are the children of $u,v$ respectively, then either $u_1 < v_1$, or $u_1 = v_1$ and $u_2 < v_2$.

One interesting observation is that the number of canonical DAGs with shape $(1,1,\ldots,1)$ ($k$ layers of one node each) is $(2k-3)!!$. There is a correspondence between these DAGs and a restricted set of trees constructed in the recursive enumeration method: if at each iteration you only glue two trees when one is a subtree of the other, then you end up with exactly these DAGs. However, this doesn't play a role in our counting computation.
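The $(2k-3)!!$ count follows from a short derivation of our own, based only on the requirements of the canonical form. In the shape $(1,1,\ldots,1)$ the node at height $j \geq 2$ picks an ordered pair from the $j-1$ nodes below it, with at least one entry equal to the node at height $j-1$. There are $(j-1)^2 - (j-2)^2 = 2j-3$ such pairs. Requirement 4 is vacuous because every layer has one node, and requirement 3 holds automatically because each node is a child of the node directly above it. Multiplying over the layers gives

$$ \prod_{j=2}^{k} \Big( (j-1)^2 - (j-2)^2 \Big) = \prod_{j=2}^{k} (2j-3) = 1 \cdot 3 \cdot 5 \cdots (2k-3) = (2k-3)!! $$

For $k=4$ this is $1 \cdot 3 \cdot 5 = 15 = \tilde{\mathcal{T}_4}$, because $(1,1,1,1)$ is the only possible shape when $k=4$: the layer directly above the sink can only contain $\imath(\imath,\imath)$.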

To count $\tilde{\mathcal{T}_k}$ we do the following. First we generate the possible shapes that the DAGs can take. The possible shapes are a subset of the compositions of $k$. With a great deal of effort you can enumerate them exactly; with little effort you can enumerate a superset of them by pruning any shapes that have layers with too many nodes (e.g., a layer cannot be wider than the number of possible children pairs below it). Next, the idea is to keep track of the nodes that have parents. So for each shape there will be many different boolean coverings, where each node is assigned the value $\mathtt{True}$ if it has at least one parent and $\mathtt{False}$ otherwise. We can count the number of DAGs that have a given $\mathtt{shape}$ and $\mathtt{covering}$ by induction on the height of the graph. This leads to a recursive algorithm in which we attach the top-layer nodes to those below them in every valid way.
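The cheap "superset" enumeration of shapes can be sketched as follows. This is a hypothetical Python helper implementing only the single pruning rule quoted above, with shapes written as tuples from the top layer down to the sink. Shapes that survive may still admit zero DAGs; for example, `(1, 3, 1, 1)` is generated for $k = 6$ but has no valid covering.

```python
def shapes(k):
    """Yield candidate shapes with k nodes (top layer first, sink layer last).

    Pruning rule: a layer is no wider than the number of ordered child pairs
    with at least one entry in the layer directly below it.
    """
    def extend(below, remaining):
        if remaining == 0:
            yield below
            return
        adjacent, rest = below[0], sum(below[1:])
        max_width = adjacent * adjacent + 2 * adjacent * rest
        for width in range(1, min(remaining, max_width) + 1):
            yield from extend((width,) + below, remaining - width)

    if k >= 1:
        yield from extend((1,), k - 1)  # build upward from the sink layer


def canonical_shapes(k):
    """Candidate shapes whose top layer is a single root node."""
    return [s for s in shapes(k) if s[0] == 1]


print(canonical_shapes(5))  # [(1, 1, 1, 1, 1), (1, 2, 1, 1)]
```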

It is here that another optimization manifests. If there are $\ell_1$ nodes in the top layer of the DAG, $\ell_2$ in the second layer and $\ell_{3}$ nodes below, then there are ${\ell_2^2 + 2\ell_2\ell_3 \choose \ell_1}$ ways of connecting the top layer to the rest of the DAG. There are $\ell_2^2 + 2\ell_2\ell_3$ ordered children pairs with at least one entry in the second layer, and by requirement 4 the top layer is determined by which $\ell_1$ distinct pairs it uses. This number grows fast, becoming unwieldy for even modest shapes. An alternative idea is to choose the children first and then count the number of ways to assign the top layer to those children. Suppose we connect the top layer to a fixed set of exactly $\alpha$ nodes in the second layer and $\beta$ nodes below. Using inclusion-exclusion, we find that the number of ways to connect the top layer in such a manner is $$ M(\ell_1, \alpha, \beta) = \sum_{j=0}^{\alpha + \beta} (-1)^j \sum_{i=0}^{j}{\alpha \choose i} {\beta \choose j-i} {(\alpha-i)^2 + 2(\alpha-i)(\beta-j+i) \choose \ell_1}$$
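The inclusion-exclusion formula is easy to get wrong, so a small Python sketch may help. It implements $M(\ell_1, \alpha, \beta)$ and compares it against a brute-force count for small parameters. The brute-force check and both function names are our own additions. The inner loop restricts $i$ so that $\alpha - i$ and $\beta - j + i$ stay non-negative; terms outside that range vanish anyway.

```python
from itertools import combinations, product
from math import comb


def M(l1, alpha, beta):
    """Inclusion-exclusion count M(l1, alpha, beta) from the text."""
    total = 0
    for j in range(alpha + beta + 1):
        # i uncovered second-layer nodes and j - i uncovered lower nodes
        for i in range(max(0, j - beta), min(j, alpha) + 1):
            a, b = alpha - i, beta - (j - i)
            total += (-1) ** j * comb(alpha, i) * comb(beta, j - i) * comb(a * a + 2 * a * b, l1)
    return total


def M_brute(l1, alpha, beta):
    """Direct count: nodes 0..alpha-1 are the chosen second-layer nodes and
    alpha..alpha+beta-1 the chosen lower nodes. Count sets of l1 distinct
    ordered pairs, each touching the second layer, that use every node."""
    nodes = range(alpha + beta)
    pairs = [(x, y) for x, y in product(nodes, repeat=2) if x < alpha or y < alpha]
    count = 0
    for chosen in combinations(pairs, l1):
        if len({v for pair in chosen for v in pair}) == alpha + beta:
            count += 1
    return count


print(M(1, 2, 0), M(2, 1, 1))  # 2 3
```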

Each summand has the following interpretation: we choose $\ell_1$ of the possible children pairs, given that we leave $i$ of the chosen nodes in the second layer and $j-i$ of the chosen nodes in the lower layers uncovered. Finally, a shape $S$ and covering $c$ are canonical if the top layer of $S$ has one node (the DAG contains a root) and if $c$ assigns $\mathtt{True}$ to every other node (requirement 3). Below is pseudocode to count the number of DAGs for each covering of a given shape.

$\qquad$Shape $S$ is represented by a tuple of integers
$\qquad$Covering $c$ is represented by a binary string
$\qquad$Let $M(\ell_1,\alpha,\beta)$ be the inclusion exclusion formula as above
$\qquad$Let $D[S,c]$ represent the number of DAGs with shape $S$ and covering $c \\$
$\qquad\mathtt{Count}(S):$
$\qquad\qquad \mathtt{Count}(S[2\colon \!])$ recurse on the subshape
$\qquad\qquad \ell_1 \leftarrow S[1]$ the number of nodes in the top layer
$\qquad\qquad$ for each set $\varsigma$ of nodes of $S[2\colon\!]$ used as children
$\qquad \qquad \qquad \alpha \leftarrow$ the number of children in the second layer $S[2]$
$\qquad \qquad \qquad \beta \leftarrow$ the number of children below the second layer
$\qquad\qquad \qquad $ for each covering $c$ of $S[2\colon\!]$
$\qquad\qquad \qquad \qquad D[S,\varsigma \vee c]$ += $M(\ell_1, \alpha, \beta) \,D[S[2\colon \! ], c]\\$
$\qquad \qquad$ if $S$ has a root node, ie $S[1]=1$:
$\qquad \qquad \qquad c^* \leftarrow$ full covering of $S$
$\qquad \qquad \qquad \tilde{\mathcal{T}_{|S|}}$ += $D[S,c^*]\\$
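The pseudocode above can be turned into a small runnable Python sketch. The following choices are our own, not taken from the posted implementation. Shapes are tuples listed from the top layer down. Coverings are tuples of 0/1 flags in the same node order. Every composition of $k$ with a single root and a single sink is tried, with no shape pruning, so the sketch is only practical for small $k$. `M`, `D`, `compositions` and `count_trees` are hypothetical names.

```python
from functools import lru_cache
from itertools import product
from math import comb


def M(l1, alpha, beta):
    """Inclusion-exclusion count M(l1, alpha, beta) from the text."""
    total = 0
    for j in range(alpha + beta + 1):
        for i in range(max(0, j - beta), min(j, alpha) + 1):
            a, b = alpha - i, beta - (j - i)
            total += (-1) ** j * comb(alpha, i) * comb(beta, j - i) * comb(a * a + 2 * a * b, l1)
    return total


@lru_cache(maxsize=None)
def D(shape):
    """Return {covering: number of partial DAGs} for a shape (top layer first).

    A flag of 1 means the node already has a parent inside the shape. The
    returned dict is cached, so callers must not modify it.
    """
    if shape == (1,):
        return {(0,): 1}  # the sink alone, with no parent yet
    l1, sub = shape[0], shape[1:]
    table = {}
    for sigma in product((0, 1), repeat=sum(sub)):  # nodes of sub used as children
        alpha, beta = sum(sigma[:sub[0]]), sum(sigma[sub[0]:])
        ways = M(l1, alpha, beta)
        if ways == 0:
            continue
        for cov, cnt in D(sub).items():
            # Top-layer nodes start uncovered; a lower node is covered if the top
            # layer uses it (sigma) or it was already covered within sub (cov).
            new = (0,) * l1 + tuple(s | c for s, c in zip(sigma, cov))
            table[new] = table.get(new, 0) + ways * cnt
    return table


def compositions(total):
    """All tuples of positive integers summing to `total`."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest


def count_trees(k):
    """Number of trees with exactly k distinct subtrees (k >= 1)."""
    if k == 1:
        return 1  # the singleton tree
    full = (0,) + (1,) * (k - 1)  # every node except the root has a parent
    return sum(D((1,) + middle + (1,)).get(full, 0) for middle in compositions(k - 2))


print([count_trees(k) for k in range(1, 7)])  # [1, 1, 3, 15, 111, 1119]
```

Memoizing `D` on the shape means each suffix shape is processed once. The dominant cost is the `2 ** sum(sub)` loop over child sets, which is why shape pruning and smarter enumeration matter for larger $k$.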

[Figure: shape, covering, and top-layer assignment]

$$ \begin{align} S &= (3,2,3,1,2,1,1) \\ \ell_1, \alpha, \beta &= 3,2,2 \\ c &= \;\;\;\;\;0010101011 \\ \varsigma &= 0001100100100 \\ \varsigma \vee c &= 0001110101111 \\ \end{align} $$

Here is a dirty Python implementation. This code confirms @d8d0d65b3f7cf42's values for $k\leq16$ and was used to obtain values for $k\leq 21$, although it takes about 14 hours for $k=21$. I fixed the memory issues by removing old values from the memoization table (being careful not to re-add any recomputed canonical shape). Excitingly, I think I have a way to count directly by shape (which would get us about 10 more values). It counts by an equivalence relation: a $k \times k$ matrix whose $(i,j)$th entry is the number of trees of size $i$ that would add $j$ subtrees when taking the union.

The two codes can easily be altered to count trees where we don't care about the order of children (no longer "plane" trees). In each you simply need to comment out one line of code. In the enumeration method, you only perform one of the two gluings $x(\tau_1,\tau_2), x(\tau_2, \tau_1)$. In the counting method, the only thing that changes is the third binomial coefficient in the inclusion-exclusion formula: since children pairs are now unordered, ${(\alpha-i)^2 + 2(\alpha-i)(\beta-j+i) \choose \ell_1}$ becomes ${\binom{\alpha-i+1}{2} + (\alpha-i)(\beta-j+i) \choose \ell_1}$. The list of values of this sister sequence for $k\leq19$ is [1, 1, 2, 6, 25, 137, 945, 7927, 78731, 906705, 11908357, 175978520, 2893866042, 52467157456, 1040596612520, 22425725219277, 522102436965475, 13064892459014192, 349829488635512316].
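A minimal sketch of the non-plane enumeration variant follows. It is not the posted code. It assumes trees are nested tuples (`()` for $\imath$, `(L, R)` for $x(L,R)$), and it keeps only the gluing with `left <= right` in Python's tuple order, so $x(\tau_1,\tau_2)$ and $x(\tau_2,\tau_1)$ collapse to one canonical tree. `count_nonplane` is a hypothetical name. It reproduces the first terms of the sister sequence listed above.

```python
from itertools import combinations_with_replacement

LEAF = ()  # the singleton tree


def count_nonplane(k):
    """Count non-plane trees with exactly k distinct subtrees by enumeration."""
    subs = {LEAF: frozenset([LEAF])}  # canonical tree -> S(tree)
    level = [LEAF]                    # T^(1)
    for _ in range(k - 1):
        nxt = {LEAF}
        # sorted() plus combinations_with_replacement yields left <= right only,
        # i.e. one of the two gluings x(t1, t2), x(t2, t1).
        for left, right in combinations_with_replacement(sorted(level), 2):
            tree = (left, right)
            s = frozenset([tree]) | subs[left] | subs[right]
            if len(s) <= k:  # prune trees that already have too many subtrees
                subs[tree] = s
                nxt.add(tree)
        level = list(nxt)
    return sum(1 for t in level if len(subs[t]) == k)


print([count_nonplane(k) for k in range(1, 6)])  # [1, 1, 2, 6, 25]
```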
