It is a well known fact that there is a linear algorithm to determine if two given trees are isomorphic. But my question is: is there a (simple) way to expand that result to forests (a disjoint collection of trees)?
Thanks in advance!
It is a well known fact that there is a linear algorithm to determine if two given trees are isomorphic. But my question is: is there a (simple) way to expand that result to forests (a disjoint collection of trees)?
Thanks in advance!
On
In stead of trying to make a tree from a forest, I think you can do it in a much more abstract way. For every tree, you can compute a canonical form in linear time (let's say in time $T(n)\leq \alpha \cdot n$). This is in essence just a finite string over a finite alphabet, which uniquely characterizes the tree up to isomorphism. Since it is computed in linear time, the length of the string must also be linear with respect to the size of the tree (bounded by the same factor $\alpha$).
Now for a forest, compute the connected components with DFS. Then, for each component, compute its canonical form in linear time. The total running time for all these is still linear since the constant is the same for each tree. Namely if the sizes of the components are $n_i$, $1 \leq i \leq k$, the total running time will be $\sum_{i=1}^k \alpha \cdot n_i = \alpha \cdot n$.
Then, sort the canonical forms lexicographically using radix sort in linear time. You may need a new separating symbol to denote where one tree ends and the next one begins. Now you have a canonical form for forests in linear time.
Here is a lazy argument that is probably not optimal. Given a collection of trees with bounded diameter $<M$, pick a distinguished vertex for each tree, and then attach to these vertices a path of length $2M$. Then glue together all the ends of these paths. See the painting below. This gives a new tree, which I'll call the canopy, and we have added at most $(2M \times \mbox{number of trees})$ edges. Now, if two forests are isomorphic, this induces an isomorphism between their canopies. Conversely, an isomorphism between canopies will have to restrict to an isomorphism between forests, because the canopy root can only be sent to the canopy root (which you can see by considering the radius of the canopy root -- which is why we used $2M$ above -- and its degree), and then the image of each original subtree of the canopy is a subtree of the other canopy.
Since the size of the canopy is linear in the total size of the corresponding trees and the total number of trees (provided we have a bound on the diameter), this gives a positive result. However, I don't think this argument works without a bound on the diameter, and without invoking the total number of trees (rather than the total number of vertices) in the bound.