Algorithm for finding a "normal" form for a set of strings under character-mapping isomorphism?

Question

Algorithm for finding a "normal" form for a set of strings under character-mapping isomorphism?

79 Views Asked by Bumbble Comm At 27 Mar 2026 - 12:09

Say two strings are "isomorphic" if an injective character mapping can be used to transform one into the other. Should be clear that this is an equivalence relation.

So "egg" = "bee" = "foo" but none of those are equivalent to "bed" = "dog" = "abc".

One thing we can sometimes do with eq. relations is find a "normal form" that each member of the eq. class class can agree on. For this relation, that's easy to do; just iterate down the string, and (with respect to some ordering of the alphabet, e.g. abcde...) substitute any previously-unseen character for the next character in the ordering.

so "abb" would be the normal form of "egg" and "bee", which is clearly distinct from "bed"'s normal form, "abc".

Say instead we wanted to find the normal form of a list of strings, like <"foo", "bar", "baff">. Turns out the same alg can be used, to get a normal form like <"abb, "cde", "cdaa">.

All of this is setup for the real question, which is about extending this to a set of strings. For a set of strings |S| = n there are n! possible permutations of the strings, which makes finding a normal form tricky. Using the same algorithm as before we could end up with {"abb, "cde", "cdaa"} or {"abc", "dee", "abdd"} (which is clearly in the same eq. class) depending on how we iterate through the set.

What properties of the set can we use to define a normal form which is easily calculable? Are there ways to do it that don't take O(n!) time (I can imagine one such n! algorithm which computes this style of "normal form" for each permutation and then picks the "lowest" one)?

Is this problem equivalent to some other well-known problem?

Thanks!

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

In the general case - where our strings are over an alphabet of arbitrary size - this problem is exactly as hard as the problem of finding canonical labelings for graphs (which is more or less the graph isomorphism problem). Bad news: there's no known polynomial-time algorithm. Good news: in practice, we can usually solve this problem quickly, and there's plenty of implementations of good algorithms out there.

In one direction, if we had an efficient algorithm for finding a "normal form" for sets of strings, we could use it to find canonical labelings for graphs. Given a graph $G = (V,E)$, we construct a set of $2|E|$ words over the alphabet $V$, by turning an edge $\{v_i, v_j\}$ into the two words $v_iv_j$ and $v_jv_i$. Two graphs are isomorphic precisely when the normal forms of these sets are equal.

In the other direction: given a set of strings $S$ over an alphabet $A$, we construct a graph on a vertex set containing $S$ and $A$ as follows: whenever the $i^{\text{th}}$ letter of $s \in S$ is the letter $a \in A$, we put a path of length $i$ between $s$ and $a$ (using extra vertices not appearing anywhere else). To distinguish $A$ and $S$ inside this graph, one way is to add one leaf vertex to every vertex in $A$, and two leaf vertices to every vertex in $S$. Actually, in practice, most algorithms for finding canonical labelings of graphs can already handle being given some distinguished vertex sets - it only makes their task easier.

We'll get isomorphic graphs precisely when we start with equivalent sets of strings, and a canonical labeling of the graph will in particular give a canonical ordering of both $A$ and $S$: the normal form you want.

Algorithm for finding a "normal" form for a set of strings under character-mapping isomorphism?

There are 1 best solutions below

Related Questions in NUMBER-THEORY

Related Questions in ALGORITHMS

Related Questions in EQUIVALENCE-RELATIONS

Trending Questions

Popular # Hahtags

Popular Questions