Short Version:
Suppose I have a group action $G \times A \to A$ on an abelian group $A$, where $A = \Bbb{Z}[G]$ and $G$ is a large finite group. In fact it's too large to enumerate, but easy enough to describe abstractly. In addition we have a homomorphism $\psi : A \to \Bbb{Z}$ that sums word lengths weighted by coefficients (so a negative coefficient contributes the negative of the length). $G$ is, roughly, a quotient of the free group on a set of letters.
We represent a grammar (usually loosely defined as a set of rules) for $s$ as a pruned group homomorphism $\varphi: G \to G$ such that $\varphi^n(S) = s$ for some $n \geq 1$, where $s$ is our input string (an instance of the smallest grammar problem) and $S$ is the designated start symbol of all grammars. In addition to the above property, $\varphi(a) = a$ for all $a \in \Sigma(s) = s$'s alphabet. Each grammar $\varphi$ under consideration has a "representation" in $\Bbb{Z}[G]$, namely $\rho(\varphi) = \sum\limits_{X \in \text{var}(\varphi)} X^{-1}\varphi(X)$, and the size of a grammar is $\psi(\rho(\varphi)) = \sum_{g \in G} x_g|g|$ where $x = \rho(\varphi) \in \Bbb{Z}[G]$.
So I'm wondering about any ideas for finding a grammar $\varphi$ of the input instance $s \in \Sigma(s)^*$ that minimizes $\psi(x)$ over all $x = \rho(\varphi)$.
Long Version:
(Note: I haven't adequately identified the correct subgroup of the free group to work with, since I just realized we only want to consider strings "buildable" from substrings of $s$.)
There's the smallest grammar problem (SGP), which is still an open problem in computer science as it relates to the P vs. NP problem. Prior research is concerned with approximation algorithms, i.e. how much you can compress an input string $s$ using an output grammar $\mathcal{G}(s)$ that is not guaranteed to be a smallest grammar. Here I'm only concerned with exact algorithms: algorithms that output one or more smallest grammars of an input string $s$. There is always at least one smallest grammar, but there could be several, as in the case of $s = a^6$ ($\{S \to AA, A \to aaa\}$ vs. $\{S \to BBB, B \to aa\}$).
Let $\varphi$ denote a grammar.
Background:
- We denote by $\text{var}(\varphi)$ the set of variables occurring on the left sides of the grammar rules. For example, $\varphi = \{S \to AABB, A \to \dots, B \to \dots \} \implies \text{var}(\varphi) = \{S, A, B\}$.
- We denote by $\Sigma(s)$ the smallest alphabet that the string $s$ can be considered "over", i.e. $s \in \Sigma(s)^*$, where $*$ denotes the Kleene closure or monoid of strings.
- We similarly denote by $\Sigma(\varphi)$ the smallest alphabet that right sides of rules (which are strings) can be considered "over". E.g., $\varphi = \{ S \to ABB, B \to Ac, A \to ab\} \implies \Sigma(\varphi) = \{a,b,c, A, B\}$.
$\Sigma(\varphi) = \Sigma(s) \cup \text{var}(\varphi)$ by definition.
- A single-string (smallest) grammar $\varphi = \mathcal{G}(s)$ is what we're concerned with for each input string; it is essentially a homomorphism $\varphi : \Sigma(\varphi)^* \to \Sigma(\varphi)^*$ such that for some $n \geq 1$ we have $\varphi^k(S) = \varphi \circ \dots \circ \varphi(S) = s$ for all $k \geq n$. In particular, $\varphi$ fixes $s$, since it fixes $\Sigma(s)^*$ pointwise.
We use the usual definition of grammar size:
$|\varphi| = \sum\limits_{x \in \text{var}(\varphi)} |\varphi(x)|$
where $|\varphi(x)|$ denotes string length. This is precisely what we want to minimize in the SGP.
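As a concrete check, the size of $\{S \to AAA, A \to aa\}$ is $3 + 2 = 5$. A minimal sketch in Python; the dict-of-rules encoding of a grammar is my own choice, not anything standard:

```python
# A grammar as a dict mapping each variable to its right side (a string).
def grammar_size(phi):
    # |phi| = sum over variables X of |phi(X)|.
    return sum(len(rhs) for rhs in phi.values())

phi = {"S": "AAA", "A": "aa"}   # a grammar for s = a^6
print(grammar_size(phi))        # 5
```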
- $\mathcal{G}(s) = \{S \to s\}$ for any string of length $|s| \leq 5$.
- A substring $t \leqslant s$ is grammar-compressible in $s$ if either $|t| = 2$ and $t\lambda_1 t \lambda_2 t \leqslant s$ for some $\lambda_i \leqslant s$ ($t$ occurs disjointly in $s$ at least three times), or $|t| \geq 3$ and $t \lambda_1 t \leqslant s$ for some $\lambda_1 \leqslant s$ ($t$ occurs disjointly at least twice).
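This condition is easy to test directly. A sketch, assuming that for a fixed pattern $t$ a greedy leftmost scan gives the maximal number of disjoint occurrences (which it does for a single fixed pattern):

```python
def disjoint_occurrences(t, s):
    # Greedy leftmost count of non-overlapping occurrences of t in s.
    count, i = 0, s.find(t)
    while i != -1:
        count += 1
        i = s.find(t, i + len(t))
    return count

def grammar_compressible(t, s):
    # |t| = 2 needs >= 3 disjoint occurrences; |t| >= 3 needs >= 2.
    if len(t) == 2:
        return disjoint_occurrences(t, s) >= 3
    if len(t) >= 3:
        return disjoint_occurrences(t, s) >= 2
    return False

print(grammar_compressible("ab", "ababcabc"))   # True  (3 occurrences)
print(grammar_compressible("bc", "ababcabc"))   # False (only 2, but |t| = 2)
```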
Grammar & grammar size as a group homomorphisms:
Let $s$ be our input string.
In $O(|s|^2\log |s|)$ time, determine all grammar-compressible substrings of $s$, denoted $C(s)$. Assign each $x \in C(s)$ an unused symbol $R_x$ with rule $R_x \to x$, and let $V(s) = \{S\} \cup \{R_x \mid x \in C(s)\}$. For example, for $s = ababcabc$, $V(s) = \{ S, A, B\}$ with $A \to ab$ and $B \to abc$ will do. By "$S$" we will always mean the special start symbol of any of our grammars.
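A naive sketch of computing $C(s)$ (nowhere near the stated $O(|s|^2\log|s|)$ bound, which would need a suffix-tree-style index; this just checks every distinct substring):

```python
def compressible_substrings(s):
    # C(s): every distinct substring of length >= 2 meeting the
    # disjoint-occurrence thresholds (3 for length 2, 2 for length >= 3).
    def occ(t):
        n, i = 0, s.find(t)
        while i != -1:
            n += 1
            i = s.find(t, i + len(t))
        return n
    return {s[i:j]
            for i in range(len(s))
            for j in range(i + 2, len(s) + 1)
            if (j - i == 2 and occ(s[i:j]) >= 3)
            or (j - i >= 3 and occ(s[i:j]) >= 2)}

C = compressible_substrings("ababcabc")
print(sorted(C))                    # ['ab', 'abc']
V = {"S"} | {f"R_{x}" for x in C}   # one fresh symbol R_x per x in C(s)
```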
Let $\Sigma = V(s) \cup \Sigma(s)$. And form the free group on $\Sigma$, $F = F(\Sigma)$. Quotient $F$ by $H = \langle g R_x^{-1} xg^{-1} : g \in F(\Sigma), x \in C(s)\rangle$ to form $G = F/H$.
Let's consider the group ring $A = \Bbb{Z}[G]$ together with the group action $G \times A \to A$ given by left, term-wise multiplication of $x = \sum_{g\in G} x_g g$ by an element $h \in G$. This is the setting under which you can start to study group cohomology at least in the perspective of these lecture notes.
Not only that, but a grammar, say $\varphi = \mathcal{G}(a^6) = \{S \to AAA, A \to aa\}$, has the representation $\sum_{X \in \text{var}(\varphi)} X^{-1}\varphi(X)H$ in the group ring $A$. Define $\psi(x) = \sum_{g \in G} x_g|g|$, where $|g|$ is the minimal reduced word length over representatives $w \in F$ of $g = wH$; a word is reduced when it contains no subword of the form $\gamma \gamma^{-1}$, and taking the minimum makes $|g|$ well defined. This is an abelian group homomorphism $\psi: A \to \Bbb{Z}$, since it is clearly $\Bbb{Z}$-linear. Another homomorphism $A \to \Bbb{Z}$ is mentioned on page 5 of Sharifi's lecture notes: $\varepsilon : A \to \Bbb{Z}, \ \varepsilon(\sum_{g \in G} x_g g) = \sum x_g \in \Bbb{Z}$, known as the augmentation map.
With the above two maps, we can recover the usual definition of grammar size as $|\varphi| = \psi(\rho(\varphi)) - \varepsilon(\rho(\varphi))$, where $\rho(\varphi)$ is as above, a sum of grammar rule terms with every coefficient equal to $1$. In this way, the previous definition of grammar size, $|\cdot|$, generalizes to an abelian group homomorphism $A \to \Bbb{Z}$. The logic is that each term of $\rho(\varphi)$ is $t = X^{-1} \varphi(X)H$ with $|t| = |X^{-1} \varphi(X)| = 1 + |\varphi(X)|$, so subtracting the augmentation value $\varepsilon(\rho(\varphi))$ removes the extra $1$ per rule. Alternatively, some authors just take $\psi$ itself as the grammar size. Either will do, as both are homomorphisms $A \to \Bbb{Z}$.
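To illustrate on $\varphi = \{S \to AAA, A \to aa\}$: assuming no cancellation occurs in the terms $X^{-1}\varphi(X)$ (so each term has reduced length $1 + |\varphi(X)|$), a small sketch computes $\psi$, $\varepsilon$, and recovers $|\varphi| = 5$. The dict-of-words encoding of $\Bbb{Z}[G]$ elements is my own simplification:

```python
# An element of Z[G] as a dict: reduced word (tuple of letters) -> coefficient.
def rho(phi):
    x = {}
    for X, rhs in phi.items():
        word = (X + "^-1",) + tuple(rhs)    # the term X^{-1} phi(X), assumed reduced
        x[word] = x.get(word, 0) + 1
    return x

def psi(x):
    # psi(x) = sum over g of x_g * |g|.
    return sum(coeff * len(word) for word, coeff in x.items())

def eps(x):
    # The augmentation map: sum of coefficients.
    return sum(x.values())

x = rho({"S": "AAA", "A": "aa"})
print(psi(x), eps(x), psi(x) - eps(x))      # 7 2 5, recovering |phi| = 5
```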
For any grammar $\varphi : \Sigma(\varphi)^* \to \Sigma(\varphi)^*$ of string $s$, we define the depth of the grammar to be $D(\varphi) = \min \{n \in \Bbb{N} | \varphi^n(S) = s\}$.
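$D(\varphi)$ can be computed by direct iteration; e.g. for $\varphi = \{S \to AAA, A \to aa\}$ and $s = a^6$ we get $D(\varphi) = 2$. A sketch (the `limit` cutoff is an arbitrary safeguard of mine, not part of the definition):

```python
def expand(phi, w):
    # One application of phi, letter by letter (letters not in phi are fixed).
    return "".join(phi.get(c, c) for c in w)

def depth(phi, s, start="S", limit=1000):
    # D(phi) = least n >= 1 with phi^n(S) = s; None if not reached within limit.
    w = start
    for n in range(1, limit + 1):
        w = expand(phi, w)
        if w == s:
            return n
    return None

print(depth({"S": "AAA", "A": "aa"}, "aaaaaa"))   # 2
```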
Whenever choosing unused variable symbols, in any algorithm or in math on paper, we need to make sure that $\text{var}(\varphi) \subset V(s)$ and that for all $X \in \text{var}(\varphi)$, if $X$ was assigned to $x \in C(s)$, then $\varphi^{D(\varphi)}(X) = x$. In other words, $\varphi$ expands variables to what they should be. If this is the case, then $\varphi$ is consistent with the chosen symbols. This is done so that we don't "double up" on any compressibles, which wouldn't make sense in the context of reducing size. It is also done so that we can begin to talk about the following.
If $\varphi$ is consistent with the chosen symbols, then it can be viewed not only as a monoid homomorphism $\Sigma(\varphi)^* \to \Sigma(\varphi)^*$ but also (because it's consistent) as a monoid homomorphism $\Sigma^* \to \Sigma^*$, which then extends homomorphically in a unique way to $F \to F$ and has its natural counterpart $G \to G$.
Thus, let $\text{End}(G)$ be the set of endomorphisms of $G$; it forms a monoid in general. Within it sits essentially every grammar $\varphi$ of $s$ we'd like to consider (we naturally want to consider as few as possible), but a lot more as well. Clearly $F(\Sigma(s)) \subset F(\Sigma)$, and every grammar we want to consider fixes $\Sigma(s)$ pointwise and so fixes $F(\Sigma(s))$. So consider the smaller space $\text{Stab}_G(s) = \text{Stab}_G(\Sigma(s)) \leqslant \text{End}(G)$: our endomorphisms stabilize $s$ if and only if they stabilize $\Sigma(s)$.
Questions:
Is it true that $G \approx F$ itself?
If so, does every endomorphism of $G$ bijectively correspond to an endomorphism of $F$?
I'm going to assume the above are true. Then every endomorphism $\phi \in \text{Stab}_G(\Sigma(s))$ is just a map $\Sigma \to \Sigma^*$, extended homomorphically to all of $G$ (or $F$), that fixes $\Sigma(s)$ and so is completely determined by its values on the variables $V(s)$ alone.
Clearly, $\text{Stab}_G(s) \leqslant \text{End}(G)$ is a submonoid. So to show that $\text{GP}_G(s) = \{\text{all grammars of } s\}$ (the "grammar problem," GP) is a submonoid of $\text{Stab}_G(s)$, we merely have to show closure under $\circ$: that for $\psi, \varphi \in \text{GP}_G(s)$, the composite $\psi \circ \varphi$ is also in the set. The only thing left to prove is that $(\psi \circ \varphi)^n(S) = s$ for some $n \geq 1$. Assume that $m = \max \{D(\theta) : \theta \in \{\varphi, \psi\}\}$ is $1$. Then $D(\psi) = D(\varphi) = 1$ and we have $(\psi\circ \varphi)(S) = \psi(s) = s$, since $\psi \in \text{Stab}_G(s)$ by definition. By way of induction, assume the claim is true for all $1 \leq m \lt n$. If $m = n$, we have $\psi^m(S) = s = \varphi^m(S)$... (TODO)
Assume that $\text{GP}_G(s) \leqslant \text{Stab}_G(s) \leqslant \text{End}(G)$ are all monoids.
Pruning extraneous grammar rules:
In the grammar $\varphi = \{ S \to AAA, A \to aa, C \to aaa\}$, the rule $C \to aaa$ is never used when expanding $\varphi$ via iteration: $\varphi^n(S), n \geq 0$. One way of describing this is that the string "$C$" does not occur as a substring, $C \nleqslant \varphi^n(S)$, for any $n \geq 0$.
Define the grammar homomorphism pruner w.r.t. $S$ on a grammar $\varphi$ to be $(\cdot)_S : \text{End}(G) \to \text{End}(G), \ (\varphi)_S = \varphi_S = \{ (X \to x) \in \varphi : X \leqslant \varphi^k(S) $ for some $k \geq 0\}$, where we define $\varphi^0(S) = S$, the one-letter string ($|S| = 1$). Note that "$X$" here (when used as in $X \leqslant \dots$) also stands for the one-letter string $X$.
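The pruner is directly computable: iterate $\varphi$ starting from $S$, collect every variable that ever occurs, and keep only those rules. A sketch under a dict-of-rules encoding of my own (the `max_iter` cutoff is an arbitrary safeguard):

```python
def prune(phi, start="S", max_iter=1000):
    # phi_S: keep only rules X -> x with X occurring in some phi^k(S), k >= 0.
    used, w = {start}, start
    for _ in range(max_iter):
        nxt = "".join(phi.get(c, c) for c in w)
        used |= {c for c in nxt if c in phi}
        if nxt == w:          # fixed point reached; no new variables can appear
            break
        w = nxt
    return {X: rhs for X, rhs in phi.items() if X in used}

phi = {"S": "AAA", "A": "aa", "C": "aaa"}
print(prune(phi))             # {'S': 'AAA', 'A': 'aa'}
```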
What happens is we get an equivalence relation $\varphi \sim \psi \iff \varphi_S = \psi_S$, and the pruner respects composition in $\text{End}(G)$, since $\varphi_S\circ\psi_S = (\varphi \circ \psi)_S$ for all $\varphi, \psi \in \text{End}(G)$. Denote the new, corresponding monoids $\text{End}^{\text{prune}}(G) \geqslant \text{Stab}_G^{\text{prune}}(s) \geqslant \text{GP}_G^{\text{prune}}(s)$.
Lemma 1. The monoid $\text{GP}^{\text{prune}}_G(s)$ is finite.
Proof. There are only finitely many combinations of rules, each rule expanding a variable to an element of the finite set $C(s)$, with right sides composed of symbols in $\Sigma$. One can also argue by directly counting naively with a finite bound.
Though it is finite, it's still rather large for, say, $|s| \gt 100$, which indicates that enumerating it at all is a bad idea; it's purely for theoretical analysis.
From here on, $\varphi_S \equiv \varphi$ notationally. Obviously we're only interested in pruned grammars when computing.
However, in $\text{End}^{\text{prune}}(G)$ we have that $(\varphi \circ \psi)_S = \psi_S$ always, since $\psi$ essentially hides $\varphi$. This makes the structure (I now think it's a semigroup and maybe not a monoid) not very interesting on its own.