Let $S$ be a string. Write $u \leqslant S$ to mean that $u$ is a substring. Define $C_S = \{ u \leqslant S: |u| \geq 2 $ and $ u \alpha u \leqslant S $ for some $\alpha \leqslant S \}$. Now let $\mathcal{A}$ be the set of all subsets $A \subset C_S$ such that there exists a decomposition $S = \sigma_1 \cdots \sigma_n$ and for all $u \in A$ either:
- $\exists 1 \leq i \lt j \leq n$ such that $\sigma_i = u = \sigma_j$,
- $\exists 1 \leq i \leq n$ such that $\sigma_i = u$ and $u \leqslant v \in A, $ for some $v \neq u$, or
- $u \leqslant v \in A$ and $u \leqslant w \in A$ and $u,v,w$ are distinct.
I want to prove that $\mathcal{A}$ is a matroid. I can easily show that any subset of $A$ is also of this form. So what's left to prove is that if $B, A \in \mathcal{A}$, and $|B| \gt |A|$, then there exists $u \in B\setminus A$ such that $A \cup \{u\} \in \mathcal{A}$.
Can this be done by induction on the length of $S$?
Clearly, $\mathcal A$ is a matroid for $S = $ the empty string as it is just the trivial matroid $\mathcal A = \{\varnothing\}$.
Now assume that it's true for all lengths $|S| = 0 \dots N$. Then if we add a single symbol to $S$, say $a \in \Sigma'$, if $a$ is a new symbol not found in $S$, then we're done. So assume that $a$ occurs in $S$. Now, if no new elements to $C_S$ are created, ie. $C_S = C_{Sa}$ then we are similarly done. Thus we only need to handle the case when a new element of $C_S$ is formed.
Application is to the smallest grammar problem. I think I may have a greedy algorithm approach that might work. And some algorithms are shown optimal this way (there are some theorems about greedoids). So that's why it's interesting.
It's probably less work to do induction on the size of $C_S$. Now for $C_S = \{\}$ clearly $\mathcal{A} = \{\varnothing\}$ the trivial matroid. So assume that it's true for $|C_S| = 1 \dots N$, now if $C_S' = C_S \cup \{x\}$ is adding a new element define $\mathcal{A}' = \mathcal{A} \cup \{A \cup \{x\} : $ there is a decomposition $S=\sigma_1 \cdots x \sigma_k \cdots x\sigma_l \cdots \sigma_n$ and $\forall u \in A$ one of the following holds $\}$:
So that clearly $\mathcal{A}'$ is indeed the object we're set out to prove is a matroid. We only need to show that if $A, B \in \mathcal{A}'$ and $|B| \gt |A|$ then there is $y \in B\setminus A$ such that $A\cup \{y\} \in \mathcal{A}'$.
But if $B', A' \in \mathcal{A}$, we're done by hypothesis. So first assume that $B' \in \mathcal{A}$ and $A' \in \mathcal{A}' \setminus \mathcal{A}$. Then $A' = A \cup \{x\}$ for some $A \in \mathcal{A}$ and $|B'| \gt |A| = |A' \setminus \{x\}|$, so simply augment $A'$ and we're done. Now if $B' \in \mathcal{A}'\setminus \mathcal{A}$ as well, say $B' = B \cup \{x\}$. Then subtract $\{x\}$ from both $A'$ and $B'$ to get $|B| \gt |A|$ still! Now since, both $A, B \in \mathcal{A}$ a known matroid, we are done! $\square$