https://en.wikipedia.org/wiki/Clique_problem
The article clearly says that all known algorithms run in exponential time. That is okay for us, since we're limited by formal language string properties.
In $s = aaa aaa$ there are two considerable substrings, $a^2$ and $a^3$. Let $A_i$ denote the $i$th occurence (starting from the left) of $a^2$ and similarly $B_i$ for $a^3$.
A1 A3
--- ---
a a a a a a
---
A2
Then you can by eye check that $\{A_1, A_2, B_1, B_2\}$ is a maximal complete subgraph of the undirected graph $G$ where vertices are substring (occurences) and an edge is present if and only if its nodes overlap in the string $s$. Another clique would be $\{A_2, A_3, B_1, B_2, B_3\}$.
Regardless of alphabet size, what is the largest overlapping substring clique in this graph assuming a maximum considerable substring length of $m$ and a minimum length of $2$?
Isn't it obvious that we can deduce this all from the $|\Sigma| = 1$ case? Since surely adding more letters shouldn't increase this maximum. How can I prove that formally?
$m \leq |s|/2$ since you can't have a considerable (in particular, repeating) substring if it's length won't fit twice disjointly into $|s|$ spots. So I'm looking for a polynomial time algorithm in $|s|$ the input size.
Proof attempt. Since we're maximizing, clearly it suffices to consider the case in which the maximum shared length amongst a clique of occurences in $s = aaaaa...$ is $1$ since if you enforce a larger overlap then there is a smaller length outside of the overlap of which potential clique members can occupy. Lesser spots means there is a lesser possible number of distinct substrings.
Here's an example with the length 4 substring:
-------
-------
a a a a a a a a a a ...
-------
-------
Well, clearly, you'd center the previous example on the same common overlap (that's length $1$). So it's now obvious to conjecture from this data alone:
The maximum clique size in this situation is $2 + 3 + \dots + m = m(m+1)/2 - 1$ or $\dfrac{m^2 + m - 2}{2}$ by the "sum the integers $1$ to $N$ trick".
Do you have a cleaner proof? I think you could say something like the common overlap spot of length $1$ can be at any one of the spots in the length subtring $t$ which of course is $|t|$ in number.
The above was just finding the maximum clique size w.r.t. $|s|$. Now I'm still not sure if clique enumeration would indeed be polynomial time.
The way around this is to use the fact that the cliques along a string of purely $a$'s has a regular pattern, and thus if you're well within a region of straight $a$'s, then you can just write down the clique. Then when you add letters to the alphabet, the maximum clique size can be shown to taper off drastically.
Here it is. It's not very robustly written, but seems like it will do the job for my initial tests.
conflict_lists(s, cons)is the main function of concern in this post, but I've included the wholesgp.pyso that you can test the output if you're interested in this topic. The running time looks to be conservatively $O(|s|^4)$ which is polynomial in the input size, it being 4 nested for-loops with loop variables bounded by $|s|$ conservatively, and the number of conflicts seems to be $O(|s|^2)$ but a lot less in practice, perhaps it's $O(|s|\log |s|)$ or something...