In addition to Complexity of Thue-Morse Sequence, I have the following question:
Has anyone found a characterization for subwords of Thue–Morse sequence? I.e., for a given binary word, can I (easily for some definition of easy) decide whether it is a subword of the Thue–Morse sequence or not? Any references (in addition to Brlek and de Luca–Varricchio) are very welcome!
The Thue-Morse sequence can be generated by repeatedly applying the replacement rule $(0 \to 01), (1 \to 10)$. To check if a word $w$ is a subword of the Thue-Morse sequence, we can apply this rule in reverse: try to "undo" the replacement rule, replacing $01$ by $0$ and $10$ by $1$. This gives us a new word half the length of $w$, and repeat until we're at a short word we know the answer to.
That's the high-level idea; here are the details.
If $w$ appears in the Thue-Morse sequence, it does not have to appear at an even position. For example, suppose we want to know if $001100$ appears in the Thue-Morse sequence:
So which one should we do? We try both! If $w$ contains $00$ or $11$ as a subword, then those two symbols must be in separate blocks, so there's at most one way to continue. If $w$ does not contain $00$ or $11$ as a subword, then the $0$'s and $1$'s must alternate; such a $w$ is not contained in the Thue-Morse sequence when $|w| \ge 5$.
This is an $O(|w|)$ algorithm; in linear time, we can either undo the replacement rule or realize that we cannot do it, and then we get a word of length about $\frac12|w|$ to recurse on.
An example which is a subword
Suppose we want to know if
0010110100101100110100110010appears in the Thue-Morse sequence.00 10 11 01 00 10 11 00 11 01 00 11 00 10does not work; there are00's and11's. So we separate it as.0 01 01 10 10 01 01 10 01 10 10 01 10 01 0.and undo the replacement rule to get100110010110100.10 01 10 01 01 10 10 0.works (but separating it into.1 00 11 00 10 11 01 00would not). We undo the replacement rule to get10100110.10 10 01 10works (but separating it into.1 01 00 11 0.would not). We undo the replacement rule to get1101.11 01does not work, so we separate it into.1 10 1.. We undo the replacement rule to get011.000and111occur in the Thue-Morse sequence; therefore, so does our original word. (We could compute all the words of some longer lengths in advance, and stop earlier.)An example which is not a subword
Suppose we want to know if
10110011010011001011010010110100101appears in the Thue-Morse sequence.10 11 00 11 01 00 11 00 10 11 01 00 10 11 01 00 10 1.does not work, so we separate it into.1 01 10 01 10 10 01 10 01 01 10 10 01 01 10 10 01 01. We undo the replacement rule to get001011010011001100.00 10 11 01 00 11 00 11 00does not work, so we separate it into.0 01 01 10 10 01 10 01 10 0.. We undo the replacement rule to get1001101010.10 01 10 10 10works (but separating it into.1 00 11 01 01 0.would not). We undo the replacement rule to get10111.10 11 1.nor.1 01 11work: both contain a11block, which cannot be the result of the replacement rule. So this word does not appear in the Thue-Morse sequence; therefore, neither does our original word.