I need to come up with a hypothesis and prove whether the language in question is regular (e.g. pump it out / provide a DFA that generates it). $$L = \{ xyyz \in \{0,1 \}^* \colon x,z \in \{0,1 \}^*, y \in \{0, 1\}^+\}$$
My first guess was that this language was not regular. I tried several words to apply with the pumping lemma but the $x$ and $z$ parts seem to ruin everything.
Words that I tried:
- $0^n1^n$ - Fail. Pair of ones will become $xx$, rest is nullable
- $(01)^n$ - Fail. If we are allowed to pump a single digit, then removing it altogether will result in two identical digits next to each other - the new $yy$
I also tried to prove it the other way and build a DFA - but I gave up when I came to the $yy$ part (it is not even context-free...)
Now, could you give me a hint what word to use / how to build the automaton?
This language is actually regular, you just have to reframe how it's presented.
Note that any string with two $0$s in a row or two $1$s in a row is in $L$, since we can take $y = 0$ or $y = 1$. The remaining strings alternate $0$s and $1$. But any string of length at least $4$ which alternates $0$s and $1$s will contain $0101$ or $1010$ as a substring, and this will be in $L$, since we can take $y = 01$ or $y = 10$.
So the only strings not in $L$ are those of length less than $4$ which alternate $0$s and $1$s. I'll leave it to you to come up with a DFA that accepts all but those $7$ strings.