Trying to simplify a regular expression of which I currently have an abominably long one

38 Views Asked by At

I have a pretty simple question, but I don't know how to simplify this regular expression further from the answer that I have. I'd like to find a regular expression for the set of strings over {x,y,z} that have precisely two x (no need to be together) and one y.

The answer that I have is a really long one that basically is the union of all the possible concatenable combinations, but it just feels dumb.

What am I missing? How do you show the regular expression for 3 (x,x,y) things that have to be there but in no particular order, with c*?

1

There are 1 best solutions below

2
On BEST ANSWER

You could start with the 'union' you've come up with, eg

$(\ (x|y|z)^*y(x|y|z)^*x(x|y|z)^*x(x|y|z)^*\ |\ (x|y|z)^*x(x|y|z)^*y(x|y|z)^*x(x|y|z)^*\ |\ (x|y|z)^*x(x|y|z)^*x(x|y|z)^*y(x|y|z)^*\ )$,

In human English, this might be read:

  • any number of anything, then a $y$, then any number of anything, then an $x$, then any number of anything, then an $x$, then any number of anything
  • OR: any number of anything, then an $x$, then any number of anything, then a $y$, then any number of anything, then an $x$, then any number of anything
  • OR: any number of anything, then an $x$, then any number of anything, then an $x$, then any number of anything, then a $y$, then any number of anything

and then simplify it by factoring out the common prefix and suffix:

$(x|y|z)^*\ (\ y(x|y|z)^*x(x|y|z)^*x\ |\ x(x|y|z)^*y(x|y|z)^*x\ |\ x(x|y|z)^*x(x|y|z)^*y\ )\ (x|y|z)^*$.

In human English, this means:

  • Any number of anything, THEN
    • a $y$, then any number of anything, then an $x$, then any number of anything, then an $x$
    • OR an $x$, then any number of anything, then a $y$, then any number of anything, then an $x$
    • OR an $x$, then any number of anything, then an $x$, then any number of anything, then a $y$
  • THEN, any number of anything

You can simplify this further with more factoring out of common prefixes and suffixes:

$(x|y|z)^*\ (\ y(x|y|z)^*x(x|y|z)^*x\ |\ x(x|y|z)^*\ (\ y(x|y|z)^*x\ |\ x(x|y|z)^*y\ )\ )\ (x|y|z)^*$.

Again, translating into human, this means:

  • Any number of anything, THEN
    • a $y$, then any number of anything, then an $x$, then any number of anything, then an $x$
    • OR an $x$, then any number of anything, THEN
      • a $y$, then any number of anything, then an $x$
      • OR an $x$, then any number of anything, then a $y$,
  • THEN, any number of anything