Can a CFG over large alphabet describe a language, where each terminal appears even number of times?
If yes, would the Chomsky Normal Form be polynomial in |Σ| ?
EDIT: What about a language where each terminal appears 0 or 2 times?
EDIT: Solving this may give interesting extension to "Restricted read twice BDDs and context free grammars"
Find a context-free grammar (CFG) for the language L of all words such that each terminal in a word occurs even number of times over a possibly large alphabet Σ
My long aproach is (the only nonterminal is S):
S ⟶ ε | SS
x ∈ Σ : S ⟶ xSx
x,y ∈ Σ : S ⟶ xxSyy | yySxx | xySxy | xySyx | yxSyx | yxSxy
Second try, incremental approach.
S ⟶ ε | SS
Terminal productions (suboptimal): x,y ∈ Σ : S ⟶ xx | yy |xxyy | yyxx | xyxy | xyyx | yxyx | yxxy
Incremental: x ∈ Σ : S ⟶ SxSxS
By the way, the homework is over now.
Technically it wasn't a homework assigned to me. I don't know the answer and thought the question was easy. Best.
EDIT: I didn't award the bounty :-)
There is a deterministic finite state machine that accepts a word if and only if each terminal appears an even number of times. Assume that there are $k$ symbols in the alphabet. The machine has $2^k$ states, each of which is associated with a subset of the alphabet, and we identify the states with the subsets. The machine is defined so that:
The idea is that a state $B$ corresponds to the set of terminals that have appeared an odd number of times so far. Being in state $\emptyset$ corresponds to every terminal appearing an even number of times.
Since the language is accepted by a DFSM, the language is regular. This means that there is, trivially, a CFG that accepts the language. The grammar has one nonterminal for each state of the machine, and the rules of the grammar match up with transitions of the machine. I don't know yet whether there is a context free grammar for this language that is polynomial size compared to the size of the alphabet, though.