Counting the number of expressions of a given length

142 Views Asked by At

I am not being able to find some paper or book that dwells on this problem ,if you can recommend me anything there would be great.

I will first state the general problem and some subproblems that I have solved. And that the questions I could not solve that are for me interesting.

Problem: How many expression are there of a given lenght ,more exactly having a total number of symbols S that can be subdivided in a set of operations O and a set of terminals T, how many total valid expressions are possible of size n?

For instances with the set $O = \{+,-,/,* \}$ , and $T = \{a,b\}$ and $S = T \bigcup O$ a valid expressions of size 5 (in prefix notation) can be [+-aba] that translates to $(a-b)+a $, this represents an expression of lenghth 5. Let's represents the set of all expressions with this lenght and this operations and this terminals: $EXP_{O,T}^5$. And generically $EXP_{O,T}^n$ is the set of all the operations with a lenght n.

Subproblem 1: For instance considering only operations that have as input 2 terms and as output 1 term (binary operators), and calling that set $O_2$, considering that an expressions is in prefix notation after some dwelling some considerations can be made:

Assert 1: The number of operators in a expression is equal to the number of terminals +1. (can be proven by strong induction)

Assert 2: There are not expressions with an even dimension (can be proven by strong induction)

Assert 3: In evaluating a valid expression , the evaluation ends when the number of terminals counted is equal to the number o operators +1. (it is basically a consequence of 1)

Conclusion: this allow us to relate this problem to the Catalan numbers , and it is made obviously by comparing it to the number of possible valid parenthesis expressions. (substituting every terminal by ")" and every operator by "(" )

[+-a+bcd] -> [(()()) )] so the number of valid expressions of size n are equal to the number of (n-1)/2 pairs of parenthesis sequences.

So in the end the number of expressions with only operators with two inputs, and a given lenght lets called it $ \#EXP_{O_2}^n $ is for n odd equal to:

$$ \#EXP^n_{T,O_2} = Cat(\frac{n-1}{2})\#T^{(n+1)/2}\#O_2^{(n-1)/2} $$

$$\#EXP^n_{T,O_2} = \frac{2}{n+1}\binom{n-1}{\frac{n-1}{2}}\#T^{(n+1)/2}\#O_2^{(n-1)/2} $$

A cool inside on this expression is that for a given value of symbols, #S = P, the value that maximizes the dimension of this space is when $\#T = \#O2+1 $

Subproblem 2: How to generalize this taking into acount unary operations? Lets's call this set $O_1$, one possible case can be $O_1 = \{sin,cos,tan,exp\}$. This subproblem I have solved in the following way. Given a expression of type $\#EXP^i_{T,O_2}$ with $i<=n$ we can form expressions of the type $\#EXP^n_{T,O_2,O_1}$ adding to the rest of the not occupied $n-i$ places unitary operators. This becomes compreensible in the following image enter image description here this example is for n=7 and in this case i= 5, we are creating an $O_1$ expression of dimension 7 from one of dimension 5 of $O_2$. having this image in consideration and noting the following things.

Assert 1 the last element of an expression cant be an unitary operator

Assert 2 we are not counting any expression twice if we now add the new expressions build for an i, with the other i's. that is because the number of operators $O_1$ is diffferent

Conclusion the expression can be done by this formula $$ \#EXP^n_{T,O_2,O_1} = \sum^n_{i=1}\#EXP^i_{T,O_2}\binom{n-1}{i-1}\#O_1^{n-i} $$

notes: I could not simply this expression or find an assymtotic limit, i only could compare it with the previous one plotting it.

Interest problems I didn't solve and that can be fun and are the focus of the question

Problem 1 How to generalize this for ternary operators (quaternary operators, n-ary operators)

Problem 2 How to count and exclude redundancys, being more specific when using some terminals and operations for instance $ O_2 = \{+,-,*,\} $ and $T = \{a,b,1,0\}$ , how to count redundant operations like (1)$A-A$ (2)$1*A = A$ (3)$A+0=A$ (4)$A/1=A$, etc how to count this redundancys and exclude them.Or one more complex redundancy as A+B+C = B+C+A etc...

Problem 3 In GP ,genetic programming, an interesting field is Dimensional Aware GP, in this type of definition if a variable A represents meters and B represents Time it is not valid to add them or subtract them. This can be writen in the following way: (lets assume every variable of the set T is now associated with a vector D that represents is dimension).In this case T = {(t1,d1),(t2,d2)...)}.For example if t2=A, d2 = [1 0] , and t1=B, d1= [0 1] A+B is not valid. How to count this effect (can be in the simple case with only two tipes of variables, note that when multiplying things become more diffucult) (A*A has dimension vector [2 0]). If not an exact count a complexity result would be interesting

Conclusion if you have insights to solve any of these questions or can give me some literature guidance on what to read (papers books) about the topic would be very cool.