How to Count the number of words over an alphabet subject to restrictions on letter count?

962 Views Asked by At

For an alphabet $X$, is there a method of computing how many words over $X$ of length $n$ there are where the number of occurrences of each letter must satisfy a system of equations?

For example if $|X| = 5$. Let ${\{a, b, c, d, e}\}$ represent the number of times each of the five letters appears in a length $n$ word over $X$.

With no restrictions I believe the number of words would be $5^n$.

What about subject to the restriction $2a+2e=2d+2b-c$?

Edit: So I had the idea that we could form a set of prime words, such that every word that satisfies the equation could be decomposed into these smaller words (up to permuting the letters of the original word).

Labelling each of the letters ${A,B,C,D,E}$, this set is $\{AD, AB, ED, EB, DCC, BCC\}$, so there are $4$ prime words of length $2$ and $2$ of length $3$ from which all words satisfying the condition must be composed (up to permutation).

Then we can get the total number by checking the partitions of $n$:

\begin{array}{|c|c|c|c|} \hline 1 & 0 \\ \hline 2 & 4 \\ \hline 3 & 2 \\ \hline 4 & 4^2=16 \\ \hline 5 & 4.2=8 \\ \hline 6 & 4^3+2^2=68 \\ \hline 7 & 4^2.2=32 \\ \hline 8 & 4^4+4.2^2=272 \\ \hline \end{array}

Of course with no restrictions the total is now ${{4+n}\choose{n}}$ instead of $5^n$. This isn't really the same as the original problem, but maybe it's relevant. Thoughts?

2

There are 2 best solutions below

5
On

extending a little bit the comment i did about using automatas i have not solve the problem itself, but instead we can count the number of words with the example restriction you made (i.e $2a-2b+c-2d+2e=0$) and adding another restriction(for matter of finiteness of the system) which is that for every prefix $x$ of the word, the following must holds $p\leq 2a-2b+c-2d+2e \leq q$

It can be done by solving the following system of formal series for $A_0$

$A_0=1+2xA_2+2xA_{-2}+xA_1$
$A_i=2xA_{i+2}+2xA_{i-2}+xA_{i+1} \hspace{5mm}$ for $i \in [p+2,q-2]\setminus \{0\}$
$A_{p}=xA_{p+1}+2xA_{p+2}$
$A_{p+1}=xA_{p+2}+2xA_{p+3}$
$A_{q}=2xA_{q-2}$
$A_{q-1}=xA_q+2xA_{q-3}$

which is gained by seeing the following automata(in its matrix form cause i do not know how to draw good :P) $ \left( \begin{array}{ccccc} 0 & 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & 2 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 \\ 2 & 0 & 0 & 1 & 0 \end{array} \right)$

For example, i have calculated the generating series for words with $q=2,p=-2$ and is $\frac{1-4x^2}{1-12x^2-6x^3+32x^4-8x^5}=1+8x^2+6x^3+64x^4+128x^5+O(x^6)$.

I really like this question and hope to read some cool answer soon :).

0
On

Let $R = a+e$ , $S=b+d$, $T=c/2$ (all are non-negative integers, since $c$ must be even). Then we have the conditions

$$R + S + 2 T=n\\ R - S +T =0$$

From this system we get:

$$ 3 S = n+ R $$ $$ T = n -2S $$$$ 2 R = n - 3 T $$

Then $n/3\le S \le n/2$. We can check that, for each integer $S$ in that range we get one and only one solution.

Now, if we fixed $R,S,T$, the number of combinations is $$ \frac{n!}{R! \, S! \, (2T)!} 2^R 2^S$$

Hence the total number is

$$ M=n!\sum_{S=\lceil n/3 \rceil}^{\lfloor n/2 \rfloor} \frac{2^S 2^{3S-n}}{(3S-n)! S! (2n-4S)!}= \frac{n!}{2^n}\sum_{S=\lceil n/3 \rceil}^{\lfloor n/2 \rfloor} \frac{16^S}{(3S-n)! \, S! \, (2n-4S)!}$$

A few values

n M
2 8 
3 6 
4 96
5 240
10 459648 
20 4081846153216 

We can get an asympotical approximation by a probabilistic argument. Consider a word is generated at random (uniformly), then the variable $x=2a+2e-2d-2b+c$ can be though as the sum of $n$ iid variables $z$ taking the values $z=\pm 2 $ with probablity $2/5$ and $z=1$ with prob. $1/5$.- This means $E(z)=1/5$ and $\sigma^2_z = 84/25$ and the, for large $n$, $x$ is approximately normal with $E(x)=n/5$ and $\sigma^2_x = n \, 84/25$. Then

And $$M = 5^n P(x=0) \approx 5^n \frac{5}{\sqrt{168 \, \pi n }} \exp{\left(-\frac{n}{168}\right)}$$

The agreement is quite good for $n >20$ (better approximations could be obtained via Edgeworth expansions)

 n   5^n/M  ~1/p(x=0)
10  21.245  15.421
20  23.363  23.146
30  30.474  30.086
40  37.451  36.871
50  44.295  43.752
60  51.444  50.867