Average number of Dyck words in a Dyck word


Given an integer $n$, how many Dyck words are substrings of a Dyck word of size $n$, on average?

For example, if $n=2$, then the Dyck words of size $2$ are:

  1. [ ] [ ]
  2. [ [ ] ]

(1) contains two strict "sub-Dyck words": [ ] (the first two brackets) and [ ] (the last two brackets), plus the original [ ] [ ]. The total is 3.

(2) contains only one strict "sub-Dyck word", [ ], plus the original [ [ ] ]. The total is 2.

So for $n=2$ the answer is 2.5. Of course, it is harder to compute when $n$ gets bigger.

Does anyone have an idea how to find a general formula for this problem?


There are 3 best solutions below

2
On

This covers the "sub-Dyck words" of semi-length $1$, that is, the occurrences of [ ].

See http://en.wikipedia.org/wiki/Narayana_number. These "sub-Dyck words" can be regarded as the "peaks" of the Dyck path, and the Narayana numbers give the distribution of the number of peaks among the Dyck paths of semi-length $n$.
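To make the peak count concrete, here is a minimal sketch. It assumes the standard formula $N(n,k) = \frac{1}{n}\binom{n}{k}\binom{n}{k-1}$ for the number of Dyck paths of semi-length $n$ with exactly $k$ peaks. The function names are illustrative only.

```python
from fractions import Fraction
from math import comb


def narayana(n, k):
    """Number of Dyck paths of semi-length n with exactly k peaks (1 <= k <= n)."""
    return comb(n, k) * comb(n, k - 1) // n


def average_peaks(n):
    """Average number of peaks, i.e. of occurrences of [ ], over all Dyck words of semi-length n."""
    total_paths = sum(narayana(n, k) for k in range(1, n + 1))
    total_peaks = sum(k * narayana(n, k) for k in range(1, n + 1))
    return Fraction(total_peaks, total_paths)


for n in range(1, 7):
    print(n, average_peaks(n))
```

Since $N(n,k) = N(n,n+1-k)$, the average is $(n+1)/2$. This accounts only for the shortest "sub-Dyck words", not for the full count asked about in the question.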

0
On

So, I've coded a little Python program that computes, for each $n$, the total number of "sub-Dyck words" in all Dyck words of semi-length $n$.

Here is the output for $n$ ranging from 1 to 13: 1, 5, 21, 84, 330, 1287, 5005, 19448, 75582, 293930, 1144066, 4457400, 17383860. This is known as A002054 in the OEIS, and it is even comment number 7 there. Bingo!
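A brute-force count of this kind can be sketched as follows. This is not the original program, only a minimal reconstruction under the obvious reading of the problem: enumerate every Dyck word of semi-length $n$ and count the non-empty substrings (by position) that are themselves Dyck words. All function names are made up for the illustration.

```python
def dyck_words(n):
    """Yield every Dyck word of semi-length n as a string of '[' and ']'."""
    if n == 0:
        yield ""
        return
    # Every non-empty Dyck word decomposes uniquely as [ w1 ] w2.
    for i in range(n):
        for w1 in dyck_words(i):
            for w2 in dyck_words(n - 1 - i):
                yield "[" + w1 + "]" + w2


def count_dyck_substrings(w):
    """Count the non-empty substrings of w (by position) that are Dyck words."""
    total = 0
    for start in range(len(w)):
        depth = 0
        for ch in w[start:]:
            depth += 1 if ch == "[" else -1
            if depth < 0:      # more closing than opening brackets: stop here
                break
            if depth == 0:     # balanced and never negative: a Dyck word
                total += 1
    return total


def total_sub_dyck(n):
    """Total number of sub-Dyck words over all Dyck words of semi-length n."""
    return sum(count_dyck_substrings(w) for w in dyck_words(n))


print([total_sub_dyck(n) for n in range(1, 8)])
```

The enumeration grows like the Catalan numbers, so this approach is only practical for small $n$.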

So the average I was looking for is $\frac{n(2n+1)}{n+2}$.
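For reference, here is how the closed form comes out. It assumes the totals really are A002054, that is $\binom{2n+1}{n-1}$ (checked only against the 13 computed terms, not proved), and it divides by the number of Dyck words of semi-length $n$, the Catalan number $C_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{n!\,(n+1)!}$:

$$\frac{\binom{2n+1}{n-1}}{C_n} = \frac{(2n+1)!}{(n-1)!\,(n+2)!}\cdot\frac{n!\,(n+1)!}{(2n)!} = \frac{(2n+1)\,n}{n+2}.$$

For $n=2$ this gives $\frac{5\cdot 2}{4} = 2.5$, matching the example in the question.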

Yet I don't consider the question solved, as I haven't found the proof for it.

0
On

Every nonempty Dyck word can be written uniquely as $(w_1)w_2$, where $w_1$, $w_2$ are (possibly empty) Dyck words. This should allow you to find a recursive solution to your problem. Every Dyck subword will either be a subword of $w_1$, a subword of $w_2$, $(w_1)$ itself, or $(w_1)$ followed by a nonempty initial subword of $w_2$. So you need to count subwords, and initial subwords starting at the beginning.
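The four cases above can be turned directly into a per-word count. The following is a minimal sketch with illustrative function names. It assumes its input is a valid Dyck word over `(` and `)` and does not fully validate it.

```python
def split(w):
    """Return (w1, w2) such that w == '(' + w1 + ')' + w2, for a non-empty Dyck word w."""
    depth = 0
    for pos, ch in enumerate(w):
        depth += 1 if ch == "(" else -1
        if depth == 0:  # the first '(' is closed at position pos
            return w[1:pos], w[pos + 1:]
    raise ValueError("not a Dyck word")


def nonempty_prefixes(w):
    """Number of non-empty initial subwords of w that are Dyck words."""
    if not w:
        return 0
    _, w2 = split(w)
    # Either (w1) alone, or (w1) followed by a non-empty Dyck prefix of w2.
    return 1 + nonempty_prefixes(w2)


def subwords(w):
    """Number of non-empty Dyck subwords of w, counted by position."""
    if not w:
        return 0
    w1, w2 = split(w)
    # Inside w1, inside w2, (w1) itself, or (w1) plus a non-empty Dyck prefix of w2.
    return subwords(w1) + subwords(w2) + 1 + nonempty_prefixes(w2)


print(subwords("()()"), subwords("(())"))
```

The two printed values correspond to the two words of the $n=2$ example in the question.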

So if we call the function counting the number of nonempty Dyck subwords $f(w)$ and the number of possibly empty initial Dyck subwords $g(w)$, then for $w = (w_1)w_2$ we have $f(w) = f(w_1) + f(w_2) + g(w_2)$. Here the term $g(w_2)$ counts $(w_1)$ followed by a possibly empty initial subword of $w_2$, so $(w_1)$ on its own is covered by the empty case. Similarly, we have the recursion $g(w) = 1 + g(w_2)$, as every initial subword of $(w_1)w_2$ is either empty or $(w_1)$ followed by a possibly empty initial subword of $w_2$. For the empty word, $f = 0$ and $g = 1$.

This gives a recursion rule similar to the one for the Catalan numbers $C(n)$. First define $G(n)$ as the sum of $g(w)$ over all Dyck words of length $2n$. Recursing on the semi-length $j$ of $w_2$, with $i = n - 1 - j$ the semi-length of $w_1$, this gives us a recursion rule for $G$: $G(n) = \sum_{j = 0}^{n-1} C(i)(C(j) + G(j)) = C(n) + \sum_{j = 0}^{n-1} C(i)G(j)$, with $G(0) = 1$ (so $G(1) = 2$). However, if you look closely, plugging in $G(n) = C(n+1)$ solves the equation, so this is still just the Catalan numbers.

Similarly, we can define $F(n)$ as the sum of $f(w)$ over all Dyck words of length $2n$. This has a similar recursion rule: $F(n) = \sum_{i + j + 1 = n} \left[ F(i)C(j) + C(i)F(j) + C(i)G(j) \right] = C(n+1) - C(n) + \sum_{i + j + 1 = n} \left[ F(i)C(j) + C(i)F(j) \right]$, with $F(0) = 0$, where we used $G(j) = C(j+1)$ and $\sum_{i + j + 1 = n} C(i)C(j+1) = C(n+1) - C(n)$.

$C$ and $F$ can then be computed efficiently in quadratic time and linear space as a function of $n$ from the above recursion rules with basic memoization techniques, and dividing $F(n)$ by the $n$th Catalan number gives you the average. You can also try deriving a closed-form formula for $F$ if you would like, though it will take slightly more work.
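A minimal sketch of that computation, with plain lists as the memo table. The recursion is the one spelled out in the comments: $F(n) = C(n+1) - C(n) + \sum_{i+j+1=n} [F(i)C(j) + C(i)F(j)]$ with $F(0) = 0$. It uses $G(n) = C(n+1)$ and counts the bare factor $(w_1)$ exactly once, as the empty-prefix case of $g$. The function name is illustrative only.

```python
from fractions import Fraction


def catalan_and_totals(n_max):
    """Return lists C[0..n_max+1] (Catalan numbers) and F[0..n_max] (total sub-Dyck words)."""
    C = [1]  # C(0) = 1
    for n in range(1, n_max + 2):
        # C(n) = sum over i + j = n - 1 of C(i) * C(j)
        C.append(sum(C[i] * C[n - 1 - i] for i in range(n)))

    F = [0]  # F(0) = 0: the empty word has no non-empty subword
    for n in range(1, n_max + 1):
        # By symmetry, sum F(i)C(j) equals sum C(i)F(j), hence the factor 2.
        conv = sum(F[i] * C[n - 1 - i] for i in range(n))
        # C(n+1) - C(n) is the sum of C(i) * G(j) with G(j) = C(j+1).
        F.append(C[n + 1] - C[n] + 2 * conv)
    return C, F


C, F = catalan_and_totals(13)
for n in range(1, 14):
    print(n, F[n], Fraction(F[n], C[n]))
```

Using `Fraction` keeps the averages exact, which makes it easy to compare them with a conjectured closed form such as $\frac{n(2n+1)}{n+2}$.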