Dear StackExchange Community,
During my preparations for an exam on the "analysis of algorithms" I have encountered a question, that I have not been able to answer. The task is to determine the average path length in tries from the root to the left most leaf. Note that the Wikipedia definition does not exactly match the version of tries that we used in the lecture.
In the latter definition, tries are binary trees with a root, where data points are stored in the leaf nodes and each data point gets assigned a unique $0$-$1$-sequence as a key. When selecting a data point, one starts at the root and goes down left or right in the $n$-th step, according to whether the $n$-th digit of the data point's key sequence is a $0$ or $1$ respectively.
The probabilistic model that we used to analyse certain properties of tries generates independent $0$-$1$-sequences, where the random variables for the digits are also independent and uniform on $\{ 0, 1 \}$.
So far, I have tried to calculate the probability that a trie with $n$ data points has path length $k$ from the root to the left most leaf node. I think that the following should suffice:
\begin{align} P(0^k || \ast ~ \text{and} ~ 0^{k-1} 1 || \ast ~ \text{are keys, but} ~ 0^{k+1} || \ast ~ \text{is not.}) \end{align}
Another approach that I have thought about was using generating functions where I mark the left most nodes.
Unfortunately, I have not succeeded in fully carrying out any of these two approaches. Maybe someone is technically more skilled and can give me a hint on executing the above or has a more elegant approach. In any case, I would be glad to hear of it.