Probability that a binary search tree is non-empty/full at level $n$

576 Views Asked by At

First, binary search tree (BST) is a very basic data structure in computer science, so I expect that anyone with some programming knowledge knows what it is.

Notation of level: the root of the BST has level 0, its immediate children has level 1, and so on. See figure below.

example BST

Question: insert a random permutation of a sequence of (different) integers $1,2,3,\dots,2^n-1$ into a BST, find the probability

  1. $P_N(i)$ that level $i$ has at least one number (non-empty)?
  2. $P_F(i)$ that level $i$ is full?

Note that the maximum number of levels a BST can have is when the tree has no "branching": there are $2^n-1$ levels; and the minimum number of levels a BST can have is when every level is full: there are $n$ levels.

(On $P_N$) Levels $i\le n$ always have at least one number, so $P_N(i\le n) = 1$. Level $2^n-1$ can and only can have one number when the tree is linear (i.e., no branching). Since only two of the $(2^n-1)!$ permutations ($[1,2,3, \dots]$, or $[\dots, 3,2,1]$) generate linear trees, $P_N(2^n-1)=2/(2^n-1)!$.

(On $P_F$) The 0th level is always full, while levels beneath level $n$ can never be full. So $P_F(0) = 1$, and $P_F(i>n) = 0$.

1

There are 1 best solutions below

0
On

Let $a_N(i, m)$, $a_F(i,m)$ be the probability that level $i$ is nonempty or full respectively, for a BST of $m$ vertices (where the inputs are presented in random order). Of course $a_N(0,m) = a_F(0,m) = 1$ if $m \ge 1$, $0$ if $m = 0$.

Condition on the first input. If the first input is the $k$'th (from least to greatest), then that goes in the root, the left branch is a BST of $k-1$ vertices and the right branch is a BST of $m-k$ vertices, again with inputs presented in random order. If $i \ge 1$, level $i$ is nonempty iff level $i-1$ of either left or right branches is nonempty. Thus

$$ a_N(i,m) = \frac{2}{m} a_N(i-1,m-1) + \frac{1}{m}\sum_{k=2}^{m-1} \left(a_N(i-1,k-1)+a_N(i-1,m-k) - a_N(i-1,k-1)a_N(i-1,m-k)\right) $$

Similarly, level $i$ is full iff level $i-1$ of both left and right branches are full. Thus

$$ a_F(i,m) = \frac{1}{m}\sum_{k=2}^{m-1} a_F(i-1,k-1) a_F(i-1,m-k) $$