Given $ n $ distinct numbers that are randomly shuffled to form a sequence $ A = [a_1, a_2, \ldots, a_n] $, we select the largest number $ x_1 $ from the sequence. Subsequently, we pick the largest remaining number $ x_2 $ that appears after $ x_1 $, and continue this process. This results in a decreasing sequence $ X = [x_1, x_2, \ldots, x_k] $ of length $ k $. What is the expected value of $ k $?
For example:
- If the sequence $ A $ is $ [1, 5, 4, 2, 3] $, then $ X = [5, 4, 3] $.
- If $ A = [4, 3, 2, 1, 5] $, then $ X = [5] $.
- If $ A = [5, 1, 3, 2, 4] $, then $ X = [5, 4] $.
- If $ A = [3, 2, 4, 5, 1] $, then $ X = [5, 1] $.
Note: it's different from longest decreasing subsequence. For longest decreasing subsequence, the expected length is approximately $2\sqrt{n}$ here
Let $ k $ be the length of $ X $, and let $ b_i = n - \text{index}(x_{k-i}) $, where $\text{index}(x_{k-i})$ denotes the position of the element $ x_{k-i} $ in $ A $.
For example, if $ A = [1, 5, 4, 2, 3] $, then $ X = [5, 4, 3] $, and:
- $ b_0 = n - \text{index}(x_3) = n - \text{index}(3) = 5 - 5 = 0 $
- $ b_1 = n - \text{index}(x_2) = n - \text{index}(4) = 5 - 3 = 2 $
- $ b_2 = n - \text{index}(x_1) = n - \text{index}(5) = 5 - 2 = 3 $
The probability of the length $ k $ can be represented as $$ P(\text{length} = k) = \frac{1}{n} \sum_{{1 \le b_1 < b_2 < \cdots < b_{k-1} \le n-1}} \frac{1}{b_1 \cdot b_2 \cdots b_{k-1}}. $$ The expected length is then $$ E[\text{length}] = \sum_{k=1}^{n} k \cdot P(\text{length} = k). $$
How can we simplify the above formulas and determine their asymptotic behavior as $ n \to \infty $?
This question arises in the context of analyzing the average space complexity of the sliding window maximum with window size $n$ for a random infinite data stream. A monotonic deque is used to track the above structure within a window of size $ n $.
For example, data stream 4, 3, 5, 3, 2, 4, 1, 10, and window size n = 4
[4, 3, 5, 3] 2, 4, 1, 10
deque=[5, 3]
4, [3, 5, 3, 2], 4, 1, 10
deque=[5, 3, 2]
4, 3, [5, 3, 2, 4], 1, 10
deque=[5, 4]
4, 3, 5, [3, 2, 4, 1], 10
deque=[4, 1]
4, 3, 5, 3, [2, 4, 1, 10]
deque=[10]
Let's use $K_n$ for the length of the generated sequence $X$, when you start with $n$ shuffled distinct numbers.
Then you start with the trivial $\mathbb E[K_0]=0$ and $\mathbb E[K_1]=1$.
If the largest value in $A$ is at position $a$ then the rest of the analysis is on the right-hand $n-a$ values of $A$, so you can find the conditional expectation $\mathbb E[K_n \mid \arg\! \max A = a] = 1 + \mathbb E[K_{n-a}]$,
which combined with shuffling giving a uniform distribution, i.e. $\mathbb P(\arg\! \max A =a)=\frac1n$, gives the recurrence $$\mathbb E[K_n] =\sum\limits_{a=1}^n \frac1n (1 + \mathbb E[K_{n-a}])$$ and experimenting with small values suggests a solution using harmonic numbers which you can easily prove by induction $$\mathbb E[K_n] =\sum\limits_{a=1}^n \frac1n = H_n$$ and, as $n$ grows, this is about $\log_e(n)+\gamma+\frac{1}{2n}$, so the asymptotic behaviour is logarithmic.