Calculating the cardinality of the set of integers accumulated by maximum

19 Views Asked by At

This question steps from calculating the average-case time complexity of a solution for a problem from StackOverflow. I'll outline the problem programmatically first, for ease of intuition, and then follow it up with a mathematical definition.

I'm given a list of N integers sampled uniformly (with repetition) from the half-open interval [0, M). In my algorithm, at some point I will accumulate a maximum value over this last, and take the unique values. In Haskell:

Data.Set.fromList $ scanl1 max values

So, for the list $[1,5,2,3,7]$, scanl1 would give me back the list $[1,5,5,5,7]$, which fromList then transforms into a Set, yielding the result $[1,5,7]$. Mathematically, I could write a recurrence relation:

\begin{align} \begin{split} a_0 &= 0 \\ a_{k+1} &= max \left( a_k, e_k \right) \end{split} \end{align}

Where $e_k$ is the $k$th element of my initial list, and I formulate a set $S$ as:

\begin{equation} S = \{ a_{i+1}, i \in [0, N) \} \end{equation}

Now, what I'm after is the expected cardinality of $S$, given $N$ and $M$ and that we know these numbers are sampled uniformly.

As far as what I've tried- I know I could get an exact value (at least for small $N$) by programmatically exhausting all possible samplings, summing their cardinalities and dividing by number of samplings, and get the expectation value as:

\begin{equation} E(|S|) = \frac{ \sum\limits_{i=0}^{N} f (q_i)}{M^N} \end{equation}

where $f(x)$ is my function that maps a list to a set of unique accumulated maximums, and $q_i$ is the $i$th sample from some ordered listing of all possible samplings.

This however won't help me in calculating a bound for my time complexity. I thought about writing some recurrence relation. I could write a function $C_1(x, y, z)$ which counts the number of ways to permute a set $Q$ of $x+y+z$ elements, where $x$ elements are less than some (unknown) threshold value, $z$ elements are greater, and $y$ elements are equal to it, such that $f(Q) = 1$. I could then potentially use $C_1$ to define $C_2$, and so forth to generalized $C_k$. However, this approach didn't seem to play out either- it devolved into a more-or-less dynamic programming approach, rather than a closed-form solution.

Any help with this would be greatly appreciated. As I said, this is just for a time complexity analysis for a solution on a sister site, but it has caught my interest now, and I'd like to see how a problem like this can be solved.