How many unique observations will appear when sampling subsequences from a sequence?


I have an ordered sequence of observations

$$X = \{x_1, x_2, x_3, ..., x_n\}$$

and I am sampling subsequences of length $T$ from $X$. A subsequence would look like

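$$S = \{x_i, x_{i+1}, ..., x_{i+T-1}\}$$

with $T < n$ and $1 \le i \le n-T+1$, so that each subsequence contains exactly $T$ observations and stays inside $X$. Subsequences are sampled with replacement, so the same starting index $i$ can be drawn more than once.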

If I sample $m$ subsequences (with replacement) of length $T$ from $X$, how many unique observations $x_i$ can I expect to get?
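One possible way to make this concrete, assuming each subsequence holds exactly $T$ consecutive observations and the starting index is drawn uniformly from the $N = n-T+1$ valid positions (the uniformity is an assumption, not something stated above): observation $x_j$ lies in $c_j$ of the $N$ windows, so it is missed by all $m$ draws with probability $(1 - c_j/N)^m$, and by linearity of expectation the expected number of unique observations is $\sum_{j=1}^{n} \left[1 - (1 - c_j/N)^m\right]$. The Python sketch below evaluates that sum and compares it with a small Monte Carlo simulation; the function names `expected_unique` and `simulate_unique` are hypothetical.

```python
import random


def expected_unique(n: int, T: int, m: int) -> float:
    """Expected number of distinct observations after m window draws.

    Assumes windows of exactly T consecutive items whose start index is
    uniform on 1..n-T+1 (an assumption made for this sketch).
    """
    N = n - T + 1  # number of valid start positions
    total = 0.0
    for j in range(1, n + 1):
        # Windows containing position j start at i with j-T+1 <= i <= j,
        # clipped to the valid range 1..N.
        c_j = min(j, N) - max(1, j - T + 1) + 1
        # Probability that at least one of the m draws covers position j.
        total += 1.0 - (1.0 - c_j / N) ** m
    return total


def simulate_unique(n: int, T: int, m: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    N = n - T + 1
    total = 0
    for _ in range(trials):
        seen = set()
        for _ in range(m):
            i = rng.randint(1, N)            # uniform start index, inclusive bounds
            seen.update(range(i, i + T))     # positions i, ..., i+T-1
        total += len(seen)
    return total / trials


if __name__ == "__main__":
    n, T, m = 100, 10, 20
    print("formula:   ", expected_unique(n, T, m))
    print("simulation:", simulate_unique(n, T, m))
```

Observations near either end of $X$ fall in fewer windows (only one window contains $x_1$ or $x_n$), so under this model they are the slowest to be seen.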

Perhaps another way to frame the question: how many subsequences $m$ do I have to sample from $X$ to reach a minimum threshold $K$ of unique observations, where $K$ is a percentage of the total number of observations in $X$?
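For the threshold version, a rough sketch under the same assumptions (windows of exactly $T$ consecutive observations, starting index uniform over the $n-T+1$ valid positions) is to search for the smallest $m$ at which the expected number of unique observations reaches $K \cdot n$. This targets the expectation only and is not a guarantee for any single run. The names `expected_unique` and `min_draws_for_expected_coverage` are hypothetical, and $K$ is passed as a fraction strictly between 0 and 1, because the expected coverage approaches $n$ only in the limit.

```python
def expected_unique(n: int, T: int, m: int) -> float:
    """Expected distinct observations after m uniform window draws (assumed model)."""
    N = n - T + 1  # number of valid start positions
    total = 0.0
    for j in range(1, n + 1):
        # Number of length-T windows that contain position j.
        c_j = min(j, N) - max(1, j - T + 1) + 1
        total += 1.0 - (1.0 - c_j / N) ** m
    return total


def min_draws_for_expected_coverage(n: int, T: int, K: float) -> int:
    """Smallest m whose expected number of unique observations is >= K * n."""
    if not 0.0 < K < 1.0:
        raise ValueError("K must be a fraction strictly between 0 and 1")
    target = K * n
    # The expectation increases with m, so double m until the target is met...
    hi = 1
    while expected_unique(n, T, hi) < target:
        hi *= 2
    lo = hi // 2  # known to fall short of the target (or equal to 0)
    # ...then bisect between the last failing and first passing value.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_unique(n, T, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n, T = 100, 10
    for K in (0.5, 0.9, 0.99):
        print(K, min_draws_for_expected_coverage(n, T, K))
```

If a high-probability guarantee is needed rather than an expected value, a simulation over many runs would be one way to estimate the required $m$.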