Is it possible to find k-th quantile only from frequencies of a value list?

76 Views Asked by At

Suppose we have a discrete continuous interval [0, N-1]. And, there exists $M$ samples where each value belongs to the interval.

Now, we want to find a $k$-th quantile (say, 4th quantile) from the samples. I denote the frequency of value $n \in [0, N-1]$ by $freq_n$ and $freq_n \in [0, M]$.

What we actually have is only a list of frequencies for each value, $\vec{f}$ = ($f_0, f_1, \ldots f_{N-1}$).

The naive solution to find the k-th quantile is aligning samples in sorted order and doing something. Is it possible to find the k-th quantile from $\vec{f}$ without samples in sorted order?

In other words, can we somehow express the function $g(\vec{f})$ that finds the k-th quantile?

Taking a example for N=5 and M=10:

$f_0 = 3$

$f_1 = 2$

$f_2 = 0$

$f_3 = 1$

$f_4 = 4$

That means we have a sequence $\mathbf{X} =$(0, 0, 0, 1, 1, 3, 4, 4, 4, 4). But we want to compute $k$-th quantile only from $f_i$.

1

There are 1 best solutions below

6
On

Let me rephrase to make sure I understood your problem:

I have M random variables $X_1,..,X_M$ independent and following the same distribution as a random variable X taking values uniformly in $0,1,2,..,N-1$. I observe the realized frequency of each possible values in my sample is $F=(f_0,f_1,...,f_{N-1})$

Solution based of the sorted sample:

you want the 0.1-quantile let's take the sample in ascending order: $X_{(0)},..,X_{(N-1)}$ i.e $X_{(0)}$ is the minimum

then k-th quantile = $X_{(floor(kM+1))}$ (k is the proportion of values below the k-th quantile)

Solution based on the frequencies:

$k-th$ quantile $= g(F) = \underset{j}{argmax}\bigg(\sum\limits_{i=0}^j f_i \leq k \bigg)$

The k-th quantile is the highest values j such that at most k percentage of my values are below j.

A sum of frequency allows to approximate the cumulative distribution function: $\sum\limits_{i=0}^j f_i = \sum\limits_{i=0}^j \hat P(X = i) = \hat P(X\leq j)$

$\hat P$ are the probabilities guessed from the observed frequencies. k-th quantile: 0.2-th quantile of a sample of size N=100 is the N*k-th (20th) lowest value in the sample.

Definitions:

A frequency is defined like this: $f_i = \frac{\text{number of observations taking the value I}}{\text{number of observations}}$

The median is a $\frac 12$quantile