Distribution of the closest sample to a certain point


Suppose there are $n$ samples drawn from a distribution with CDF $F$ and pdf $f$, supported on $[0,1]$.

The distribution of a sample that is closest to zero can be found using the theory of order statistics.

If I instead want the sample that is closest to some other point, say 0.5, what would its distribution look like? Is there a closed-form expression for the distribution of the sample closest to that point?

Best answer

Suppose we want to find the pdf of the location $x$ of the sample (out of $n$ samples) which is closest to a given point $a \in [0,1]$. Let's call the desired pdf $g(x, a, n)$. We can proceed by analogy with the usual derivation for the expression for the pdf of order statistics.

If the sample at position $x$ is closer to $a$ than any other sample, then all the other $n-1$ samples lie outside the interval $\bigl[a-|a-x|, a+|a-x|\bigr]$. This interval can extend beyond $[0,1]$, so take $F(t) = 0$ for $t < 0$ and $F(t) = 1$ for $t > 1$. The probability that any given sample (independent of the others) lies outside this interval is:

\begin{align} P\left(\text{outside } \bigl[a-|a-x|, a+|a-x|\bigr]\right) &= 1 - P\left(\text{inside } \bigl[a-|a-x|, a+|a-x|\bigr]\right)\\ &= 1 - \bigl(F(a + |a-x|) - F(a-|a-x|)\bigr)\, . \end{align}

All $n-1$ of the other samples must lie outside this interval. Since the samples are independent, this contributes the factor:

\begin{align} {n\choose n-1} {P(1 \text{ outside})}^{n-1} &= n\, {\Bigl[1 - \bigl(F(a + |a-x|) - F(a-|a-x|)\bigr)\Bigr]}^{n-1}\, . \end{align}

(The binomial coefficient is there because there are $n$ ways of choosing which of the $n$ samples is closest to $a$; because of it, this factor is a weight rather than a probability.) This, then, is the factor that multiplies $f(x)$, the "baseline" pdf for our closest sample, had it been the only sample:

$$ g(x, a, n) = n\, {\Bigl[1 - \bigl(F(a + |a-x|) - F(a-|a-x|)\bigr)\Bigr]}^{n-1} f(x) $$
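One way to sanity-check this formula numerically is sketched below in Python (not the Mathematica used for the plots). The names `closest_sample_pdf`, `integrate`, and `simulate_closest`, the uniform example, and the choice $a = 0.25$ are illustrative assumptions, not part of the derivation. The sketch assumes i.i.d. samples and a CDF extended to 0 below the support and 1 above it.

```python
import random

def closest_sample_pdf(x, a, n, cdf, pdf):
    """Density g(x, a, n) of the sample closest to a, out of n i.i.d. samples."""
    d = abs(a - x)
    # Probability that a single sample falls inside [a - d, a + d].
    inside = cdf(a + d) - cdf(a - d)
    return n * (1.0 - inside) ** (n - 1) * pdf(x)

# Assumed example distribution: uniform on [0, 1].
def uniform_cdf(t):
    return min(1.0, max(0.0, t))  # F = 0 below 0 and F = 1 above 1

def uniform_pdf(t):
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def simulate_closest(a, n, trials, rng):
    """For each trial, draw n uniform samples and keep the one closest to a."""
    return [min((rng.random() for _ in range(n)), key=lambda s: abs(s - a))
            for _ in range(trials)]

a, n = 0.25, 5

def g(x):
    return closest_sample_pdf(x, a, n, uniform_cdf, uniform_pdf)

total = integrate(g, 0.0, 1.0)      # a valid pdf should integrate to about 1
predicted = integrate(g, 0.2, 0.3)  # P(closest sample lands in [0.2, 0.3])

rng = random.Random(0)              # fixed seed so the run is repeatable
draws = simulate_closest(a, n, 100_000, rng)
observed = sum(0.2 <= s <= 0.3 for s in draws) / len(draws)

print("integral of g:", total)
print("predicted mass in [0.2, 0.3]:", predicted)
print("simulated mass in [0.2, 0.3]:", observed)
```

If the formula is right, the integral should come out close to 1 and the predicted and simulated masses should agree up to Monte Carlo noise. In this uniform example the predicted mass can also be checked by hand: the closest sample lies in $[0.2, 0.3]$ exactly when at least one sample does, which has probability $1 - 0.9^5$.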

Note that as $a\rightarrow 0$, this reduces to the usual expression for the pdf of the smallest ($k=1$) sample out of $n$, assuming $F(x) = 0$ for $x < 0$.
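Spelled out as a short sketch: at $a = 0$ we have $|a - x| = x$ for $x \in [0,1]$, and $F(-x) = 0$ by the assumption above, so

$$ g(x, 0, n) = n\, {\Bigl[1 - \bigl(F(x) - F(-x)\bigr)\Bigr]}^{n-1} f(x) = n\, {\bigl[1 - F(x)\bigr]}^{n-1} f(x)\, , $$

which is the pdf of the first order statistic (the minimum of $n$ samples).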

Here is an image I made (in Mathematica) showing this pdf $g(x, a, n)$ assuming an underlying uniform distribution on the samples, $n=5$ samples, and values of $a = 0, 0.25, 0.5, 0.75, 1.0$:

[Image: plot of the pdf $g(x, a, n)$]
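For reference, here is a sketch of what the formula gives in this uniform case, assuming $f(x) = 1$ on $[0,1]$ and $F(t) = \min\bigl(1, \max(0, t)\bigr)$. For $a = 0.5$ the interval $\bigl[a-|a-x|, a+|a-x|\bigr]$ stays inside $[0,1]$, so with $n = 5$:

$$ g(x, 0.5, 5) = 5\, {\bigl(1 - 2\,|x - 0.5|\bigr)}^{4}\, , \qquad x \in [0,1]\, , $$

a curve that peaks at $x = 0.5$ with value 5 and vanishes at both endpoints. For $a = 0$ the same formula gives $g(x, 0, 5) = 5\,(1 - x)^4$, the pdf of the minimum of five uniform samples.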

Edit:

Here is what it looks like if I choose the skewed pdf $f(x) = 2x$ for $x\in[0,1]$, still with $n=5$:

[Image: plot of the pdf $g(x, a, n)$ for $f(x) = 2x$]
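A rough numerical check of this skewed case is sketched below in Python (again, not the Mathematica used for the plot). It assumes $F(x) = x^2$ on $[0,1]$, extended to 0 below and 1 above, and samples by inverse CDF as $\sqrt{U}$ with $U$ uniform. The names `skewed_cdf`, `skewed_pdf`, `g`, and `integrate` are illustrative, not from the original answer.

```python
import math
import random

def skewed_cdf(t):
    """CDF of f(x) = 2x on [0, 1], extended to 0 below 0 and 1 above 1."""
    t = min(1.0, max(0.0, t))
    return t * t

def skewed_pdf(t):
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0

def g(x, a, n):
    """Density of the sample closest to a, out of n i.i.d. samples."""
    d = abs(a - x)
    inside = skewed_cdf(a + d) - skewed_cdf(a - d)
    return n * (1.0 - inside) ** (n - 1) * skewed_pdf(x)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

n = 5

# Each curve in the plot should integrate to about 1.
masses = {}
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    masses[a] = integrate(lambda x: g(x, a, n), 0.0, 1.0)

# For a = 0.5, compare the predicted probability that the closest sample
# lies to the right of a with a Monte Carlo estimate.
predicted_right = integrate(lambda x: g(x, 0.5, n), 0.5, 1.0)

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 100_000
hits = 0
for _ in range(trials):
    # Inverse-CDF sampling: if U is uniform, sqrt(U) has CDF x^2.
    samples = [math.sqrt(rng.random()) for _ in range(n)]
    closest = min(samples, key=lambda s: abs(s - 0.5))
    hits += closest > 0.5
observed_right = hits / trials

print("normalization per a:", masses)
print("predicted P(closest > 0.5):", predicted_right)
print("simulated P(closest > 0.5):", observed_right)
```

The skew shows up as asymmetry: for $a = 0.5$ and $n = 5$, working the formula through by hand gives $g = 160\,x\,(1-x)^4$ on $[0.5, 1]$ and $g = 160\,x^5$ on $[0, 0.5]$, so the closest sample lies to the right of $a$ with probability $7/12$ rather than $1/2$.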