Probability of Precision Error With Uniform Random Variables


I ran into an interesting probability problem that I am not certain how to solve. The nonmathematical problem is this: a system has a level of precision $\epsilon$ such that if two points are within $\epsilon$, they are indistinguishable. If you need to distinguish between $n$ uniformly distributed points, what is the probability that you have at least 1 error (pair of points within $\epsilon$ of each other)? $k$ errors?

The problem is, mathematically: given $X_1, \dots, X_n$ iid random variables distributed uniformly on $[0,1]$, what is the probability that at most $k$ (unordered) pairs of the $X_i$s are within $\epsilon$ of each other, for a fixed $\epsilon$?

Equivalently (each unordered pair is counted twice, and the diagonal terms $i = j$ are excluded), what is $$P \left(\sum_{i = 1}^n \sum_{\substack{j = 1 \\ j \neq i}}^n \mathbb{1}_{|X_i - X_j| < \epsilon} \leq 2k \right) $$

which I'll denote (somewhat loosely) as: $$P(\varphi(n) \leq 2k)$$

My intuition for solving this problem is essentially to build up to the general case above from a chain of conditional probabilities since I'm not sure how to solve for the above CDF in any closed form.

So for the sake of argument, let

$$\psi(n) = \begin{cases} 1& \varphi(n) > \varphi(n - 1)\\ 0 & \varphi(n) = \varphi(n - 1) \end{cases} $$

It's easy to see that, neglecting boundary effects and overlap of the $\epsilon$-neighborhoods, $$ P(\psi(n) = 1| \varphi(n - 1) = 0) \approx 2(n - 1)\epsilon $$ (each of the $n - 1$ existing points forbids an interval of length at most $2\epsilon$, so this is in fact an upper bound)
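As a sanity check, this conditional probability can be estimated by rejection sampling. A quick sketch (the helper names and the choices $n = 10$, $\epsilon = 0.001$ are my own):

```python
import random

def has_error(pts, eps):
    """True if any pair of points lies within eps of each other."""
    s = sorted(pts)
    return any(b - a < eps for a, b in zip(s, s[1:]))

def collision_prob(n, eps, trials=200_000):
    """Estimate P(psi(n) = 1 | phi(n-1) = 0): rejection-sample n-1
    error-free points, then see how often an n-th point lands within
    eps of one of them."""
    hits = kept = 0
    while kept < trials:
        pts = [random.random() for _ in range(n - 1)]
        if has_error(pts, eps):   # condition on phi(n-1) = 0
            continue
        kept += 1
        x = random.random()
        if any(abs(x - p) < eps for p in pts):
            hits += 1
    return hits / kept

n, eps = 10, 0.001
est = collision_prob(n, eps)
print(est, 2 * (n - 1) * eps)  # estimate should be close to 0.018
```

The estimate comes out slightly below $2(n-1)\epsilon$, consistent with that value being an upper bound.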

and

$$2\epsilon < P(\psi(n) = 1| \varphi(n - 1) = n - 1) < n\epsilon$$

Which then suggests the loose general bounds

$$2(n - 1 - k)\epsilon \lesssim P(\psi(n) = 1| \varphi(n - 1) = k) \lesssim 2(n - k - 1)\epsilon + (k + 1)\epsilon$$

which are not very tight. We can easily approximate the probability of having at least one error, since (again neglecting boundary effects and overlap)

$$P(\varphi(n) = 0) \approx \prod_{i = 1}^n (1 - 2(i - 1)\epsilon)$$

so for $n = 20$, $\epsilon = 0.0001$ the probability of failure is $1 - P(\varphi(n) = 0) \approx 0.0373$.
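For concreteness, the product is easy to evaluate and to compare against a direct simulation. A small sketch (the function names and the Monte Carlo comparison are my own additions):

```python
import random

def p_no_error_product(n, eps):
    """Product approximation for P(phi(n) = 0)."""
    p = 1.0
    for i in range(1, n + 1):
        p *= 1 - 2 * (i - 1) * eps
    return p

def p_no_error_mc(n, eps, trials=100_000):
    """Monte Carlo estimate of P(no pair of n uniform points within eps)."""
    good = 0
    for _ in range(trials):
        s = sorted(random.random() for _ in range(n))
        if all(b - a >= eps for a, b in zip(s, s[1:])):
            good += 1
    return good / trials

n, eps = 20, 0.0001
approx = p_no_error_product(n, eps)
print(approx, 1 - approx)     # ~0.9627 and ~0.0373
print(p_no_error_mc(n, eps))  # close to the product value
```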

So for large $n$, $P(\varphi(n) > 0) \to 1$. The tricky part is handling overlap when we have many errors: in one extreme case an entire new $2\epsilon$-interval is added to the region we want to avoid, while in the other nothing is added (though both edge cases have measure $0$).

This is just a problem for fun, so hints are encouraged.

Best answer:

I was apparently not paying much attention when I first found this problem. Here is what I believe to be the answer:

Let $X \sim \text{Unif}(0,1)$ and $Y \sim \text{Unif}(-1,0)$ (i.e. $Y \sim -X'$ for $X'$ an independent copy of $X$), so $Z = X + Y$ is distributed as the difference of two of the $X_i$. The pdfs of $X$ and $Y$ are both $1$ on their supports. Then

$$f_Z(z) = \int f_X(z - y) f_Y(y) \,dy $$

The integrand is $1$ exactly when $y \in [-1, 0]$ and $z - y \in [0, 1]$, i.e. $y \in [\max(-1, z - 1), \min(0, z)]$, so

$$f_Z(z) = 1 - |z|, \qquad -1 \leq z \leq 1$$

which is the familiar triangular density of the difference of two independent uniforms, and

$$P(-\epsilon < Z < \epsilon) = \int_{-\epsilon}^{\epsilon} (1 - |z|) \,dz = 2\epsilon - \epsilon^2$$

Since the $X_i$ are iid, this holds for each pair of $X_i$s. Treating the ${n \choose 2}$ pairwise comparisons as independent (they are not exactly independent, since comparisons share points, but the dependence is weak for small $\epsilon$), the number of errors is approximately

$$\xi \sim \text{Binomial}\left({n\choose{2}}, \; 2\epsilon - \epsilon^2\right)$$

So the probability of getting exactly $k$ errors is approximately

$$P(\xi = k) = {{n\choose{2}}\choose{k}} \left(2\epsilon - \epsilon^2 \right)^k \left( 1 - 2\epsilon + \epsilon^2\right)^{{n\choose{2}} - k} $$

Note that this agrees with the first calculation of having no errors: for $n = 20$, $\epsilon = 0.0001$ the binomial model also gives a probability of failure of about $0.0373$.

Just for fun, with precision $0.001$ and $20$ samples we expect to have no error with probability $\approx 0.6837$,

1 error: $\approx 0.2602$

2 errors: $\approx 0.0493$

and the number of samples we'd need to almost guarantee an error with the same precision is $108$ (with probability of no error $< 10^{-5}$).
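A quick numerical check of these figures (a sketch of my own, using the exact per-pair probability $P(|X_i - X_j| < \epsilon) = 2\epsilon - \epsilon^2 = 1 - (1 - \epsilon)^2$ and a direct Monte Carlo comparison; `math.comb` needs Python 3.8+):

```python
import math
import random

def p_k_errors(n, eps, k):
    """Binomial model: P(exactly k error pairs among C(n,2) comparisons)."""
    m = math.comb(n, 2)
    p = 2 * eps - eps ** 2  # exact per-pair probability
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)

def mc_no_error(n, eps, trials=50_000):
    """Monte Carlo estimate of P(no pair of n uniform points within eps)."""
    good = 0
    for _ in range(trials):
        s = sorted(random.random() for _ in range(n))
        if all(b - a >= eps for a, b in zip(s, s[1:])):
            good += 1
    return good / trials

n, eps = 20, 0.001
print(p_k_errors(n, eps, 0))  # ~0.6837
print(p_k_errors(n, eps, 1))  # ~0.2602
print(p_k_errors(n, eps, 2))  # ~0.0493
print(mc_no_error(n, eps))    # ~0.68

# smallest sample count with P(no error) < 1e-5 under this model
n_needed = next(m for m in range(2, 1000) if p_k_errors(m, eps, 0) < 1e-5)
print(n_needed)  # 108
```

The Monte Carlo value sits slightly below the binomial prediction, reflecting the mild dependence between overlapping pairs that the model ignores.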