Expressing the maximum difference between Gaussian random variables? And the probability that it is within a given limit.

262 Views Asked by At

Say I have a set of independent Gaussian random variables:

$$ X = \left\{ X_1, ... , X_n \right\} $$ where: $$ X_i \sim \mathcal{N}(x_i \mid \mu_i,\sigma_i^2) $$

I seek an expression (that can be evaluated) for the maximum difference between any pair of X_i.

Of course there are ${}^nC_2$ such pairs or:

$$ \frac{n!}{2(n-2)!} $$

and the difference between any pair can be written:

$$ \Delta X = X_j - X_k $$

where $j \ne k$ and $1 \le j \le n$ and $1 \le k \le n$.

Where $\Delta X$ is conveniently itself a Gaussian random variable of form:

$$ \Delta X_i \sim \mathcal{N}(x \mid \mu_j - \mu_k,\sigma_j^2 + \sigma_k^2) \mid \mu_j \ge \mu_k $$

Where the condition is added simply to order $j$ and $k$ in any given pair (without it $\Delta X $ for a pair has no singular definition as it could just as well be defined with $\mu_k - \mu_j$).

The expression I seek, both to describe (express) and to evaluate in the context of a probability is the the maximum $\Delta X$ for the set $\left\{ X_1, ... , X_n \right\}$. I'm a tad stuck on both elegant expression of that and evaluation of it as a probability.

Going backwards if I can express:

$$ \Pr(\Delta X_i \le \epsilon) = \Phi\left(\frac {\epsilon-\mu_\Delta} {\sigma_\Delta}\right) - \Phi\left(\frac {-\epsilon-\mu_\Delta} {\sigma_\Delta}\right) $$

where $\Phi$ is the Normal CDF and:

$$ \begin{align*} \mu_\Delta &= \mu_j-\mu_k\\ \sigma_\Delta^2 &= \sigma_j^2+\sigma_k^2\\ \end{align*} $$

And I could, clumsily perhaps express:

$$ \Delta X = \left\{ \Delta X_1, ... ,\Delta X_m \right\}\\ $$

where $m = {}^nC_2$, and I ask what is:

$$ \Pr\left(\Delta X_i \le \epsilon \, \forall i \in \left\{1,..,m\right\}\right) $$

Can such a probability be written as a combination of the probabilities $\Pr(\Delta X_i \le \epsilon)$ or can new random variable be defined as some linear function of the $\Delta X_i$ that thus retains the Gaussian distribution and for which the probability $\Pr(X_{new} \le \epsilon)$ could be written as above (the difference between two Normal CDFs)

Addendum

It happens, that various authors offer a nomenclature for describing the ordering of a set such $\Delta X$ above. The ordered vector can be written:

$$ \Delta X = \left[ \Delta X_{(1)}, ... ,\Delta X_{(m)} \right]\\ $$

where:

$$ \begin{align*} \Delta X_{(1)} &= \operatorname{min}(\Delta X)\\ \Delta X_{(m)} &= \operatorname{max}(\Delta X)\\ \Delta X_{(i)} &\le \Delta X_(i+1)\\ \end{align*} $$

Which helps a little in expressing the goal here succinctly. That is, I am after:

$$ \Pr(\Delta X_{m} \le \epsilon) $$

And it follows from the definition of $\Delta X_{(m)}$ that if it is less than $\epsilon$ that all the $\Delta X_i$ are.

What remains is to find a solution for that probability!

The crux of the problem of course is that $\Delta X_{m}$ is not easily defined, because all the $\Delta_i$ have distinct mean and variance and hence $\Delta X_{m}$ is not simply that with the largest mean ($\Delta X_i \mid \mu_i = \mu_{(m)}$).

There remains the hope, that there is a joint distribution that we can find which defined $\Delta X_{m}$ effectively.

Addendum II

I suspect that the field of Order Statistics:

https://en.wikipedia.org/wiki/Order_statistic

is very relevant here. In fact in terms of order statistics if we take the original set:

$$ X = \left\{ X_1, ... , X_n \right\} $$

The order vector becomes:

$$ X = \left[ X_{(1)}, ... ,X_{(n)} \right]\\ $$

And we can define:

$$ {\rm Range}\{\,X_1,\ldots,X_n\,\} = X_{(n)}-X_{(1)} $$

and are asking for $\Pr(X_{(n)}-X_{(1)} \le \epsilon)$.

The challenge remaining is that the elements of X are not IID (Independent and identically distributed), merely Independent and Normally Distributed (perhaps I can coin IND) and there is a suggestion that the Bapat–Beg theorem applies:

https://en.wikipedia.org/wiki/Bapat%E2%80%93Beg_theorem

thought the puzzle remains, how? How that theorem can be applied to yield $\Pr(X_{(n)}-X_{(1)} \le \epsilon)$.

1

There are 1 best solutions below

10
On

The maximum difference between the $X_i$'s would be the largest minus the smallest. In other words, we can use order statistics. If I understand your question correctly, you want the probability distribution for $X_{(n)}-X_{(1)}$, where $X_{(1)}$ is the minimum of all the $X_i$'s, and $X_{(n)}$ is the largest.

$$f_{X_{(i)}, X_{(j)}}(x_i, x_j) = \frac{n!}{(i-1)!(j-i-1)!(n-j)!}\big[F(x_i)\big]^{i-1} \cdot [F(x_j)-F(x_i)]^{j-i-1}\cdot[1-F(x_j)]^{n-j}f(x_i)f(x_j)$$ when $i<j$, for all $x_i < x_j$.

Ross then writes the case where $i=1$ and $j=n$:

$$f_{X_{(1)}, X_{(n)}}(x_1, x_n) = \frac{n!}{(n-2)!(n-1)!} \cdot [F(x_n)-F(x_1)]^{n-2}\cdot[1-F(x_n)]^{n-1}f(x_1)f(x_n)$$

From there it is messy integration to find the density of $X_{(n)} - X_{(1)}$.