Intuitive approximation and Order Statistics


Given independent, identically distributed random variables $X_1, X_2, \dots, X_n$ with probability density and cumulative distribution functions $f$ and $F$ respectively, the probability density function of the maximum follows from $$ f_{max}(x)\,\epsilon \approx P ( \exists j : X_j \in [x, x+ \epsilon] \land X_k \leq x \,\,\forall k \neq j ) \approx n f(x) F(x)^{n-1} \epsilon $$ for small $\epsilon$, so that $f_{max}(x) = n f(x) F(x)^{n-1}$. The mean of the maximum is hence given by $$ \int_{-\infty}^{\infty} n x f(x) F(x)^{n-1} \mathrm{d}x $$

Now, to the point: my intuition was that a decent approximation of the mean of the maxima, $ \bar{x}_{max}$ (of its value, that is, not of its distribution), could be obtained by solving the equation $$f(\bar{x}_{max}) = \frac{1}{n}$$ An alternative, possibly more meaningful as commented by Jean Marie, is to consider the cumulative distribution instead, and impose the condition $$ F(\bar{x}_{max}) = 1- \frac{1}{n} $$ whose solution should provide an estimate for the mean of the maxima. The rest of the post will focus on this possibility.

To clarify the line of thought behind my "guess": with $n$ samples to draw, one might be tempted to assume that even events whose probability is of order $\frac{1}{n}$ are plausible. The farther from the mean, the less likely an event is (under reasonable conditions); hence I would expect the maximum to be related to the lowest probability value that can reasonably be expected.

So, the question is: how good is the approximation that estimates the mean of the random variable (the maximum over $n$ draws) by a value $\bar{x}_{max}$ such that a fraction $1-\frac{1}{n}$ of the samples are less than it?

I did perform some numerical checks for the exponential distribution, supporting the idea the approximation could be reasonable.
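A numerical check of this kind can be sketched as follows. This is not the original check, only a minimal Monte Carlo illustration using the Python standard library; the function names, the number of trials and the seed are assumptions chosen for the example. It compares the simulated mean of the maximum of $n$ exponential draws with the solution of $F(\bar{x}_{max}) = 1 - \frac{1}{n}$, which for $F(x) = 1 - e^{-\lambda x}$ is $\ln(n)/\lambda$.

```python
import math
import random


def simulated_mean_max(n, lam=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the mean of the maximum of n i.i.d. Exp(lam) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(lam) for _ in range(n))
    return total / trials


def quantile_estimate(n, lam=1.0):
    """Solution of F(x) = 1 - 1/n for the exponential cdf F(x) = 1 - exp(-lam * x)."""
    return math.log(n) / lam


if __name__ == "__main__":
    # Columns: n, simulated mean of the maximum, quantile-based estimate.
    for n in (10, 100, 1000):
        print(n, simulated_mean_max(n, trials=1000), quantile_estimate(n))
```

Comparing the last two columns as $n$ grows gives a rough picture of how the estimate tracks the simulated mean.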

I would like to investigate this approximation analytically: confirm that it converges asymptotically to the mean for $n \to \infty$, and ideally get a measure of the error involved for finite $n$.

In order to do so, I would need to be able to compute the mean of the maxima $$ \int_{-\infty}^{\infty} n x f(x) F(x)^{n-1} \mathrm{d}x $$ which I am unable to do even for the simple case of the exponential distribution, where the expression above specializes to $$ \int_{0}^{\infty} n x \lambda e^{- \lambda x} (1 - e^{- \lambda x })^{n-1} \mathrm{d}x $$

As I am interested mainly in the case where $n$ is large, I thought I could make an attempt using Laplace’s method.

Using $$x f(x) = \left[ e^{\ln \big(x f(x)\big)} \right]^{\frac{n-1}{n-1}}$$ (valid where $x f(x) > 0$), I tried to re-write $\int x f(x) F(x)^{n-1} \mathrm{d}x$ in a form suitable for Laplace's method $$ \int_{-\infty}^{\infty} e^{(n-1) \left[\frac{1}{n-1}\ln \big(x f(x)\big) + \ln\big(F(x)\big)\right]} \mathrm{d}x $$ But I have not achieved much, as I would need to find the stationary points of the function $$ \frac{1}{n-1}\ln \big(x f(x)\big) + \ln\big(F(x) \big) $$

Any comment on the intuitive approach to estimating the mean of the maxima is welcome. Any hints or suggestions on ways to characterise the error involved would be very much appreciated as well.

Thanks in advance.

$\mathbf{EDIT \,\, Following \,Claude \,Leibovici's \, answer}$

In an answer below, Claude Leibovici has calculated a closed form for the mean of the maxima for an exponential distribution. His result states $$x_{max} = \frac{H_n}{\lambda} $$ This supports the conjecture presented in the post, at least for the exponential distribution. Indeed, the conjecture states that an (asymptotic) approximation of the mean of the maxima, $\bar{x}_{max}$, can be obtained from the equation $$ F(\bar{x}_{max}) = 1 - \frac{1}{n}$$ which reads, specifically for the exponential distribution, $$ 1 - e^{-\lambda \bar{x}_{max} } = 1 - \frac{1}{n}$$ Using the closed form value courtesy of Claude Leibovici, one verifies that it obeys the desired asymptotics $$ 1 - e^{-\lambda \frac{H_n}{\lambda}} \sim 1 - e^{-(\gamma + \ln n)} = 1 - \frac{e^{-\gamma}}{n}$$ which has the same $1 - \mathcal{O} \left( \frac{1}{n} \right)$ form as the conjectured $1 - \frac{1}{n}$, with the constant $e^{-\gamma} \approx 0.56$ in place of $1$. This is encouraging.
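The comparison between the closed form $H_n/\lambda$ and the quantile-based estimate $\ln(n)/\lambda$ can also be tabulated directly. The following is a small deterministic sketch with illustrative function names (they are assumptions, not part of the answer). It prints the gap $H_n/\lambda - \ln(n)/\lambda$, which approaches $\gamma/\lambda$, and the rescaled tail probability $n\,(1 - F(H_n/\lambda))$, which approaches $e^{-\gamma}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def exact_mean_max(n, lam=1.0):
    """Closed form H_n / lam for the mean of the maximum of n Exp(lam) draws."""
    return harmonic(n) / lam


def quantile_estimate(n, lam=1.0):
    """Solution of 1 - exp(-lam * x) = 1 - 1/n."""
    return math.log(n) / lam


def cdf_at_exact_mean(n, lam=1.0):
    """Exponential cdf evaluated at the exact mean of the maximum."""
    return 1.0 - math.exp(-lam * exact_mean_max(n, lam))


if __name__ == "__main__":
    # Columns: n, gap between exact mean and estimate, n * (1 - F(exact mean)).
    for n in (10, 100, 1000, 10000):
        gap = exact_mean_max(n) - quantile_estimate(n)
        tail = n * (1.0 - cdf_at_exact_mean(n))
        print(n, round(gap, 4), round(tail, 4))
```

Under these assumptions the absolute error of the estimate does not vanish, but it stays bounded while the mean itself grows like $\ln(n)/\lambda$, so the relative error tends to zero.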

The question of whether this holds in general for an arbitrary cdf $F$ is still open.

On BEST ANSWER

Considering $$I_n=\int_{0}^{\infty} x \lambda e^{- \lambda x} (1 - e^{- \lambda x })^{n-1} \,dx=\frac 1 \lambda\int_{0}^{\infty} y\,e^{-y}(1-e^{-y})^{n-1}\,dy$$ The first thing to notice is that the antiderivative has a "closed" form expression $$\int y\,e^{-y}(1-e^{-y})^{n-1}\,dy=\frac{\left(1-e^{-y}\right)^n }{n^2 \left(1-e^y\right)^{n}} \, _2F_1\left(-n,-n;1-n;e^y\right)+\frac{y \left(1-e^{-y}\right)^n}{n}$$ from which the definite integrals can be "easily" computed.

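The end result is surprisingly simple $$\color{red}{I_n=\frac{H_n}{\lambda n}}$$ provided $\Re(n)>-1$, where $H_n$ is the $n$-th harmonic number. Since the density of the maximum carries an extra factor $n$, the mean of the maximum is $n I_n = \frac{H_n}{\lambda}$.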
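For positive integer $n$, the value $\frac{H_n}{\lambda n}$ can also be reached without the hypergeometric antiderivative. The following is a sketch of one elementary route, not necessarily the one used above: substitute $u = 1-e^{-y}$ (so $\mathrm{d}u = e^{-y}\,\mathrm{d}y$ and $y = -\ln(1-u)$), expand $-\ln(1-u)=\sum_{k\geq 1} \frac{u^k}{k}$, and integrate term by term, which is allowed because all terms are non-negative: $$\int_{0}^{\infty} y\,e^{-y}(1-e^{-y})^{n-1}\,dy = -\int_0^1 u^{n-1}\ln(1-u)\,du = \sum_{k\geq 1}\frac{1}{k}\int_0^1 u^{n+k-1}\,du = \sum_{k\geq 1}\frac{1}{k(n+k)}$$ A partial fraction decomposition then makes the sum telescope: $$\sum_{k\geq 1}\frac{1}{k(n+k)} = \frac{1}{n}\sum_{k\geq 1}\left(\frac{1}{k}-\frac{1}{n+k}\right) = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{k} = \frac{H_n}{n}$$ Dividing by $\lambda$ gives $I_n$.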

For large values of $n$, $$I_n=\frac 1\lambda \left(\frac{\gamma +\log \left({n}\right)}{n}+\frac{1}{2 n^2}-\frac{1}{12 n^3}+O\left(\frac{1}{n^4}\right) \right)$$