Estimate the support (maximum) of a distribution from a sample


Let $X$ be a random variable. The only information we have about $X$ is that $X \leq M$ for some $M \in \mathbb R$. ($M$ is unknown.) We also have a random sample $X_1, \dots, X_n$ from $X$. I'd like to get an estimate $\hat M_n(X_1, \dots, X_n)$ of $M$ with a tail bound $$ \mathbb P (M - \hat M_n > t) < \delta_n(t), \qquad t \in \mathbb R. $$ (As an analogy, if we instead wanted to estimate the mean $\mu$ of $X$, we could use the estimate $\hat \mu_n(X_1, \dots, X_n) = \tfrac{1}{n}(X_1 + \cdots + X_n)$ with a tail bound $\mathbb P(\hat \mu_n - \mu > t) < \delta_n(t)$ given by, e.g., the Markov, Chebyshev, or Chernoff inequality (with varying degrees of additional information required about $X$).)

I've looked at the German tank problem and extreme value theory, but the approaches there need additional information about $X$ that is unavailable here.


Suppose you observe the sample $1,9,2,7,4,3,1,2$. What can you say about $M$? You can at least say that $M\ge 9$, where $9$ is the largest observation in your sample, and if there is no additional information, this is your best bet.

So it is sensible to estimate $M$ by $X_{(n)}$, the largest order statistic of the i.i.d. sample $X_1,\ldots, X_n$. The tail probabilities can be computed as follows:

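$$\begin{aligned} P(M-X_{(n)}>t) & =P(X_{(n)}<M-t) \\ & =P(X_1<M-t)P(X_2<M-t)\cdots P(X_n<M-t) \\ & =P(X_1<M-t)^n \le P(X_1\le M-t)^n=F(M-t)^n\end{aligned}$$ where $F$ is the distribution function of $X$; the second equality uses independence, and the third uses that the $X_i$ are identically distributed. If $M$ is the largest point in the support, then $F(M-t)<1$ for every $t>0$. Hence, for fixed $t>0$, the tail decays exponentially fast in $n$. Without further information about $F$, that's the best you can say.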
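To make the bound concrete, here is a minimal Python sketch that compares $F(M-t)^n$ with a Monte Carlo estimate of $P(M - X_{(n)} > t)$. It assumes, purely for illustration, that $X$ is uniform on $[0, M]$, so that $F(M-t) = 1 - t/M$; nothing in the question supplies this distribution. The function names `estimate_max`, `tail_bound`, and `empirical_tail` are hypothetical helpers, not part of any library.

```python
import random


def estimate_max(sample):
    """Return the sample maximum X_(n), used as the estimate of M."""
    return max(sample)


def tail_bound(f_at_m_minus_t, n):
    """Return F(M - t)^n, the upper bound on P(M - X_(n) > t)."""
    return f_at_m_minus_t ** n


def empirical_tail(n, t, m=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of P(M - X_(n) > t).

    Assumption (illustrative only): X ~ Uniform(0, m).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x_max = estimate_max([rng.uniform(0.0, m) for _ in range(n)])
        if m - x_max > t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The sample from the answer: the estimate is the largest observation.
    print(estimate_max([1, 9, 2, 7, 4, 3, 1, 2]))  # 9

    n, t, m = 10, 0.1, 1.0
    # Under the uniform assumption, F(m - t) = 1 - t / m.
    print(tail_bound(1 - t / m, n), empirical_tail(n, t, m))
```

For a continuous distribution such as this one, $P(X_1 < M-t) = P(X_1 \le M-t)$, so the bound holds with equality and the two printed numbers should agree up to simulation noise. For other distributions, $F(M-t)$ has to be known or bounded separately, which is exactly the additional information the question lacks.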