Expectation of the largest order statistic from uniform random variables


If $X_1, ..., X_n$ are iid random variables from the Uniform[$0,\theta$] distribution, where $\theta >0$, compute the expectation of the largest order statistic denoted $X_{(n)}$.

I am looking to test whether this statistic is a biased or unbiased estimator of $\theta$; however, I am struggling to test the bias because I am unable to compute its expectation.

My initial thought was that $E_{\theta}(X_{(n)})=\frac{\theta}2$, since this is the expectation of any single random variable from the uniform distribution on this interval. However, clearly this can't be the case here, since the expectation must (intuitively) depend on $n$ in some way.

I am wondering what the problem is with my initial thoughts.

Edit: I have been informed in the comments of what the correct approach is; however, I am still unclear about the problem with my reasoning that $E_{\theta}(X_{(n)})=\frac{\theta}2$, given that $E_{\theta}(X_i)=\frac{\theta}2$ for all possible values of $i$. By definition, there exists some natural number $j$ such that $X_j=X_{(n)}$, so why is it that $X_j$ doesn't follow the uniform distribution when every $X_i$ does?
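For reference, here is the standard computation (presumably the approach suggested in the comments), which goes through the CDF of the maximum: \begin{align*} P_\theta(X_{(n)} \le x) = P_\theta(X_1 \le x, \dots, X_n \le x) = \prod_{i=1}^n P_\theta(X_i \le x) = \left(\frac{x}{\theta}\right)^n, \quad 0 \le x \le \theta, \end{align*} so the density is $f_{X_{(n)}}(x) = \frac{n x^{n-1}}{\theta^n}$ and \begin{align*} E_\theta(X_{(n)}) = \int_0^\theta x \cdot \frac{n x^{n-1}}{\theta^n}\, dx = \frac{n}{n+1}\,\theta < \theta. \end{align*} Hence $X_{(n)}$ is a biased estimator of $\theta$, while $\frac{n+1}{n} X_{(n)}$ is unbiased. For $n = 1$ this reduces to $\theta/2$, consistent with the intuition above.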

On BEST ANSWER

Suppose we have two random variables $X_1$ and $X_2$, uniformly distributed in $[0,1]$ and independent of each other.

Let $X_{(2)} = \max(X_1,X_2)$. Someone claims that $X_{(2)}$ is equal to either $X_1$ or $X_2$, so it should have the same distribution as either of them, that is, uniform on $[0,1]$.

This is not correct. Let $J$ be the index of the maximum, that is, $J = 1$ if $X_1 > X_2$ and $J=2$ if $X_2 > X_1$. If $X_1 = X_2$, we can define $J$ arbitrarily, so let $J = 1$ in that case; that is, \begin{align*} J = \begin{cases} 1 & X_1 \ge X_2 \\ 2 & X_1 < X_2 \end{cases} \end{align*} We have $X_{(2)} = X_J$. The difficulty is that $J$ depends on $(X_1,X_2)$.

Let us compute $\mathbb E X_{(2)}$ by first conditioning on $J$. We have \begin{align*} \mathbb E ( X_{(2)} \,|\, J = 1) &= \mathbb E ( X_1 \,|\, J= 1) \\ &= \mathbb E ( X_1 \,|\, X_1 \ge X_2). \end{align*} Unconditionally, $X_1$ is uniformly distributed on $[0,1]$, but conditional on $X_1 \ge X_2$ its distribution changes. The easiest way to see this is to note that, conditional on $X_1 \ge X_2$, the pair $(X_1,X_2)$ is uniformly distributed on the triangle in $\mathbb R^2$ with vertices $(0,0)$, $(1,0)$ and $(1,1)$. This is the restriction of the uniform distribution on $[0,1]^2$ to the set $T := \{(x_1,x_2): x_1 \ge x_2\}$.
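This conditional distribution is easy to check numerically. Here is a minimal simulation sketch (the sample size is an arbitrary choice) that draws pairs and keeps only those with $X_1 \ge X_2$:

```python
import random

random.seed(0)
N = 200_000

# Draw pairs (X1, X2) ~ Uniform[0,1]^2 and keep X1 from the pairs
# where X1 >= X2, i.e. condition on the event {J = 1}.
cond_x1 = []
for _ in range(N):
    x1, x2 = random.random(), random.random()
    if x1 >= x2:
        cond_x1.append(x1)

mean_cond = sum(cond_x1) / len(cond_x1)
print(f"E[X1 | X1 >= X2] ~= {mean_cond:.3f}")  # close to 2/3, not 1/2
```

The conditional mean lands well above $1/2$, confirming that conditioning on being the maximum shifts the distribution of $X_1$ toward larger values.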

This gives \begin{align*} \mathbb E [X_1 \mid X_1 \ge X_2] &= \frac1{\text{area}(T)}\int_{T} x_1 \, dx_1 dx_2 \\ &= \frac1{1/2} \int_0^1 \int_{x_2}^1 x_1 dx_1 dx_2 = \int_0^1 (1-x_2^2) dx_2 = 1 - \frac13 = \frac23. \end{align*} Note that this is bigger than $1/2$. By symmetry we also have $\mathbb E[X_2\mid X_2 > X_1] = 2/3$. Then, $$ \mathbb E[X_{(2)}] = \frac12 \mathbb E[X_1 | X_1 \ge X_2] + \frac12 \mathbb E [X_2 | X_2 > X_1] = \frac23. $$
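The computation above handles $n = 2$; the same reasoning (or the CDF of the maximum) gives $\mathbb E_\theta[X_{(n)}] = \frac{n}{n+1}\theta$ for general $n$, so the maximum is biased low. A quick Monte Carlo sketch agrees (here $\theta = 3$ and the sample sizes are arbitrary choices):

```python
import random

random.seed(1)
theta = 3.0        # illustrative value; any theta > 0 works
trials = 100_000

ests = {}
for n in (1, 2, 5):
    # Monte Carlo estimate of E[X_(n)] = E[max(X_1, ..., X_n)],
    # with X_i ~ Uniform[0, theta] iid.
    ests[n] = sum(max(random.uniform(0, theta) for _ in range(n))
                  for _ in range(trials)) / trials
    print(f"n={n}: simulated {ests[n]:.3f} vs n*theta/(n+1) = {n * theta / (n + 1):.3f}")
```

Multiplying $X_{(n)}$ by $(n+1)/n$ removes the bias, which answers the question posed above.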

TL;DR: Knowledge of the index $J$, that is, of which variable is the maximum, changes the joint distribution of $(X_1,X_2)$. In particular, once you know $J$, $X_1$ and $X_2$ are no longer independent! Once you assert "I know that $X_1$ is the maximum," you have changed the distribution of $(X_1,X_2)$. That is what conditioning does in general.