MLE for uniform distribution


Suppose we have a sample $X_i = x_i$ for $i \in \{1,\dots,n\}$, where the $X_i$ are i.i.d. with a uniform distribution on the interval $[a,b]$.

Now we want to find the MLE for the uniform distribution. We form the likelihood function:

We get that $L(a, b; x_1, \dots, x_n) = \prod_{i=1}^n p(x_i) = \prod_{i=1}^n \frac{1}{b-a} = \frac{1}{(b-a)^n}$

Taking the natural logarithm, we get the log-likelihood function:

$$ l(a, b; x_1, \dots, x_n) = -n \ln(b-a)$$

We take the partial derivative with respect to $a$ and $b$ respectively, which yields:

$$ \frac{\partial l}{\partial a} = \frac{n}{b-a} > 0$$

so $l$ is monotonically increasing in $a$. Notice that $a\leq \min_{i\in[1,n]}x_i \leq \max_{i\in[1,n]}x_i\leq b$.

So if we want to maximize $l$ as a function of $a$, we have to choose the minimum value from our sample. But I don't get the reasoning. I understand that picking the minimum is the most logical answer, but mathematically I can't get the hang of it. Since $l$ is monotonically increasing with respect to $a$, why don't we pick a larger $x_i$ instead? I'm new to this concept, so some clarification would be very appreciated.

Thanks.


Best answer:

The issue is that you are not taking into account the support. The density of a continuous uniform random variable $X$ on the interval $[a,b]$ is actually $$f_X(x) = \frac{1}{b-a} \mathbb{I} (a \le x \le b) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise}. \end{cases}$$

Consequently, the likelihood of an IID sample $\boldsymbol x = (x_1, x_2, \ldots, x_n)$ for unknown parameters $a, b$ will be

$$\mathcal L(a, b \mid \boldsymbol x) = \prod_{i=1}^n f_{X_i}(x_i) = \frac{1}{(b-a)^n} \mathbb{I} (a \le x_1, x_2, \ldots, x_n \le b).$$ Note how the use of the indicator function $\mathbb{I}$ changes the likelihood: we have now made it explicit that the likelihood is zero whenever any observation $x_i$ falls outside the interval $[a,b]$.
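To make the role of the indicator concrete, here is a minimal sketch in Python (function and variable names are hypothetical) of the likelihood with the support constraint included:

```python
def uniform_likelihood(a, b, sample):
    """Likelihood of an IID U(a, b) sample, including the indicator term."""
    if b <= a:
        return 0.0
    # Indicator: the likelihood is zero unless every observation lies in [a, b].
    if min(sample) < a or max(sample) > b:
        return 0.0
    return (b - a) ** (-len(sample))

x = [0.2, 0.5, 0.9]
print(uniform_likelihood(0.0, 1.0, x))  # 1.0: all points inside [0, 1]
print(uniform_likelihood(0.3, 1.0, x))  # 0.0: a exceeds min(x) = 0.2
```

Without the indicator check, the second call would wrongly return a positive value, which is exactly the confusion in the question.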

So now if we attempt to maximize this likelihood, we must take this additional restriction into account. But before we do so, we can simplify the indicator: the condition $a \le x_1, x_2, \ldots, x_n \le b$ (every observation lies between $a$ and $b$) is equivalent to requiring that the smallest observation $x_{(1)} = \min_i x_i$ is not smaller than $a$, and the largest observation $x_{(n)} = \max_i x_i$ is not larger than $b$. So we may write our likelihood as $$\mathcal L(a, b \mid \boldsymbol x) = (b-a)^{-n} \mathbb{I} (x_{(1)} \ge a) \mathbb{I} (x_{(n)} \le b).$$ You may ask, for instance, what happens if $x_{(1)} > b$. That will still satisfy the first indicator, but it won't satisfy the second, since $x_{(n)} \ge x_{(1)} > b$. Similarly, if $x_{(n)} < a$, then the second indicator is satisfied but not the first.

With this characterization of the likelihood, it is now very easy to see the MLE. For a fixed $b$ and sample $\boldsymbol x$, the choice of $a$ that maximizes $\mathcal L$ will be the one that makes $a$ as large as possible without violating the condition $x_{(1)} \ge a$, since $(b-a)^{-n}$ is an increasing function of $a$. Similarly, for a fixed $a$ and sample $\boldsymbol x$, the choice of $b$ that maximizes $\mathcal L$ will be the one that makes $b$ as small as possible without violating the condition $x_{(n)} \le b$, since $(b-a)^{-n}$ is a decreasing function of $b$. So our (joint) MLE will be $$\hat a = x_{(1)} = \min_i x_i, \quad \hat b = x_{(n)} = \max_i x_i.$$
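Numerically, the joint MLE is simply the sample minimum and maximum. A quick simulation sketch (true endpoints and sample size are made up for illustration):

```python
import random

random.seed(0)
a_true, b_true = 2.0, 5.0
sample = [random.uniform(a_true, b_true) for _ in range(1000)]

# Joint MLE for the uniform endpoints: the extreme order statistics.
a_hat, b_hat = min(sample), max(sample)
print(a_hat, b_hat)  # a_hat slightly above 2.0, b_hat slightly below 5.0
```

Note that $\hat a \ge a$ and $\hat b \le b$ always hold, so these estimators are biased inward, though they converge to the true endpoints as $n$ grows.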

Another answer:

This is one of those "trick" MLE problems, because it's not really a calculus problem in the sense of requiring you to differentiate, set the derivative equal to zero, and solve.

Informally, it goes like this: your likelihood $L$ is a function of $a$ and $b$. We know that $a < b$. In order to make the likelihood as big as possible, we'd like to make the denominator $(b-a)^n$ as small as possible. How can we do this? By putting $a$ and $b$ as close together as possible. But note that we cannot put $\hat a > \min \{x_1, \dots, x_n\}$, and we cannot put $\hat b < \max \{x_1, \dots, x_n\}$, because that would leave some observation outside the support, giving the sample zero likelihood. For example, if your minimum sample value was $-0.2$ and your maximum was $1.4$, how could your sample have come from a $U(0, 1)$ distribution? It's impossible.
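This squeeze argument can be checked numerically: the likelihood grows as $[a, b]$ shrinks toward $[\min x_i, \max x_i]$, then drops to zero the moment an observation is excluded. A sketch using the example numbers above:

```python
def likelihood(a, b, xs):
    """U(a, b) likelihood with the support constraint made explicit."""
    if b <= a or min(xs) < a or max(xs) > b:
        return 0.0
    return (b - a) ** (-len(xs))

xs = [-0.2, 0.3, 1.4]  # sample minimum -0.2 and maximum 1.4, as in the text
for a, b in [(-1.0, 2.0), (-0.5, 1.5), (-0.2, 1.4), (0.0, 1.0)]:
    print((a, b), likelihood(a, b, xs))
```

The likelihood increases as the interval tightens, peaks at $(\hat a, \hat b) = (-0.2, 1.4)$, and is exactly zero for $U(0, 1)$, which cannot have produced this sample.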

Regarding your last paragraph, the point is that it makes no sense at all to make $a$ bigger than the minimum: it is literally impossible for the lower endpoint $a$ to exceed a value you actually observed.