Finding MLE of uniform distribution with actual example values

I'm watching this video and am stuck at this part: https://youtu.be/XaAtkCzdjLE?t=6m2s

Following the example in the video, I assume that $\theta$ will be between $14$ and $501$.

Now I don't understand the math behind maximizing the likelihood. Given that the function is $\frac{1}{\theta^5}$ in our example, maximizing it means choosing the smallest $\theta$, which in this case would be $14$ out of the choices $\{342, 14, 68, 501, 392\}$, giving $\frac{1}{14^5}$.

$\frac{1}{14^5}$ is the largest of these values, whereas $\frac{1}{501^5}$ is the smallest.

I've seen many graphs like the one below,

https://i.stack.imgur.com/FrSTF.png

but I don't get why it suddenly jumps at $\max x_i$. What about the values smaller than $\max x_i$?

I know the correct answer is $501$, but I am unable to show it.

Best answer

Understanding the Parameter $N$

The example in the video considers the problem of finding the maximum likelihood estimator of a discrete uniform distribution with parameter $N$. To clarify the role $N$ plays in the distribution, consider the sample space of the distribution: The sample space of a discrete uniform distribution with parameter $N$ is the set $$\{ x \in \mathbb{N} : 1 \leq x \leq N\}.$$ Therefore, $N$ is an upper bound on observations from the distribution. If $x$ is an observation from the distribution, then $x \leq N$. In other words, $N \geq x$ for every observation $x$. This second inequality will be of use later in "updating" the parameter space of $N$.

The Parameter Space of $N$

Before determining what value of $N$ is most likely, we should consider what values of $N$ are possible. The set of possible values of $N$ is called the parameter space of $N$. The parameter space of $N$ is the set $$\{n \in \mathbb{N} : n \geq 1\}.$$ As the parameter of a discrete uniform distribution, $N$ could be any natural number greater than or equal to $1$ but could not be, e.g., an irrational number. We will "update" the parameter space of $N$ soon.

Updating the Parameter Space

The example in the video assumes the serial numbers of the tanks are observations from a discrete uniform distribution with parameter $N$. In this context, $N$ is an upper bound on the serial numbers; i.e., $N$ is the largest possible serial number.

Consider the sample of serial numbers $(342,14,68,501,392)$. This sample reveals information about the parameter $N$ through the relationship discussed above: $N \geq x$ for every observation $x$. From the given sample, $$N \geq 342,\quad N \geq 14,\quad N \geq 68,\quad N \geq 501,\quad N \geq 392.$$ Those five inequalities can be rewritten as $$N \geq 501 > 392 > 342 > 68 > 14.$$ However, the inequality $N \geq 501$ alone captures all the relevant information (and no more than the relevant information). In the context of the example, the largest possible serial number is at least $501$. (If the largest possible serial number were less than $501$, then how was a serial number of $501$ observed? Absurd!)

To summarize, from the given sample of serial numbers $(342,14,68,501,392)$, we now know $N \geq 501$. Recall the parameter space of $N$ was the set $\{ n \in \mathbb{N} : n \geq 1 \}$. Now, we update the parameter space of $N$ to the set $$\{ n \in \mathbb{N} : n \geq 501 \}.$$
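The update can also be checked mechanically. The following Python sketch is only illustrative; the helper name `is_admissible` is hypothetical and does not come from the video. It tests whether a candidate value $n$ is consistent with every observation in the sample.

```python
# Illustrative sketch; `is_admissible` is a hypothetical helper name.
sample = (342, 14, 68, 501, 392)

def is_admissible(n, sample):
    """Return True if every observation could have come from {1, ..., n}."""
    return all(1 <= x <= n for x in sample)

# The smallest candidate consistent with the sample is the largest observation.
smallest_admissible = max(sample)

print(is_admissible(500, sample))  # 501 was observed, so 500 is ruled out
print(is_admissible(501, sample))
print(smallest_admissible)
```

Any candidate below the largest observation fails the check, which is exactly the restriction $N \geq 501$.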

The image you linked illustrates the restriction $N \geq 501$ by graphing the function $L(n) = \frac{1}{n^5}$ on the restricted domain $501 \leq n < +\infty$. The restriction on the domain explains the discontinuity at $n = 501$ and explains why values of $n$ less than $501$ are not considered.

To bring the first half of the discussion to an end, consider the role $501$ has played. There wasn't anything special about the number; it happened to be the largest observation in the sample. We could have generalized the discussion by writing $501$ as $\max x_i$. The second half of the discussion follows and can be read without reference to the first half.


The Likelihood Function

Before we introduce the concept of a likelihood function, let us denote the random vector in the example by $\mathbf{X}$. Then, $$\Pr(\mathbf{X} = (342,14,68,501,392))$$ is the probability of observing the sequence of serial numbers $(342,14,68,501,392)$ on five captured tanks. In general, $$\Pr(\mathbf{X} = \mathbf{x})$$ is the probability of observing the sequence of serial numbers $\mathbf{x} = (x_1, \ldots, x_5)$ on five captured tanks.

To introduce the likelihood function, consider two possible values of $N$, $n_1$ and $n_2$. If $$\Pr(\mathbf{X} = \mathbf{x} \mid N = n_1) > \Pr(\mathbf{X} = \mathbf{x} \mid N = n_2),$$ then the sample $\mathbf{x}$ is more likely to have occurred if $N = n_1$ than if $N = n_2$; in this sense, $n_1$ is a more plausible value for the true value of $N$ than is $n_2$.

We denote the likelihood function by $L(n \mid \mathbf{x})$ and define it by $$L(n \mid \mathbf{x}) = \Pr(\mathbf{X} = \mathbf{x} \mid N = n).$$

I would like to take a moment and comment on the notation used in the article to which you linked in one of your comments. In denoting the likelihood function, the article omits the sample $\mathbf{x}$, writing, for example, $L(\theta)$ instead of $L(\theta \mid \mathbf{x})$. The likelihood function depends on the sample $\mathbf{x}$; i.e., a given value of the parameter may be more plausible as the true value of the parameter for one sample than for another. Therefore, I find it best practice to explicitly note the dependence by writing $L(\theta \mid \mathbf{x})$.

Returning to the main discussion, let $\hat{n} = \mathop{\mathrm{arg\,max}} L(n \mid \mathbf{x})$. In other words, $\hat{n}$ is the value at which $L(n \mid \mathbf{x})$ is maximum. In the sense previously discussed, $\hat{n}$ is the most plausible value for the parameter $N$. This is the essence of the method of maximum-likelihood estimation.

Finding the Likelihood Function

Recall the definition of the likelihood function $L(n \mid \mathbf{x})$: $$L(n \mid \mathbf{x}) = \Pr(\mathbf{X} = \mathbf{x} \mid N = n).$$ If $n$ is less than at least one observation $x_i$ in $\mathbf{x}$, then the probability of observing $\mathbf{x}$ is $0$ (because any observation $x_i$ from a discrete uniform distribution with parameter $N = n$ must satisfy $1 \leq x_i \leq n$). For example, if $n = 31$, then the probability of observing the sample $\mathbf{x} = (4,15,9,26,53)$ is $0$ because the observation $x_5 = 53$ could not have come from a discrete uniform distribution with parameter $N = 31$. Therefore, if $n < \max x_i$, $$\Pr(\mathbf{X} = \mathbf{x} \mid N = n) = 0.$$

If $n \geq \max x_i$, then $$\Pr(\mathbf{X} = \mathbf{x} \mid N = n) = \frac{1}{n^5}.$$
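
The value of this probability can be justified in one line, under the assumption (implicit in the example, as far as I can tell) that the five serial numbers are independent observations, each equally likely to be any of $1, \ldots, n$. Writing $n$ for the candidate value of $N$, supposing $n \geq \max x_i$, and letting $X_i$ denote the $i$-th component of $\mathbf{X}$ (notation introduced here for illustration): $$\Pr(\mathbf{X} = \mathbf{x} \mid N = n) = \prod_{i=1}^{5} \Pr(X_i = x_i \mid N = n) = \prod_{i=1}^{5} \frac{1}{n} = \frac{1}{n^5}.$$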

To summarize, $$\begin{align*}L(n \mid \mathbf{x}) & = \Pr(\mathbf{X} = \mathbf{x} \mid N = n) \\ & = \begin{cases}0 & \text{if } n < \max x_i, \\ \frac{1}{n^5} & \text{if } n \geq \max x_i.\end{cases}\end{align*}$$
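
The piecewise formula translates directly into code. This is a minimal Python sketch, assuming independent observations; the function name `likelihood` is my own choice, and `Fraction` is used only to keep the tiny values exact.

```python
from fractions import Fraction

def likelihood(n, sample):
    """L(n | x) for independent draws from a discrete uniform on {1, ..., n}."""
    if n < max(sample):
        # At least one observation exceeds n, so the sample is impossible.
        return Fraction(0)
    return Fraction(1, n ** len(sample))

sample = (342, 14, 68, 501, 392)

# Evaluate the likelihood at a few candidate values of n.
for n in (14, 500, 501, 502):
    print(n, likelihood(n, sample))
```

Note that $n = 14$ receives likelihood $0$, not $\frac{1}{14^5}$, because four of the five observations exceed $14$.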

Maximizing the Likelihood Function

The above function is the same as the one whose graph is shown in the image to which you linked in your question. As can be seen from the formula for the function, for values below $\max x_i$, the function is identically zero, in agreement with the graph. Similarly, as can be seen from the formula for the function, at $\max x_i$, the function jumps to $\frac{1}{(\max x_i)^5}$, in agreement with the graph.

For $\mathbf{x} = (342,14,68,501,392)$, $\max x_i = 501$ and the graph jumps from $0$ to $\frac{1}{501^5}$ at $n = 501$, falling to $\frac{1}{502^5}$ at $n = 502$, and continuing to fall thereafter.

The graph of $L(n \mid \mathbf{x}) = \begin{cases}0 & \text{if } n < \max x_i, \\ \frac{1}{n^5} & \text{if } n \geq \max x_i\end{cases}$ clearly shows the maximum is at $n = \max x_i$. (If you aren't satisfied by the graph, note that $\frac{1}{n^5}$ is strictly decreasing in $n$, so among the admissible values $n \geq \max x_i$ it is largest at the smallest one, $n = \max x_i$.) This remarkable result is true for all discrete uniform distributions with parameter $N$: the maximum-likelihood estimate of $N$ is the largest observation. The method by which we arrived at that result, the method of maximum-likelihood estimation, is applicable in estimating the parameters of other distributions.
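
For a numerical confirmation, the maximization can be done by brute force. The sketch below is illustrative only: the function name `likelihood` and the finite search window of $1$ to $2000$ are my own assumptions (the true parameter space is unbounded, but the likelihood only decreases beyond $\max x_i$).

```python
from fractions import Fraction

def likelihood(n, sample):
    """L(n | x): zero below the largest observation, 1/n^k at or above it."""
    if n < max(sample):
        return Fraction(0)
    return Fraction(1, n ** len(sample))

sample = (342, 14, 68, 501, 392)

# Arbitrary finite search window (an assumption made for illustration).
candidates = range(1, 2001)

# The maximum-likelihood estimate is the candidate with the largest likelihood.
n_hat = max(candidates, key=lambda n: likelihood(n, sample))
print(n_hat)
```

The search returns the largest observation in the sample, in agreement with the argument above.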


I hope I have answered your question. If you would like me to elaborate on any section, please let me know. I would be happy to do so!