I am trying to solve the following exercise:
Let $X_1, \dots, X_n$ be a sample from the distribution whose probability density function is:
$$f(x) = \begin{cases} \frac{1}{3}e^{-\frac{1}{3}(x-\theta)}, & x \geq \theta \\ 0, & x < \theta \end{cases}$$
Find the maximum likelihood estimator of $\theta$.
Following is my attempt to find the MLE (I could not get the image to work):
https://i.stack.imgur.com/7CZuy.jpg
I used the method from this video: https://www.youtube.com/watch?v=Z582V53dfr8
I get an answer depending on $n$ which does not make sense to me. The answer my teacher gave was $\min(x_i)$, but his explanation used $\arg$ which I do not understand.
Am I using a wrong method? Have I simply made an error somewhere? Do you know another method I can use?
Edit: As far as I can see, no matter how I do the logarithmic operations, I will only ever get an expression in $n$ after differentiating.
Differentiating is not the correct approach here. The problem is that each $f(x_i; \theta)$ depends discontinuously on $\theta$: it jumps from $1/3$ when $\theta$ equals $x_i$ to $0$ when $\theta$ is greater than $x_i$. The likelihood function $L(x_1, \dots, x_n; \theta)$, being a product of the $f(x_i; \theta)$'s, is therefore not a differentiable function of $\theta$ either.
Instead, let's try to solve this using common sense! The question we should ask ourselves is: which value of $\theta$ maximises the likelihood of observing the values $x_1, \dots, x_n$ that we obtained in the sample? In other words, which value of $\theta$ maximises $L(x_1, \dots, x_n; \theta) = f(x_1;\theta) \times \dots \times f(x_n; \theta)$?
First of all, we should notice that the "best" value of $\theta$ cannot be greater than any of the $x_i$'s. If $\theta$ is greater than a particular $x_i$, then $f(x_i; \theta)$ would be zero, so the likelihood $L(x_1, \dots, x_n; \theta)$ would also be zero, hence this value of $\theta$ can't give the highest possible likelihood.
On the other hand, suppose that there is some "breathing space" between your choice of $\theta$ and the smallest of the $x_i$'s. This value of $\theta$ can't be the best choice either. If you increase $\theta$ by a tiny amount, then the new $\theta$ will still be smaller than all of the $x_i$'s, and each of the $f(x_i; \theta)$'s, and hence also $L(x_1, \dots, x_n; \theta)$, will be greater than what they were originally, hence the original $\theta$ couldn't have given the highest possible likelihood.
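The same two observations can be captured in one formula: for any $\theta \leq \min_i x_i$, every factor is nonzero and the likelihood simplifies to
$$L(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} \frac{1}{3} e^{-\frac{1}{3}(x_i - \theta)} = \left(\frac{1}{3}\right)^{n} e^{-\frac{1}{3}\sum_{i=1}^{n} x_i}\, e^{\frac{n\theta}{3}},$$
which is strictly increasing in $\theta$ on that range. This is also why differentiating never works here: the log-likelihood has derivative $n/3$ with respect to $\theta$, a constant expression in $n$ that is never zero, exactly as you observed.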
The conclusion is that the value of $\theta$ that gives the largest likelihood $L(x_1, \dots, x_n; \theta)$ is precisely equal to the smallest of the $x_i$'s. And that is the answer given by your teacher.
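If you want to convince yourself numerically, here is a small sketch in Python with NumPy (the sample size, seed, and "true" $\theta$ are made up for illustration) that evaluates the likelihood on a grid of candidate values and confirms the maximiser is $\min(x_i)$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0  # hypothetical true parameter, for illustration only
n = 50

# Draw a sample from the shifted exponential: theta + Exp(scale = 3)
x = theta_true + rng.exponential(scale=3.0, size=n)

def likelihood(theta, x):
    """L(x_1, ..., x_n; theta): zero as soon as theta exceeds any x_i."""
    if theta > x.min():
        return 0.0
    return np.prod((1.0 / 3.0) * np.exp(-(x - theta) / 3.0))

# Evaluate L on a grid of candidate thetas, ending exactly at min(x_i)
grid = np.linspace(x.min() - 1.0, x.min(), 1001)
L_vals = np.array([likelihood(t, x) for t in grid])
theta_hat = grid[np.argmax(L_vals)]

print(theta_hat, x.min())  # the grid maximiser is min(x_i)
```

For larger samples you would work with the log-likelihood instead of the raw product, since the product of many small densities quickly underflows to zero.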