Find the MLE of $$f(x;\theta)= \theta_2^{-1}e^{-(x-\theta_1)/\theta_2}$$ for $x>\theta_1$, $\theta_2>0$. Likelihood:
\begin{align} L(\theta) & =L(\theta\mid X_1, X_2,\ldots, X_n) \\[6pt] & = \prod^n_{i=1}\frac{e^{\frac{-(x_i-\theta_1)}{\theta_2}}}{\theta_2} \end{align}
Here I factored out $\theta_2$ and combined the product of exponentials into a single exponential of a sum: $$=\frac{1}{\theta^n_2} e^{\sum^n_{i=1}\frac{-(x_i-\theta_1)}{\theta_2}}$$ Then I worked on the log-likelihood function:
\begin{align} \ell(\theta) & =\ln(L(\theta)) \\[10pt] & = \ln\left(\frac{1}{\theta^n_2} e^{\sum^n_{i=1}\frac{-(x_i-\theta_1)}{\theta_2}}\right) \\[10pt] & =\ln(\theta_2^{-n})+\sum_{i=1}^n- \left(\frac{x_i-\theta_1}{\theta_2}\right) \\[10pt] & =-n\ln(\theta_2)-\frac{\sum_{i=1}^nx_i-n\theta_1}{\theta_2} \\[10pt] & =-n\ln(\theta_2)-\frac{\sum^n_{i=1}x_i}{\theta_2}+\frac{n\theta_1}{\theta_2} \end{align}
Now here is where I get stuck: when I take the partial derivative with respect to $\theta_1$, I get $\partial\ell/\partial\theta_1 = n/\theta_2$, which does not depend on $\theta_1$ and can never equal zero. I stopped here and wanted to ask for advice before trying to find the MLE of $\theta_2$. This is also my first time posting, and I hope that my question is thoroughly stated. Thank you.
I'll begin by asking you a question: If I asked you for the maximum value of $f(x) = x^2$ for all real values $x$, what would you say? Would you calculate $f'(x) = 2x$ and solve for the critical points, and conclude that the maximum value is attained at $x = 0$? Of course not. You would immediately recognize that this function has no global maximum on the real numbers.
I have said this before in another post. Just because we can use the tools of calculus to find the extrema of functions does not mean that calculus is the only tool, or even the most appropriate tool, for doing so. You have to think about what you are trying to do, rather than mechanically applying rules and theorems.
Now suppose I asked for the maximum value of $f(x) = x^2$ for $x \in [-1, 1]$. Now what would you conclude? You would correctly conclude that the maximum value is $1$, attained when $x = 1$ or $x = -1$. You can use calculus to justify this conclusion, but ultimately there is something more going on here when the domain is restricted in some fashion.
The same idea applies to your calculation of the maximum likelihood estimate. The joint probability of the sample is zero whenever the sample contains an observation that is impossible to obtain. For example, if I say that a sample is drawn from a uniform distribution on $[0,1]$, then I cannot obtain $2$ as part of that sample. It seems like an obvious thing to say, but the consequence of that rather obvious statement is that this property can be informative about the parameter when we are given a sample drawn from a distribution with unknown parameters.
What I mean by this should be clear from modifying the previous example. Suppose I say that I drew a sample from a uniform distribution on $[0,\theta]$, where $\theta$ is not known, and that I observed the sample $$\{5, 2, 3, 9, 1, 5, 7\}.$$ Then it would have been impossible to observe this sample had $\theta$ been any value less than $9$, since we did observe such a value.
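This is easy to check numerically. Here is a minimal Python sketch (the helper `uniform_likelihood` is my own name, not a library function) that transcribes the likelihood $L(\theta) = \theta^{-n}\,\mathbb 1(\max_i x_i \le \theta)$ for this uniform example:

```python
# Likelihood of an i.i.d. sample from Uniform[0, theta]:
# L(theta) = theta^(-n) if every observation lies in [0, theta], else 0.
def uniform_likelihood(theta, sample):
    if theta <= 0 or max(sample) > theta:
        return 0.0
    return theta ** (-len(sample))

sample = [5, 2, 3, 9, 1, 5, 7]

# Any theta below the sample maximum (9) makes the sample impossible:
print(uniform_likelihood(8.0, sample))  # 0.0
# At theta = 9 the likelihood is positive, and it only decreases beyond that:
print(uniform_likelihood(9.0, sample) > uniform_likelihood(10.0, sample))  # True
```

So the likelihood is identically zero on $\theta < 9$ and strictly decreasing on $\theta \ge 9$: the maximum sits exactly at the sample maximum, a point calculus alone would never hand you.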
So, when we talk about parametric inference based on maximizing a likelihood function given some random sample, we have to be mindful of information from the sample that could also restrict the possible values of the parameter(s) of the distribution from which the sample was apparently drawn.
With this principle applied to your particular example, the first thing we need to do is be clear about the support of the distribution you stated: if $$f_X(x \mid \theta_1, \theta_2) = \theta_2^{-1} e^{-(x-\theta_1)/\theta_2},$$ this doesn't explicitly tell us the support of the random variable $X$. In fact, it should be $$X \in [\theta_1, \infty),$$ and $X$ is what we call a shifted exponential distribution or location-scale exponential distribution. It is not hard to see that we can generate such a density by noting that $$X = \theta_2 Y + \theta_1,$$ where $Y \sim \operatorname{Exponential}(1)$ is an exponential distribution with mean $1$.

Therefore, the joint distribution of a sample given $\theta_1, \theta_2$ is $$f_{\boldsymbol X}(\boldsymbol x \mid \theta_1, \theta_2) = \prod_{i=1}^n \theta_2^{-1} e^{-(x_i - \theta_1)/\theta_2} \mathbb 1 (x_i \ge \theta_1),$$ where $\mathbb 1(x_i \ge \theta_1)$ is an indicator function that mathematically expresses the concept that the density is zero if the condition in parentheses is not satisfied. Formally, $$\mathbb 1 (x_i \ge \theta_1) = \begin{cases} 1, & x_i \ge \theta_1, \\ 0, & \text{otherwise}. \end{cases}$$

So with this in mind, the joint likelihood given the sample is $$\mathcal L(\theta_1, \theta_2 \mid \boldsymbol x) = \theta_2^{-n} \exp\left(-\frac{1}{\theta_2} \sum_{i=1}^n (x_i - \theta_1)\right) \mathbb 1 (x_{(1)} \ge \theta_1),$$ where $x_{(1)} = \min_i x_i$ is the first order statistic, or the smallest of the observations from $\boldsymbol x = (x_1, x_2, \ldots, x_n)$. This is because the product of all of the $\mathbb 1 (x_i \ge \theta_1)$ terms is $1$ if and only if each observation is at least as large as $\theta_1$, which is the same as saying the smallest observation is at least as large as $\theta_1$.
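A direct Python transcription of this likelihood makes the role of the indicator concrete (the sample values below are made up purely for illustration):

```python
import math

# Joint likelihood of the shifted exponential sample, including the
# indicator 1(min x_i >= theta1); zero whenever the sample is impossible.
def likelihood(theta1, theta2, sample):
    if theta2 <= 0 or min(sample) < theta1:
        return 0.0
    s = sum(x - theta1 for x in sample)
    return theta2 ** (-len(sample)) * math.exp(-s / theta2)

sample = [2.3, 1.7, 4.1, 1.5, 3.0]  # so x_(1) = 1.5

# The likelihood vanishes the moment theta1 exceeds the smallest observation:
print(likelihood(1.6, 1.0, sample))  # 0.0
# For theta1 <= x_(1), raising theta1 shrinks every (x_i - theta1), so
# the likelihood keeps increasing right up to theta1 = x_(1):
print(likelihood(1.5, 1.0, sample) > likelihood(1.0, 1.0, sample))  # True
```

Note how the indicator, not the exponential factor, is what caps $\theta_1$ from above.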
Finally, we can simplify the likelihood a bit more by expressing it in terms of the sample mean $\bar x$: $$\mathcal L (\theta_1, \theta_2 \mid \boldsymbol x) = \theta_2^{-n} e^{-n(\bar x - \theta_1)/\theta_2} \mathbb 1(x_{(1)} \ge \theta_1).$$ This suggests to us that a sufficient statistic for $\theta_1, \theta_2$ would be the ordered pair $(x_{(1)}, \bar x)$. We need to know both of these values to estimate $\theta_1, \theta_2$. And this is something that should make intuitive sense: a "good" estimate of $\theta_1$ should be related to the minimum observation in the sample, but the sample mean would not be particularly informative for this parameter. A "good" estimate of $\theta_2$ should contain information about the sample mean.
Now, the rest should be something you can reason through, possibly using calculus. But keep in mind that just because you calculated the derivative of the log-likelihood does not mean that the maximum of the likelihood must occur at a critical point; it could occur at a boundary point. This was illustrated at the very beginning of my answer.
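To see the boundary behavior numerically without giving away the algebra, here is a crude Python sketch (toy data, $\theta_2$ held fixed at $1$ for simplicity): a grid search over $\theta_1$ finds the maximizer of the log-likelihood sitting on the boundary of the feasible region, not at a critical point.

```python
import math

# Log of the joint likelihood; -inf wherever the indicator is zero.
def log_likelihood(theta1, theta2, sample):
    if theta2 <= 0 or min(sample) < theta1:
        return float("-inf")
    n = len(sample)
    return -n * math.log(theta2) - sum(x - theta1 for x in sample) / theta2

sample = [2.3, 1.7, 4.1, 1.5, 3.0]
x_min = min(sample)

# Crude grid search over theta1 in (0, x_min], fixing theta2 = 1:
grid = [x_min * k / 100 for k in range(1, 101)]
best = max(grid, key=lambda t1: log_likelihood(t1, 1.0, sample))
print(best == x_min)  # True: the maximizer is the boundary point theta1 = x_(1)
```

The derivative in $\theta_1$ is strictly positive on the whole feasible region, which is exactly why setting it to zero got you nowhere: the maximum is pinned to the edge that the indicator creates.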