My question is about Example 8.3.7 (size of the LRT) in Casella and Berger's *Statistical Inference*. In this example, we have $$P_{\theta_0}(X_{(1)}\geq c)=e^{-n(c-\theta_0)}=\alpha,$$ where the null hypothesis is $H_0:\theta \leq \theta_0$. To find $\sup_{\theta\le\theta_0}\beta(\theta)$, I observe that the power function is increasing in $\theta$, so the supremum is attained at $\theta = \theta_0$.
But in the textbook, the logic is "Since $\theta$ is a location parameter for $X_{(1)}$", from which they get $$\sup_{\theta\le\theta_0}P_{\theta}(X_{(1)}\geq c)= P_{\theta_0}(X_{(1)}\geq c).$$
My question is about the textbook's reasoning: "Since $\theta$ is a location parameter for $X_{(1)}$". My approach is to observe that the expression is an increasing function of $\theta$. Yes, I can get the cdf of $X_{(1)}$, and I know $\theta$ is a location parameter. But how does this help me find the supremum under $H_0$?
For more detail, I know this: $$P_{\theta_0}(X_{(1)}\geq c)=P_{\theta_0}\left(\min_i X_i\geq c\right)=\prod_{i=1}^n P_{\theta_0}(X_i\geq c)= \prod_{i=1}^n e^{-(c-\theta_0)}= e^{-n(c-\theta_0)}=\alpha.$$ But this derivation makes no use of the fact that $\theta$ is a location parameter.
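As a quick numeric sanity check on this algebra (the particular values of $\theta_0$, $n$, and $\alpha$ below are arbitrary illustrative choices, not from the book), one can solve $e^{-n(c-\theta_0)}=\alpha$ for the cutoff $c$ and confirm that the size comes back out:

```python
import math

# Arbitrary illustrative values: boundary theta0, sample size n, level alpha.
theta0, n, alpha = 2.0, 7, 0.05

# Solving e^{-n(c - theta0)} = alpha for the cutoff gives c = theta0 - ln(alpha)/n.
c = theta0 - math.log(alpha) / n

# Plugging c back into P_{theta0}(X_(1) >= c) = e^{-n(c - theta0)} recovers alpha.
size = math.exp(-n * (c - theta0))
print(c, size)  # size comes back as 0.05
```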
What you wrote isn't quite what the text says, so let me address both your notation and the textbook's reasoning.
First, your interpretation of $P_{\theta_0}[X_{(1)} \ge c] = e^{-n(c-\theta_0)}$ is problematic because you proceed to treat this expression as a function of $\theta$, when it is not. It is a function of $\theta_0$. I understand what you intend to say, but you are not saying it in a mathematically precise way.
Let's look at the big picture first. The hypothesis $$H_0 : \theta \le \theta_0 \quad \text{vs.} \quad H_1 : \theta > \theta_0$$ contains many scenarios for the true value of $\theta$ under which $H_0$ holds. But the test is designed to detect when $\theta$ is "too large": it rejects the null in favor of the alternative when the sample contains sufficient evidence that it is highly implausible the data were generated from a distribution whose parameter $\theta$ is at most $\theta_0$. From an intuitive standpoint, then, it makes sense that the test is less able to discriminate between the two hypotheses as $\theta$ approaches $\theta_0$.
What this means is that if the true value of $\theta$ is very, very small, much smaller than $\theta_0$, the chance of Type I error can be much lower than $\alpha$, because the chance that the sample minimum would fall into the rejection region is lower than if $\theta$ were closer to $\theta_0$. For instance, suppose $\theta_0 = 2$, $n = 7$, and $\alpha = 0.05$. Then $c \approx 2.42796$. If I now generate $7$ realizations from an exponential distribution with location parameter $\theta = 1.9 < \theta_0$ and mean $\mu = \theta + 1 = 2.9$, I might get $$\{2.5529, 2.12674, 3.29505, 2.33452, 2.65148, 5.03801, 2.16436\}.$$ The sample minimum is $X_{(1)} = 2.12674$ and I correctly fail to reject $H_0$. But I might also get $$\{4.04183, 4.4109, 2.51703, 2.49925, 2.71013, 4.2631, 2.88076\}$$ and $X_{(1)} = 2.49925 > c$, and I have now rejected $H_0$ in error. But if $\theta = -5$, I might get $$\{-4.56291, -3.72484, -4.46664, -4.90246, -2.86634, -4.4378, -4.83439 \}$$ and now nothing is even positive, let alone greater than $c$. You can see that in such an extreme case, the chance of getting $X_{(1)} > c$ is incredibly tiny.
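If it helps, the three scenarios above can be checked by simulation. This is a rough Monte Carlo sketch (the helper name `rejection_rate` is mine; a shifted exponential is drawn as $\theta + \mathrm{Exp}(1)$) estimating $P_\theta(X_{(1)} \ge c)$ at $\theta = \theta_0 = 2$, at $\theta = 1.9$, and at $\theta = -5$:

```python
import math
import random

random.seed(1)

theta0, n, alpha = 2.0, 7, 0.05
c = theta0 - math.log(alpha) / n  # approximately 2.42796

def rejection_rate(theta, reps=100_000):
    """Monte Carlo estimate of P_theta(X_(1) >= c) for the shifted exponential."""
    hits = 0
    for _ in range(reps):
        # Each X_i = theta + Exp(1); the minimum of the n draws decides the test.
        x_min = theta + min(random.expovariate(1.0) for _ in range(n))
        hits += x_min >= c
    return hits / reps

# Exact values are e^{-n(c - theta)}: 0.05 at theta = 2, about 0.0248 at
# theta = 1.9, and astronomically small (~2.6e-23) at theta = -5.
print(rejection_rate(2.0), rejection_rate(1.9), rejection_rate(-5.0))
```

The estimate at $\theta = -5$ comes back as exactly zero in any feasible number of replications, matching the intuition that rejecting from that far inside the null is essentially impossible.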
So what we are saying here is that, as a function of $\theta$ (not $\theta_0$), the probability $$P_\theta [X_{(1)} \ge c] = e^{-n(c - \theta)}$$ is, for $\theta < \theta_0$, strictly bounded above by $$P_{\theta_0} [X_{(1)} \ge c] = e^{-n(c - \theta_0)}.$$ The Type I error of rejecting the null is maximized when the true value of the parameter $\theta$ is at the boundary of the null hypothesis, i.e., at $\theta_0$. This is what you are trying to say, and it's the same thing as what Casella is saying.
Why then does Casella refer to "$\theta$ is a location parameter for $X_{(1)}$"? What does he mean by this? He is saying that the distribution of the sample minimum has the same location parameter as the distribution of any individual observation. For this reason, $P_\theta[X_{(1)} \ge c]$ is just $e^{-n(c - \theta_0)}$ with $\theta_0$ replaced by $\theta$, and it is not difficult to see why this is the case, which is why he doesn't elaborate further. It's not the most clearly stated line of reasoning, but what matters is the underlying logic, which is illustrated in my example above.
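One way to see the location-parameter claim concretely (again a simulation sketch; the helper `shifted_min` is my own name) is that the law of $X_{(1)} - \theta$ does not depend on $\theta$: the minimum of $n$ i.i.d. draws of $\theta + \mathrm{Exp}(1)$ is $\theta + \mathrm{Exp}(n)$, so the shifted minimum has mean $1/n$ whatever $\theta$ is:

```python
import random

random.seed(2)
n, reps = 7, 100_000

def shifted_min(theta):
    """One draw of X_(1) - theta, where X_i = theta + Exp(1), i = 1..n."""
    return min(theta + random.expovariate(1.0) for _ in range(n)) - theta

# If theta is a location parameter for X_(1), the distribution of
# X_(1) - theta is free of theta; here it is Exp(rate n), so every
# sample mean below should sit near 1/n = 1/7, about 0.1429.
for theta in (-5.0, 0.0, 2.0):
    mean = sum(shifted_min(theta) for _ in range(reps)) / reps
    print(theta, round(mean, 4))
```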