The likelihood ratio statistic for testing $H_0:\theta\in\Theta_0$ versus $H_1:\theta\in\Theta_0^c$ is usually defined by $$ \lambda(\mathbf x) =\frac{\sup_{\theta\in\Theta_0}L(\theta\mid\mathbf x)}{\sup_{\theta\in\Theta} L(\theta\mid\mathbf x)}, $$ where $\Theta$ is the whole parameter space and $\Theta_0\subset\Theta$ (see, for example, Section 8.2.1 of Casella and Berger (2002)).
Why is the supremum in the denominator taken over all of $\Theta$ instead of $\Theta_0^c$? Would it make sense to define the likelihood ratio statistic with the supremum in the denominator taken over $\Theta_0^c$? If the supremum in the denominator were taken over $\Theta_0^c$, then $\lambda(\mathbf x)$ would not necessarily satisfy $0\le\lambda(\mathbf x)\le1$, but this choice seems more intuitive, since we would be comparing the likelihood when $H_0$ is true against the likelihood when $H_1$ is true. Or are these two options ($\Theta$ and $\Theta_0^c$) actually equivalent?
Any help is much appreciated!
Suppose $\boldsymbol X=(X_1,\ldots,X_n)$ is a random vector whose distribution is parameterized by $\theta$, where $\theta\in \Theta\subseteq \mathbb R^p$. Let $L(\theta\mid \boldsymbol x)$ be the likelihood function given the sample $\boldsymbol x=(x_1,\ldots,x_n)$.
In general we consider the problem of testing the null $H_0:\theta\in \Theta_0$ against the alternative $H_1:\theta\in \Theta_1$, where $\Theta_0\subset \Theta$ and $\Theta_1\subseteq \Theta-\Theta_0$.
We prefer $H_0$ to $H_1$ (respectively, $H_1$ to $H_0$) if $$\sup_{\theta\in\Theta_0}L(\theta\mid \boldsymbol x)>(<) \sup_{\theta\in\Theta_1}L(\theta\mid \boldsymbol x).$$
When $H_0$ is true (false), the ratio $$r(\boldsymbol x)=\frac{\sup_{\theta\in\Theta_0}L(\theta\mid \boldsymbol x)}{\sup_{\theta\in\Theta_1}L(\theta\mid \boldsymbol x)}$$
is expected to be large (small). But $r$ is not bounded above.
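As a concrete illustration of the unboundedness, here is a minimal sketch (not from the answer itself) assuming $n$ i.i.d. $N(\mu,1)$ observations with sample mean $\bar x$, and the one-sided test $H_0:\mu\le 0$ versus $H_1:\mu>0$. When $\bar x$ falls deep inside $\Theta_0$, the numerator is maximized at $\mu=\bar x$ while the denominator can only reach $L(0\mid\boldsymbol x)$, so $r=\exp(n\bar x^2/2)$ grows without bound:

```python
import math

def loglik(mu, xbar, n):
    # Normal log-likelihood (up to an additive constant) with known sigma = 1;
    # it depends on the data only through xbar.
    return -n * (xbar - mu) ** 2 / 2

def r_stat(xbar, n):
    # sup over Theta0 = {mu <= 0}: attained at min(xbar, 0)
    sup0 = loglik(min(xbar, 0.0), xbar, n)
    # sup over Theta1 = {mu > 0}: equals loglik at max(xbar, 0)
    # (for xbar <= 0 the supremum is approached as mu -> 0+, not attained)
    sup1 = loglik(max(xbar, 0.0), xbar, n)
    return math.exp(sup0 - sup1)

# With xbar far below 0, r = exp(n * xbar**2 / 2) blows up:
print(r_stat(-2.0, 10))  # exp(20), roughly 4.85e8
print(r_stat(0.5, 10))   # exp(-1.25), roughly 0.287
```

This makes it hard to calibrate a rejection threshold for $r$ directly, which motivates the modification below.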
So we modify $r(\boldsymbol x)$ by
$$\Lambda(\boldsymbol x)=\frac{\sup_{\theta\in\Theta_0}L(\theta\mid \boldsymbol x)}{\sup_{\theta\in \Theta_0 \cup \Theta_1}L(\theta\mid \boldsymbol x)}=\frac{\sup_{\theta\in\Theta_0}L(\theta\mid \boldsymbol x)}{\sup_{\theta\in \Theta}L(\theta\mid \boldsymbol x)}$$
If $H_0$ is true (false), then as before, $\Lambda$ is expected to be large (small).
However, we now have $\Lambda\in (0,1]$, and we trivially accept (or rather, fail to reject) $H_0$ whenever $\Lambda=1$.
This justifies a left-tailed test based on $\Lambda$, and $\Lambda(\boldsymbol X)$ is called the likelihood ratio criterion.
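To see the boundedness in the same sketch setting as above (my own illustration, assuming $n$ i.i.d. $N(\mu,1)$ observations and $H_0:\mu\le 0$), the denominator is now the unrestricted maximum $L(\bar x\mid\boldsymbol x)$, so $\Lambda$ can never exceed $1$, and $\Lambda=1$ exactly when the MLE already lies in $\Theta_0$:

```python
import math

def loglik(mu, xbar, n):
    # Normal log-likelihood (up to an additive constant) with known sigma = 1
    return -n * (xbar - mu) ** 2 / 2

def lrt(xbar, n):
    # Numerator: sup over Theta0 = {mu <= 0}, attained at min(xbar, 0)
    sup0 = loglik(min(xbar, 0.0), xbar, n)
    # Denominator: sup over all of Theta = R, attained at the MLE mu = xbar
    sup_all = loglik(xbar, xbar, n)
    return math.exp(sup0 - sup_all)

print(lrt(-2.0, 10))  # 1.0: MLE is in Theta0, so Lambda = 1
print(lrt(0.5, 10))   # exp(-1.25), roughly 0.287: small Lambda is evidence against H0
```

Small values of `lrt` are evidence against $H_0$, matching the left-tailed rejection rule $\Lambda\le c$.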