I'm trying to understand likelihood ratio tests, likelihood functions and hypothesis tests, which make up a significant part of my statistics-for-beginners course. But I'm very new to this level of maths, so a lot of the rigorous knowledge and intuitive understanding is unfortunately missing for me; my apologies, then, for the length of the post and for over-explaining some things.
I was wondering if you could tell me, from my summary below, whether my understanding of these processes is correct, and where I've gone wrong?
My understanding so far:
Imagine you have collected a sample of data from some population; say you know the type of distribution that models the population, but you are unsure of the value of one of its parameters. So you construct a likelihood function and work out the value of the parameter $\theta$ that maximises the likelihood of your sample under that distribution. You do this by taking the (log-)likelihood function, differentiating, setting it equal to zero, and solving for $\theta$. I.e. if you were to graph 'likelihood of your sample occurring' on the Y-axis against the potential values of the parameter $\theta$ on the X-axis, you would be finding the $\theta$ that maximises this function on your graph.
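The step above can be sketched numerically. This is a minimal example with an invented sample, assuming a Normal model with known $\sigma$, where setting the derivative of the log-likelihood to zero gives $\hat\theta = \bar x$ (the values below are made up for illustration):

```python
import math

# Hypothetical sample, assumed drawn from Normal(theta, sigma^2) with known sigma.
sample = [9.2, 10.1, 9.8, 10.4, 9.9, 10.2]
sigma = 0.5

def log_likelihood(theta):
    """Log-likelihood of the sample under Normal(theta, sigma^2)."""
    n = len(sample)
    return (-n / 2 * math.log(2 * math.pi * sigma**2)
            - sum((x - theta) ** 2 for x in sample) / (2 * sigma**2))

# Differentiating and setting equal to zero gives theta_hat = sample mean.
theta_hat = sum(sample) / len(sample)

# Sanity check: the log-likelihood at theta_hat beats nearby values of theta.
assert log_likelihood(theta_hat) > log_likelihood(theta_hat + 0.1)
assert log_likelihood(theta_hat) > log_likelihood(theta_hat - 0.1)
```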
So now come likelihood ratio tests. As I understand it so far, the purpose of an LRT is to work out whether the parameter you've identified as maximising is better than some null-hypothesis parameter by a statistically significant amount. Without an LRT, it might be that your proposed maximising $\theta$ was only marginally better than the null value, and perhaps only for this sample because of something like sampling error, yet you thought it was definitively better. Now, if I understand null hypothesis tests right: in the case where your null hypothesis is rejected, the test doesn't then tell you which other hypothesis is the true one; a hypothesis test just tells you whether or not to reject the null (though if it's rejected, you know the true hypothesis is a member of the null's complement).
Then, let:
$H_0 = {}$'$\theta_\text{Null-Hyp}$ is the value of the parameter that maximises your Likelihood Function.'
$H_1 = {}$'$\theta_\text{Null-Hyp}$ is not the maximising value (and thus the maximising value $\theta \in \Theta_\text{Alt-Hyp}$, where $\Theta_\text{Alt-Hyp} = \{\theta_\text{Null-Hyp}\}^c$).'
$L(\theta) = {}$the likelihood function evaluated at the parameter value $\theta$.
The Likelihood Ratio is then:
$$\frac{L(\theta_\text{Null-Hyp})}{\sup_{\theta \in \Theta_\text{Alt-Hyp}} L(\theta)} \leqslant K$$
So, constructing your LRT: when setting up your LRT you first pick the confidence level (say 95%) you want. Then you work out the $K$ that corresponds to this confidence level (more on this later). Now, whenever the ratio comes out $\leqslant K$, you can say with 95% confidence that some $\theta \in \Theta_\text{Alt-Hyp}$ outputs, by a statistically significant amount, a higher value of the likelihood function than $\theta_\text{Null-Hyp}$, so you reject the null hypothesis. And of course, you already knew the $\theta \in \Theta_\text{Alt-Hyp}$ that maximises the likelihood function, but now you're very confident this value wasn't produced erroneously by your sample, e.g. through sampling error.
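The rejection rule described above can be sketched numerically. This is a minimal, hand-rolled example with an invented sample and an invented placeholder threshold $K$ (not derived from a real $\alpha$), assuming a Normal model with known $\sigma$, where the ratio of the likelihood at $\theta_\text{Null-Hyp}$ to its maximum (at $\bar x$) has a closed form:

```python
import math

# Invented sample and null value; sigma assumed known.
sample = [9.2, 10.1, 9.8, 10.4, 9.9, 10.2]
sigma, theta_null = 0.5, 10.0
n = len(sample)
xbar = sum(sample) / n

# For Normal(theta, sigma^2) with known sigma, the ratio of the likelihood at
# theta_null to its maximum (attained at xbar) simplifies to:
lam = math.exp(-n * (xbar - theta_null) ** 2 / (2 * sigma**2))

# Reject H0 when the ratio falls at or below the critical value K.
K = 0.15  # placeholder for illustration only, not computed from an alpha
reject = lam <= K
```

With these made-up numbers $\bar x$ is close to $\theta_\text{Null-Hyp}$, so the ratio stays near 1 and the null is not rejected.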
That's my summary of the process, which I thought was approximately right; please let me know where I've gone wrong.
However, it's the process in the example below that confuses me most.
In this example about honey pots,
https://onlinecourses.science.psu.edu/stat414/node/309
the author is trying to find the mean weight of the population from a sample of pots. They construct an LRT with an (as far as I can tell) arbitrarily chosen value for the $H_0$ population mean, $\theta_\text{Null-Hyp} = 10$, and significance level $\alpha = 0.05$. Their estimate for $\theta_\text{Alt-Hyp}$ is $\bar X$.
The author takes the Likelihood Ratio, manipulates the algebra, and arrives at the equation:
$$Z= \frac{\bar X-10}{\sigma / \sqrt n} = f(K) $$
From this they look up the $Z$ statistic for $\alpha = 0.05$ to say that $f(K) = 1.96$. They then invert $f$ to find their value of $K$.
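The reduced test above amounts to a plain $Z$ test. As a sketch with invented numbers (the sample mean, $\sigma$, and $n$ below are not taken from the linked page):

```python
import math

# Invented numbers for illustration; not from the linked honey-pot example.
xbar, sigma, n = 10.3, 1.2, 36
theta_null = 10.0

# The manipulated likelihood ratio reduces to the usual Z statistic:
z = (xbar - theta_null) / (sigma / math.sqrt(n))

# At alpha = 0.05 (two-sided), reject H0 when |Z| >= 1.96.
reject = abs(z) >= 1.96
```

Here $z = 0.3 / 0.2 = 1.5 < 1.96$, so with these made-up numbers $H_0$ would not be rejected.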
Here's why I'm confused. If in the end they use the $Z$ statistic to calculate their value of $K$, why don't they just skip straight to calculating the $Z$ statistic in the first place, instead of doing the LRT? If they work to make sure they can plug $\bar X$ into the $Z$ statistic to decide whether to reject $H_0$, why do they continue with the LRT, or use it at all? On top of this, what determines the value you should choose for the null hypothesis?
Clearly I'm missing something in my understanding (as well as any holes in my explanations above), so if anyone could fill in my knowledge gaps I'd really appreciate it.
I apologise that this is such a long post; thanks for your time so far. Any help you could offer would be greatly appreciated.
Cheers
A composite hypothesis is a hypothesis that is consistent with more than one probability distribution. When the alternative hypothesis is composite, one uses $$ \frac{L(\theta_\text{null})}{\sup_{\theta \,\in\, \Theta_\text{null} \,\cup\, \Theta_\text{alternative}} L(\theta)} $$ as the test statistic. The denominator may then equal the numerator, in which case the value of the ratio is $1.$ The critical value $K$ will be less than $1.$
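The point that this ratio can never exceed $1$ can be checked numerically. A minimal sketch, assuming a Normal model with known $\sigma$ and an invented sample, where maximising over the whole parameter space (null union alternative) puts the denominator's maximum at $\bar x$:

```python
import math

# Invented Normal sample with known sigma.
sample = [9.2, 10.1, 9.8, 10.4, 9.9, 10.2]
sigma, theta_null = 0.5, 10.0
xbar = sum(sample) / len(sample)

def likelihood(theta):
    """Likelihood of the sample under Normal(theta, sigma^2)."""
    return math.prod(
        math.exp(-(x - theta) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        for x in sample
    )

# The denominator maximises over null union alternative, i.e. over all theta,
# so its maximum is at xbar and the ratio is at most 1 (equal to 1 exactly
# when xbar coincides with theta_null).
ratio = likelihood(theta_null) / likelihood(xbar)
assert ratio <= 1.0
```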
They prove that the test using the $Z$ statistic is the same as the likelihood ratio test so that they can cite theorems about likelihood ratio tests being, in some sense, optimal under some circumstances.