Is my understanding of Likelihood Ratio Tests and Likelihood Functions correct?


I'm trying to understand Likelihood Ratio Tests, Likelihood Functions and Hypothesis Tests, which make up a significant part of what we're supposed to learn in my statistics for beginners course. But I'm very new to this level of maths, so a lot of the rigorous knowledge and intuitive understanding is unfortunately missing for me; my apologies, then, for the length of the post and for over-explaining some things.

I was wondering if you could tell me, from my summary below, whether my understanding of these processes is correct, and where I've gone wrong?

My understanding so far:

Imagine you have collected a sample of data from some population; say you know the type of distribution that models the population, but you are unsure of the value of one of the parameters. So you construct a Likelihood Function, and work out the value of your parameter $\theta$ that maximises the probability of your sample data under that distribution. And you do this by taking the (Log) Likelihood Function, differentiating and setting equal to zero, then solving for $\theta$. I.e. if you were to graph 'Probability of your sample occurring' on the Y-axis vs the potential values of your parameter $\theta$ on X-axis, you would be finding the $\theta$ that maximises this function on your graph.
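The maximisation described above can be sketched numerically. In this minimal example (the sample numbers are made up), the log-likelihood of a normal model with known variance is evaluated over a grid of candidate $\theta$ values, which is exactly the "graph of likelihood vs $\theta$" picture; for this model the calculus route (derivative set to zero) gives the sample mean, so the two answers should agree:

```python
import math

def normal_log_likelihood(theta, data, sigma=1.0):
    """Log-likelihood of a Normal(theta, sigma^2) model for the sample."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - theta)**2 / (2 * sigma**2) for x in data)

# Hypothetical sample data.
data = [9.4, 10.2, 9.8, 10.5, 9.9]

# Grid-search the parameter axis: theta on the X-axis, likelihood on the Y-axis.
candidates = [9 + i * 0.001 for i in range(2001)]   # theta in [9, 11]
theta_hat = max(candidates, key=lambda t: normal_log_likelihood(t, data))

# For a normal model, setting the derivative to zero gives the sample mean.
print(theta_hat, sum(data) / len(data))
```

This is only a sketch; in practice you would use the closed-form solution (here, the sample mean) or a proper numerical optimiser rather than a grid.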

So now come Likelihood Ratio Tests. As I understand it so far, the purpose of an LRT is to work out whether the parameter you've identified as maximising is better than some null-hypothesis parameter by a statistically significant margin. Without an LRT, it might be that your proposed maximising $\theta$ was only marginally better than the null value, and perhaps only for this sample due to something like sampling error, yet you concluded it was definitively better. Now, if I understand null hypothesis tests correctly, when your null hypothesis is rejected the test doesn't then tell you which other hypothesis is the true one; it only tells you whether or not to reject the null (though if the null is false, you know the true hypothesis will be a member of the null's complement).

Then, let:

$H_0 = {}$'$\theta_\text{Null-Hyp}$ is the value of the parameter that maximises your Likelihood Function.'

$H_1 = {}$'$\theta_\text{Null-Hyp}$ is not the maximising value (and thus the maximising value $\theta \in \Theta_\text{Alt-Hyp}$, where $\Theta_\text{Alt-Hyp} = \{\theta_\text{Null-Hyp}\}^c$).'

$L(\theta) = {}$Likelihood Function with $\theta$ as your maximising parameter.

The Likelihood Ratio is then:

$$\frac{L(\theta_\text{Null-Hyp})}{L(\Theta_\text{Alt-Hyp})} \leqslant K$$

So, constructing your LRT: you first pick the confidence level (say 95%) you want. Then you work out the $K$ that corresponds to this confidence level (more on this later). Now for every $\theta \in \Theta_\text{Alt-Hyp}$ such that the ratio is $\leqslant K$, you can say with 95% confidence that these $\theta$ produce, by a statistically significant amount, a higher value of the Likelihood Function than $\theta_\text{Null-Hyp}$. So you reject the null hypothesis. And of course, you already knew which $\theta \in \Theta_\text{Alt-Hyp}$ maximises the Likelihood Function, but now you're very confident this value wasn't produced from your sample erroneously, e.g. through something like sampling error.

That's my summary of the process, which I thought was approximately right; please let me know where I've gone wrong.

However, it's in the process of the below example that I am most confused.


In this example about honey pots,

https://onlinecourses.science.psu.edu/stat414/node/309

the author is trying to find the mean weight of the population from a sample of pots. They construct an LRT with an (as far as I understand it) arbitrarily chosen value for the $H_0$ population mean of $\theta_\text{Null-Hyp} = 10,$ and significance level $\alpha = 0.05.$ Their variable for $\theta_\text{Alt-Hyp}$ is $\bar X$.

The author takes the Likelihood Ratio, manipulates the algebra, and arrives at the rejection rule:

$$Z= \frac{\bar X-10}{\sigma / \sqrt n}, \qquad |Z| \geqslant f(K) $$

From this they look up the $Z$ statistic for $\alpha = 0.05$ to say that $f(K) = 1.96.$ They then invert $f$ to recover their value of $K.$
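The algebra they describe can be checked numerically: for a normal likelihood with known variance, $-2\log$ of the ratio $L(\mu_0)/L(\bar X)$ works out to exactly $Z^2$, which is why thresholding the ratio at $K$ is the same as thresholding $|Z|$. A small sketch in Python (the sample below is made up; only $\sigma^2 = 2$ and $\mu_0 = 10$ come from the linked example):

```python
import math

def log_lik(mu, data, sigma2):
    """Normal log-likelihood with known variance sigma2."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu)**2 / (2 * sigma2) for x in data)

# Hypothetical sample of jar volumes; sigma^2 = 2 as in the linked example.
data = [10.3, 9.6, 10.8, 10.1, 9.9, 10.4]
sigma2 = 2.0
mu0 = 10.0
n = len(data)
xbar = sum(data) / n

# Log of the likelihood ratio L(mu0) / L(xbar); xbar maximises the likelihood.
log_lambda = log_lik(mu0, data, sigma2) - log_lik(xbar, data, sigma2)

z = (xbar - mu0) / math.sqrt(sigma2 / n)

# The algebraic manipulation in the example boils down to: -2 log(lambda) = Z^2.
print(-2 * log_lambda, z**2)
```

So a rule of the form "ratio $\leqslant K$" translates line by line into "$|Z|$ at least some function of $K$", which is exactly the step where $1.96$ enters.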

Here's why I'm confused. If in the end they use the $Z$ statistic to calculate their value of $K$, why don't they just skip straight to calculating the $Z$ statistic in the first place, instead of doing the LRT? If they work to make sure they can input $\bar X$ into the $Z$ statistic to decide whether to reject $H_0$, why do they then continue with the LRT, or use it in the first place? On top of this, what determines what value you should choose for the null hypothesis?

Clearly I'm missing something in my understanding (as well as any holes in my explanations above), so if anyone could fill in my knowledge gaps I'd really appreciate it.

I apologise that this is such a long post, thanks for your time so far. Any help you could offer would be greatly appreciated.

Cheers

There are 3 answers below.

---

A composite hypothesis is a hypothesis that is consistent with more than one probability distribution. When the alternative hypothesis is composite, one uses $$ \frac{L(\theta_\text{null})}{\sup_{\theta \,\in\, \text{null} \,\cup\, \text{alternative}} L(\theta)} $$ as the test statistic. The denominator may then equal the numerator, in which case the value of this ratio is $1.$ The critical value $K$ will be less than $1.$
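This can be seen in a minimal sketch (made-up normal data, known variance): the denominator is a maximum over all parameter values, so it is always at least as large as the numerator, and the ratio never exceeds $1$:

```python
import math

def log_lik(mu, data, sigma2=1.0):
    # Constants cancel in the ratio, so only the quadratic term is kept.
    return sum(-(x - mu)**2 / (2 * sigma2) for x in data)

# Hypothetical sample; simple null mu = 1 versus composite alternative.
data = [0.8, 1.3, 0.4, 1.1]
mu_null = 1.0

# Denominator: maximise over null AND alternative, i.e. over all mu on a grid.
grid = [i * 0.001 for i in range(-2000, 4001)]
log_denominator = max(log_lik(mu, data) for mu in grid)

ratio = math.exp(log_lik(mu_null, data) - log_denominator)
print(ratio)   # never exceeds 1: the numerator is one candidate in the maximum
```

Here the ratio is strictly below $1$ because the sample mean ($0.9$) beats the null value; if the data happened to have mean exactly $1$, numerator and denominator would coincide and the ratio would be $1$.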

They prove that the test using the $Z$ statistic is the same as the likelihood ratio test, so that they can cite theorems about likelihood ratio tests being in some way optimal under some circumstances.

---

First let's address your confusion with the example.

1. If in the end they use the Z statistic to calculate their value of $k$, why don't they just skip straight to calculating the Z statistic in the first place, instead of doing the LRT?

Notice that the $Z$ statistic is the tool they use to determine $k$. The example is built to show how we discover that the $Z$ statistic is useful for carrying out the LRT when inferring about the mean of a normal distribution, with a simple null hypothesis versus a composite alternative hypothesis. So in the end they use the $Z$ statistic because, through the process, they find out that they can use it to determine the value of $k$. Without going through these steps at least once, we would never find out that a $Z$ statistic can be used to determine $k$.

2. If they work to make sure they can input $\bar X$ into the Z statistic to be able to decide whether to reject $H_0$, why do they then continue with the LRT, or use it in the first place?

They work not to make sure, but to find out, that the $Z$ statistic can be used to develop the LRT. Observe that in this case the LRT depends on the $Z$ statistic: without it we would not be able to determine $k$.

3. On top of this, what determines what value you should choose for the null hypothesis?

The $H_0$ value is not arbitrarily chosen. It is actually an educated guess based on the information we already have. Read the example's statement carefully:

A food processing company packages honey in small glass jars. Each jar is supposed to contain 10 fluid ounces of the sweet and gooey good stuff. Previous experience suggests that the volume X, the volume in fluid ounces of a randomly selected jar of the company's honey is normally distributed with a known variance of 2.

Each jar is supposed to contain 10 fluid ounces of the sweet and gooey good stuff. This means that, on average, a jar will contain 10 fluid ounces. So whatever distribution the liquid-pouring machine follows, we want it to have mean equal to $10$, because we want to fill each jar with 10 fluid ounces. Furthermore, we know that $X \sim N(\mu, 2)$. Hence, since we want to make sure that $\mu = 10$, it is logical to choose $H_0: \mu_0 = 10$. We want to test at significance level $\alpha$ whether indeed $\mu = 10$.
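As a sketch of the resulting decision rule (the sample numbers below are made up; only $\mu_0 = 10$, $\sigma^2 = 2$ and the two-sided $\alpha = 0.05$ critical value $1.96$ come from the example):

```python
import math

# Hypothetical sample of jar volumes; the linked example assumes Var(X) = 2.
data = [10.9, 11.2, 10.5, 11.4, 10.8, 11.1, 10.7, 11.0]
mu0, sigma2, z_crit = 10.0, 2.0, 1.96   # alpha = 0.05, two-sided

n = len(data)
xbar = sum(data) / n
z = (xbar - mu0) / math.sqrt(sigma2 / n)

# Reject H0: mu = 10 when |Z| exceeds the critical value.
reject = abs(z) > z_crit
print(round(z, 3), reject)
```

With this particular made-up sample $z = 1.9$, which falls just short of $1.96$, so $H_0: \mu = 10$ is not rejected even though the sample mean is visibly above $10$; this is exactly the "marginally better, maybe only for this sample" situation the test guards against.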

Now some comments on your understanding of the LRT.

And you do this by taking the (Log) Likelihood Function, differentiating and setting equal to zero, then solving for $\theta$.

It is not always possible to do this. Sometimes we need to find such a $\theta$ by inspection or numerically.
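For instance, for a Cauchy location model the score equation is nonlinear and has no closed-form solution, so the log-likelihood has to be maximised numerically. A minimal sketch with made-up data, using a simple golden-section search as a stand-in for a real optimiser:

```python
import math

def cauchy_log_lik(theta, data):
    """Cauchy location model: the MLE has no closed form."""
    return sum(-math.log(math.pi) - math.log(1 + (x - theta)**2) for x in data)

# Hypothetical sample.
data = [1.2, -0.4, 0.9, 2.3, 0.5]

# Golden-section search for the maximiser on [min(data), max(data)].
lo, hi = min(data), max(data)
phi = (math.sqrt(5) - 1) / 2
for _ in range(200):
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if cauchy_log_lik(a, data) < cauchy_log_lik(b, data):
        lo = a          # maximum lies in [a, hi]
    else:
        hi = b          # maximum lies in [lo, b]
theta_hat = (lo + hi) / 2
print(round(theta_hat, 4))
```

At the returned $\hat\theta$ the derivative of the log-likelihood (the score) is numerically zero, which is the same first-order condition as in the closed-form cases, just solved by iteration instead of algebra.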

Otherwise without doing an LRT, it might be that your proposed maximising $\theta$ was only marginally better, and maybe only for this sample, than the null value due to something like sampling error, yet you thought it was definitively better.

It is not exactly about maximising $L(\theta)$; you should think more about this. The purpose of a statistical hypothesis test (such as an LRT) is to statistically decide whether to reject the null hypothesis at significance level $\alpha$. Such tests are used when we want to obtain new information about the parameters being tested with a confidence of less than $100\%$. We could have used other tests that do not make use of the likelihood function.

Yet you are on the right track.

(Also, if you want to maximise the probability of your questions getting answered, you should focus on one at a time :p)

---

I think neither of the other two answers succinctly answers your question "why $Z$?"

The point is that $Z$ only appears in this example because of the distributional assumption made in the problem statement. If you make different distributional assumptions, the likelihood ratio inequality may not yield the $Z$ statistic.

Likelihood tests are a general framework for hypothesis testing. In certain cases they are equivalent to doing the $Z$ tests any good high school student learns. But in other cases they aren't. The point of the example you link to is to show that using the likelihood test framework recovers a standard sort of hypothesis test you are assumed to already be familiar with.
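As a concrete sketch of a case where the LRT does not reduce to a $Z$ test (made-up data): for an exponential sample with a simple null on the rate, the likelihood ratio statistic is a different, non-quadratic function of $\bar X$:

```python
import math

# Hypothetical exponential sample; H0: rate = 1 versus all other rates.
data = [0.3, 1.7, 0.9, 2.4, 0.6, 1.1]
rate0 = 1.0
n = len(data)
xbar = sum(data) / n

def log_lik(rate):
    """Exponential log-likelihood as a function of the rate."""
    return n * math.log(rate) - rate * n * xbar

rate_hat = 1 / xbar                      # exponential MLE, maximises log_lik
stat = -2 * (log_lik(rate0) - log_lik(rate_hat))

# Here -2 log(lambda) = 2n(rate0*xbar - 1 - log(rate0*xbar)):
# a non-quadratic function of xbar, not the square of a Z statistic.
closed_form = 2 * n * (rate0 * xbar - 1 - math.log(rate0 * xbar))
print(stat, closed_form)
```

The normal case is special: there the same $-2\log\lambda$ manipulation collapses to $Z^2$, which is exactly what the linked example exploits.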